CC BY-NC-ND 4.0 · Yearb Med Inform 2019; 28(01): 128-134
DOI: 10.1055/s-0039-1677903
Section 5: Decision Support
Working Group Contribution
Georg Thieme Verlag KG Stuttgart

Artificial Intelligence in Clinical Decision Support: Challenges for Evaluating AI and Practical Implications

A Position Paper from the IMIA Technology Assessment & Quality Development in Health Informatics Working Group and the EFMI Working Group for Assessment of Health Information Systems
Farah Magrabi
1   Macquarie University, Australian Institute of Health Innovation, Sydney, Australia
Elske Ammenwerth
2   UMIT, University for Health Sciences, Medical Informatics and Technology, Institute of Medical Informatics, Hall in Tyrol, Austria
Jytte Brender McNair
3   Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
Nicolet F. De Keizer
4   Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health research institute, The Netherlands
Hannele Hyppönen
5   National Institute for Health and Welfare, Information Department, Helsinki, Finland
Pirkko Nykänen
6   Tampere University, Faculty for Information Technology and Communication Sciences, Tampere, Finland
Michael Rigby
7   Keele University, School of Social Science and Public Policy, Keele, United Kingdom
Philip J. Scott
8   University of Portsmouth, Centre for Healthcare Modelling and Informatics, Portsmouth, United Kingdom
Tuulikki Vehko
5   National Institute for Health and Welfare, Information Department, Helsinki, Finland
Zoie Shui-Yee Wong
9   St. Luke’s International University, Tokyo, Japan
Andrew Georgiou
1   Macquarie University, Australian Institute of Health Innovation, Sydney, Australia

Publication History

Publication Date: 25 April 2019 (online)


Objectives: This paper draws attention to: i) key considerations for evaluating artificial intelligence (AI) enabled clinical decision support; and ii) challenges and practical implications of AI design, development, selection, use, and ongoing surveillance.

Method: A narrative review of existing research and evaluation approaches along with expert perspectives drawn from the International Medical Informatics Association (IMIA) Working Group on Technology Assessment and Quality Development in Health Informatics and the European Federation for Medical Informatics (EFMI) Working Group for Assessment of Health Information Systems.

Results: There is a rich history and tradition of evaluating AI in healthcare. While evaluators can learn from past efforts and build on best-practice evaluation frameworks and methodologies, questions remain about how to evaluate the safety and effectiveness of AI that dynamically harnesses vast amounts of genomic, biomarker, phenotype, electronic record, and care delivery data from across health systems. This paper first provides a historical perspective on the evaluation of AI in healthcare. It then examines key challenges of evaluating AI-enabled clinical decision support during design, development, selection, use, and ongoing surveillance. Practical aspects of evaluating AI in healthcare, including approaches to evaluation and indicators to monitor AI, are also discussed.

Conclusion: Commitment to rigorous initial and ongoing evaluation will be critical to ensuring the safe and effective integration of AI in complex sociotechnical settings. Specific enhancements that are required for the new generation of AI-enabled clinical decision support will emerge through practical application.
