CC BY-NC-ND 4.0 · Yearb Med Inform 2019; 28(01): 257-262
DOI: 10.1055/s-0039-1677891
History of Medical Informatics
Georg Thieme Verlag KG Stuttgart

Artificial Intelligence in Medicine: Weighing the Accomplishments, Hype, and Promise

Edward H. Shortliffe
1  Department of Biomedical Informatics, Columbia University
2  College of Health Solutions, Arizona State University
3  Department of Health Policy and Research, Weill Cornell Medical College

Correspondence to

E. H. Shortliffe
272 W 107th St Apt 5B, New York, New York 10025-7833
Phone: Cell: +1-917-640-0933   
Email: [email protected]   

Publication History

Publication Date: 25 April 2019 (online)



Introduction: Artificial Intelligence in Medicine (AIM) research is now 50 years old, having made great progress that has tracked the corresponding evolution of computer science, hardware technology, communications, and biomedicine. Characterized as being in its “adolescence” at an international meeting in 1991, and as “coming of age” at another meeting in 2007, the AIM field is now more visible and influential than ever before, paralleling the enthusiasm and accomplishments of artificial intelligence (AI) more generally.

Objectives: This article summarizes some of that AIM history, providing an update on the status of the field as it enters its second half-century. It acknowledges the failure of AI, including AIM, to live up to early predictions of its likely capabilities and impact.

Methods: The paper reviews and assesses the early history of the AIM field, referring to the conclusions of papers based on the meetings in 1991 and 2007, and analyzing the subsequent evolution of AIM.

Conclusion: We must be cautious in assessing the speed at which further progress will be made, despite today’s wild predictions in the press and large investments by industry, including in health care. The inherent complexity of medicine and of clinical care necessitates that we address issues of usability, workflow, transparency, safety, and formal clinical trials. These requirements contribute to an ongoing research agenda that means academic AIM research will continue to be vibrant while having new opportunities for more interactions with industry.


1 Introduction

The headlines abound: “AI in healthcare market to grow to US$36.1B by 2025” [1], “6 ways AI is changing healthcare” [2], “Artificial Intelligence Transforms the Future of Medicine” [3], “AI Shows Promise Assisting Physicians” [4]. A drive up California’s Bayshore Freeway to San Francisco currently reveals multiple billboards for Silicon Valley commuters, many touting AI applications and tools from major corporations and startups (many of which are health-related), while others indicate the high-paying job opportunities for people with training in AI and data analytics. We talk to our smart phones and to virtual assistants in our homes and cars, while radiologists dictate into computer systems that generate text reports automatically. Futurists and medical pundits envision a new world of “high performance medicine”, resulting from the convergence of human and artificial intelligence [1].

For those of us who have been engaged with research on artificial intelligence in medicine and health care for decades, the current visibility and enthusiasm regarding our field is both refreshing and frightening. It is refreshing for those who worked hard to create systems that would demonstrate the promise of the field – often with pushback from the medical community, who could not initially envision the use of computers in health care that has gradually become commonplace. But it is frightening for those of us who remember the overselling of AI in general, including medical AI, in the early 1970s, late 1980s, and early 1990s. The hype in the press in the 1980s led first to societal curiosity, then to unfounded predictions by observers, and ultimately to the phenomenon of “AI winter” that followed when predictions proved to be overly optimistic. With a drop in research funding for anything related explicitly to AI, there was a tendency for those working in the field to select new terms for the science they were pursuing. Knowledge representation researchers developed ontologies and the machine learning community gathered at meetings called knowledge discovery in databases (KDD).

At the 1991 AI in Medicine Europe (AIME) meeting, the organizers asked me to present an assessment of the current state of AI in Medicine (often shortened to AIM). I characterized the field as being in its adolescence, with some solid accomplishments but many challenges, coupled with misunderstandings about the nature and goals of the field [2]. Subsequently, at the 2007 AIME meeting, several of us convened an invitational panel to consider progress in the field in the intervening years, later writing up our comments to convey our sense that AIM was coming of age [3]. Although we acknowledged the unfortunate impact of the overselling of AIM in earlier years, we made the case that the field was actually healthy, growing, and poised to have greater visibility and impact. We pointed to the importance of the increasingly available computational power, ubiquity of computing, networking and interconnectivity, and an environment for health professionals and the public that was more accepting of the notion that computers could play an increasingly valuable support role in health and health care.

In this article, I will offer a Part 3 to this series of papers, observing how another decade has affected our field, addressing the challenges in promoting what is possible while attempting to rein in unrealistic expectations, and anticipating what lies ahead, both for AIM and for the biomedical informatics discipline to which it contributes. I begin with a brief recapitulation of the history of AIM, noting that we provided more detail in those two earlier papers [2] [3]. I also identify key issues that characterize AIM research as we enter the next decade in the evolution of medical AI.


2 The Beginnings of AI and its Relevance to Medicine

The term “artificial intelligence” was coined by John McCarthy in a proposal for a computer science conference that was held at Dartmouth College in 1956[5]. McCarthy was then at the Massachusetts Institute of Technology (MIT) but went on to spend most of his career on the computer science faculty at Stanford University. A web site maintained by the Association for the Advancement of Artificial Intelligence (AAAI) nicely summarizes chronologically the subsequent history of the AI field[6]. In the years that followed that conference, several universities developed research programs in AI, often with support from the Defense Advanced Research Projects Agency (DARPA) in the US Department of Defense. Early centers of excellence emerged at MIT, Stanford, and Carnegie Mellon University, and there were important contributions underway at several other universities as well. The phrase “machine intelligence” was used in the UK, where there were also important research activities. Although much of the early work focused on generalized methods for simulating or reproducing human-like intelligence, by the late 1960s major efforts began to examine how AI methods could be used to tackle specific problem areas in the natural sciences. At Stanford, the DENDRAL Project of geneticist Joshua Lederberg and computer scientist Edward Feigenbaum, in collaboration with organic chemist Carl Djerassi and philosopher-of-science Bruce Buchanan, used production rules to encode the knowledge of expert chemists. They also applied clever algorithms that inferred likely chemical structures of organic compounds from mass-spectral data [4].

Lederberg subsequently led the effort to obtain funding from the National Institutes of Health (NIH) for a shared computer system at Stanford that was to be connected to the still-nascent ARPAnet (predecessor to today’s Internet) and would provide a resource for a community of individuals undertaking research in the application of AI in the biomedical and clinical sciences. The resulting SUMEX-AIM computer, and a second networked shared resource at Rutgers University, coordinated to offer services to AIM researchers, initially within the US and later internationally. Regular national AIM workshops facilitated collaborations and the formation of a community that came to know each other’s work well. Much of the AIM community’s work was summarized in a 1978 special issue of the journal Artificial Intelligence [5], in a 1984 book on major early research projects [6], and in a 1980 NIH publication describing federal investments in the area [7]. My own dissertation work on the MYCIN System [8] [9] would have been impossible without Lederberg’s leadership [10], Buchanan’s guidance, and the creation of the SUMEX-AIM resource on which I carried out my research.

Medical AI projects were responsible in part for the explosion of public interest in expert systems by the early 1980s. New computing machines were developed to support such work, largely built to optimize the LISP programming language. Major companies formed AI research groups and built academic collaborations to explore how the new technologies might stimulate innovation in their businesses. National and international news outlets ran front-page stories highlighting AI and expert systems, often predicting rapid changes that would revolutionize society, with health care often a central focus of the articles. As the decade progressed, however, it became clear that the aspirations and predictions were not being met: progress was slow, and projects were constrained by the complexity of what was being undertaken. This disillusionment led to articles about “the failure of AI and expert systems” (AI winter), tainting the reputation of the field even as slow and steady progress continued to be made[7].

By the time of the previously mentioned AIME meeting in Maastricht (1991), the global AIM research community was active and vibrant, but somewhat sidelined and reluctant to characterize what they were doing as AI or expert systems research. Introspection on an international AIM list server led to my summary of, and responses to, key questions being asked in the community:

  • Have we been so focused on decision-making performance that we have failed to address user needs? [Response: The best work had not failed to consider these issues.]

  • Can we identify clear contributions of AIM to the general discipline of AI, to psychology, to clinical medicine? [Response: Yes, for the first two, but impact on clinical medicine was still to be achieved.]

  • Should we view AIM as part of information systems, computer science, AI itself, engineering, or biomedicine? [Response: None of the above. It is a key component of biomedical informatics, which is itself a separate hybrid discipline.]

  • Are we training AIM scientists well? [Response: Only in a few places at that time.]

  • Is there inbreeding or a limitation of perspective? [Response: Should not be a problem as the field grows and prospers, which has subsequently been borne out.]

  • Why does there appear to be a lack of studies of AIM systems in routine use? [Response: The problems are difficult, systems can adversely affect workflow, and it is especially hard to introduce systems smoothly in the absence of standards that support integration.]

  • Given the potential importance of the field, why is it poorly funded by research agencies? [Response: Getting funding can always be challenging, and AI had come into some disfavor, but excellent work was still being supported.]

  • Why are our systems exceedingly difficult to move from one site to another? [Response: Many reasons, including lack of sensitivity to cultural and process differences across institutions, absence of standard terminologies or connectivity standards, and “not invented here” biases.]

These questions, and my more detailed answers to them, became the focus of my 1991 AIME presentation and of the subsequent published version to which readers may refer for a then-current (and generally upbeat) analysis [2].

The next 15 years saw remarkable changes both in technology and in the capabilities of AIM systems. Machine learning had become a significant focus of AIM research [11] [12], and the importance of integration with clinical systems (facilitated by the gradual development of pertinent standards) had become well recognized. The biological sciences had embraced AI notions, spurred on by the size of their datasets and the success of the Human Genome Project in the 1990s. Meanwhile, wireless networking, smart phones, and social media had all contributed to a societal acceptance of computing that had been less prevalent even a decade earlier. When we gathered for the previously mentioned AIME session in Amsterdam (2007), organizers asked the five panelists to address how the field had advanced since 1991. They also wanted to know the extent to which it was influencing biomedicine or clinical fields, and how well it was being supported by funding agencies, by academic or research entities, and by colleagues in computer science or biomedical fields. Those perspectives are summarized in what I have called Part 2 in this series of papers [3]. In my own comments I stressed that AIM could not be set off from the rest of biomedical informatics (as it sometimes had been), and that it was also highly pertinent to the world of health planning and policy. But a major point was my emphasis on the importance of training a cadre of individuals who are deeply skilled in computer science and AI, but also knowledgeable about, and comfortable in, the life science research laboratory and the world of clinical medicine and practice.


3 Evolution of AIM in the Last Decade

A half-century has passed since the earliest explorations of AI in the life sciences and medicine. It is accordingly not surprising that a great deal has changed since the earliest notions were formed regarding the field and its potential. For example, there was an early debate as to whether AI should attempt to understand and simulate human reasoning processes, using methods such as formal logic that were provably correct. Those who extolled this view were dubbed neats, whereas the scruffies were those who were more concerned with human-like performance, regardless of whether the underlying methods were elegant or were homogeneous in the way that they represented and demonstrated the notion of intelligence[8]. Although this distinction continues to exist among AI researchers, the scruffy approach now dominates, and much of the intelligent performance we observe in computers today has little to do with the way that human beings would tackle the same problem. AI has taken on a kind of magical quality, with most people impressed by how smart computers appear to have become but having little insight into how they achieve what they do.

Consider, for example, the long history of AI research in natural language processing and machine learning that led to the 2011 television performance of IBM’s Watson when it won a Jeopardy! competition against two human game champions [13]. People had vague notions that Watson knew a lot, and was very fast, but the long line of research that had led to such performance was not well appreciated. As early as 1959, Arthur Samuel (at IBM) began his exploration of machine learning that resulted in a computer that was able to play checkers, to learn how to win by playing thousands of games against itself after it knew the rules, and eventually to beat human competitors[9]. The perceptron predecessor to neural networks, viewed as a method for recognizing patterns by simulating brain neuronal interconnectivity, was first explored in the 1950s and later analyzed in an influential 1969 book by Minsky and Papert [14], long before neural networks with multiple “deep” hidden layers would have been computationally tractable. Research on natural language processing and text mining also has a long history[10].

We can make similar observations about our everyday acceptance of the speech-understanding capabilities of Siri or Alexa, both of which owe their performance to decades of AI investigation, such as the influential Hearsay effort at Carnegie Mellon University in the 1960s and 1970s [15]. And when I first arrived at Stanford University in 1970, I was excited when I visited Stanford Research Institute (now known as SRI International) to see the performance of an early wheeled intelligent robot, Shakey, as it solved problems regarding mobility and performed physical tasks[11]. Later, on my first visit to the Stanford AI Laboratory, nestled on a hill above the campus, I drove in and was intrigued to see a sign “Caution: Robot Vehicle”. Researchers had begun a line of research on driverless vehicles that has continued for decades[12] and has contributed to the current notion that our Uber may soon arrive without a driver in the front seat[13].

I mention these early AI research activities to stress that what we embrace today did not suddenly appear from a commercial company but was the result of decades of research, typically with slow progress and a visionary set of scientists and government funding agencies who advanced the field. People chatting with intelligent agents in their phones have little insight into what was required to create the technology that they now take for granted. And in the popular media, the phrase “AI” has transformed from being the name of a computer-science subfield into a noun used to describe a sentient but digital being – “there is an AI in my phone!”.

There is, of course, also a medical AI version of this story. It is caught up in a confusion of terms that have recently come into vogue. Researchers in AIM, and in biomedical informatics in general, have long struggled with large amounts of data that have pushed the boundaries of what can reasonably be tackled with existing technology. Similarly, they have always developed analytical methods for dealing with such data, combining statistical and computational techniques. Thus “big data” (where “big” is a relative term, reflecting evolution in computing capacity and technology) have always characterized our field. We have always embraced the need for “data analytics” and have developed corresponding methods. However, in recent years, notions of data science have emerged as though they are new and have not always been with us. Enthusiasm for new deep machine-learning methods and data science has often failed to acknowledge that those topics are part of a continuum that has been explored by computer scientists, biomedical informaticians, and AIM researchers for decades. In some circles, machine learning and artificial intelligence are seen as nearly synonymous, and it is unfortunate that both academic and government institutions are increasingly creating organizational structures suggesting that biomedical data science and biomedical informatics are separate disciplines or activities.

As previously mentioned, AIM is naturally entwined with other areas of computer science and biomedical informatics. Although there is no chapter on AIM in our textbook of biomedical informatics [16], the field arises throughout the volume in almost every chapter, e.g., those dealing with decision science, cognitive science, human-computer interaction, bioinformatics, natural language processing, decision-support systems, imaging, and the like. As editor of the Journal of Biomedical Informatics (JBI), I also have found that a very high percentage of current submissions deal with AIM topics, and especially with NLP and machine learning applied to health care problems. Many are novel applications of existing techniques[14], but some papers introduce new methods motivated by medical problems, and many of these generalize broadly for diverse applications, including outside of biomedicine and health. Similar phenomena are evident when I peruse other informatics journals, leading to my assertion that AIM-related research is a core element in current informatics investigative activities.

Extensive AIM research activity is now, for the first time, matched by enthusiasm for the field[15] and by AI investment by health care organizations and in industry, including new startup companies. Top medical journals have published editorials or articles extolling the promise of AI [17] [18], and the press is filled with articles on the topic[16]. Expectations are high, and the public seems more accepting of the notion that AI will play a role in the care that they receive.

There are cautionary notes as well. For example, some fear that AI could worsen health disparities [19], and some argue that AI is simply being overhyped and has a long way to go [20]. Some of us fear another AI winter could set in, but on balance the field is proving itself with formal studies as well as anecdotal stories.


4 Taking a Deeper Look

Today the clinical literature is replete with formal studies demonstrating the quality and value of AI methods, especially in the interpretation of large datasets, typically using deep learning approaches. Perhaps the most visible successes are those dealing with image interpretation – radiographs [21], ultrasound scans [22], photographs [23], skin lesions [24], microbiology [25], and the like. The results of some studies have been so impressive that there is even debate in the radiological community about whether radiologists will be replaced by AI systems[17]. I find this unlikely, but the nature of radiologic work and expertise will evolve.

The use of deep-learning techniques for image interpretation, and their gradual insertion into routine use, are similar to what the medical world has already accepted in the area of electrocardiographic (ECG) interpretation. In much of the world, ECG reports are now generated by computers and then checked and approved by a cardiologist. The ordering physicians can still review the ECG themselves and decide whether they agree with the interpretation, but there is broad recognition that it is more efficient, and generally just as accurate (or more so, given issues of human fatigue and distractibility), to have a computer perform the initial analysis and to generate the report. The arguments would seem to be even stronger in the case of radiologic interpretation, where there is growing evidence that, on average, the AI interpretation will be more accurate. Of course, as in the ECG example, physician review and approval are likely to be required for the foreseeable future.

One issue that arises frequently in the setting of deep learning systems is their lack of transparency: the scoring algorithms cannot easily be translated into explanations regarding the basis for a program’s decision. Government research agencies realize that this is an ongoing issue[18], and, given the nature of medicine and the independence of clinicians, explanation has long been viewed as an important element in an AIM system. For programs that use machine learning methods to perform signal or image analysis, however, such explanations may be less pertinent. The ECG example shows that clinicians will accept such systems if they can independently validate the decision offered by a program.

A colleague and I have recently argued that the situation may be different when an AIM system is offering diagnostic or therapeutic advice through a direct interaction with the clinician [26]. We have known for decades that it is especially challenging to motivate clinicians to use a decision-support tool when they must interact with the program themselves (and not simply receive a computer-generated report). Although some colleagues have argued that explanation is not a crucial requirement, now that the use of computers is more clinically accepted, my own experience would suggest otherwise. I have found that clinicians use explanation capabilities less frequently as they become comfortable with an advisory tool, but they want to know that the capability exists in case they have questions. This observation led me to argue in the recent article [26] that “black boxes are unacceptable” when a decision-support tool is designed for direct use by a clinician.

The need for explanation was part of the reason why earlier AIM systems emphasized the “knowledge is power” concept [27], which for decades became a kind of mantra for the expert systems research community. I believe that the true power of our deep-learning applications will be realized when we have better approaches for merging the analytical power of machine learning with an explicit encoding of the relevant domain semantics. It is one of the reasons that my recent article [26] stressed that “relevance and insight are essential”, explaining that a decision support system “should reflect an understanding of the pertinent domain and the kinds of questions with which clinicians are likely to want assistance.”

I also believe that we must counter all suggestions that AIM systems are being designed to make the therapeutic or management decisions. We must stress that it is the synergy between user and machine, and not the machine alone, that properly leverages the unique skills of both participants. It is through this kind of partnership that the care of patients will improve and flourish as AIM systems become routinely integrated into care settings.


5 Moving Forward

With new companies forming, new products being released, and established firms adding major AIM elements to their business strategy, one could ask how the AIM research community might best interface with these new kinds of activities. In the days when there was almost no commercial interest in medical AI, the work was largely based in academia, supported as research, and implemented and evaluated in affiliated clinical settings. Today there is an enhanced potential for partnerships between those in the AIM research community and those involved with the commercialization of the technology.

On the one hand, there should be opportunities for joint research projects, largely academically based, in which industry and academic AIM groups combine their unique skills and resources to develop new methods and systems. There is an existing model for this kind of collaboration in the computer-science community, where firms financially sponsor joint research, students, and even professorships. It is time to forge more such relationships between members of the biomedical informatics academic community and their counterparts in industry.

But there is another way in which the AIM community, and others in academic biomedical informatics, can work proactively and collaboratively with industry. We can take an example from the pharmaceutical industry, which long ago realized that the world of clinical medicine demands evidence that is rigorously developed and published in peer-reviewed journals. If such research were carried out by the company itself, there would be inevitable concerns about conflicts of interest that might taint the study design, its implementation, or the subsequent analysis. As a result, the pharmaceutical firms often collaborate with (and provide funding for) academic medical groups and groups of practitioners who take the lead in the implementation, data gathering, analysis, and publication of the results.

By analogy, the budding AIM industry needs to take seriously the importance of rigorous evaluation of their products and thus the importance of collaborations with independent groups, typically in academia, to carry out the relevant trials. In the case of clinical systems, this will mean industry helping to fund studies that establish a product’s safety, validity, reproducibility, usability, and reliability [26]. There is accordingly great synergy that matches the needs and interests of the AIM industry as it evolves and those of the academic AIM and biomedical informatics communities.

I close, as I often do, with an appeal for more training of talented individuals to work in both the academic and industry sides of this AIM research relationship. Due to a paucity of AIM graduates, too many companies have had to settle for hiring individuals with computer-science talent but limited understanding of the culture of medicine or the details of practice settings. The insertion of even one well-trained biomedical informatics scientist into such settings can radically influence the way a product develops, evaluations are performed, or corporate strategies are defined. We could debate the meaning of “well-trained”, but at a minimum I believe it implies solid technical skills in computer science and AI, coupled with a deep understanding of medicine and clinical care.

The future of AIM is bright, building on the remarkable transformation in technology, computing, medicine, and biology over the past half-century. Terminology may change, and experiments will continue to falter along the way, but consideration of the steady progress of five decades is inspiring, even if the rate of progress has been slower than we might have hoped or predicted. We should learn from that experience and avoid predictions today that are ill-considered or that fail to appreciate the complexity of the task that we have undertaken and that will continue to challenge us – often in ways that will surprise us.


1 (Accessed Feb 10 2019)

2 (Accessed Feb 10 2019)

3 (Accessed Feb 10 2019)

4 (Accessed Feb 12 2019)

5 (Accessed Feb 10 2019)

6 (Accessed Feb 10 2019)

7 (Accessed Feb 12 2019)

8 (Accessed Feb 12 2019)

9 (Accessed Feb 12 2019)

10 (Accessed Feb 12 2019)

11!&innovation=shakey-the-robot (Accessed Feb 12 2019)

12 (Accessed Feb 12 2019)

13 (Accessed Feb 12 2019)

14 Such papers do not match JBI’s editorial policies and we refer them elsewhere for possible publication.

15 (Accessed Feb 13 2019)

16 See, for example, the footnotes to the first paragraph in this paper.

17 “If you think AI will never replace radiologists—you may want to think again.” (Accessed Feb 13 2019)

“Why AI will not replace radiologists.” (Accessed Feb 13 2019)

18 (Accessed Feb 13 2019)
