Keywords
Artificial Intelligence in medicine - medical decision-making - clinical knowledge representation - expert systems - knowledge engineering - scientific inquiry - cognitive and brain science
1 Artificial Intelligence (AI), Biomedicine, and Healthcare: an Abbreviated Historical Overview
The history of early biomedical computing, including the first AI approaches, can be seen as a series of attempts to investigate, understand, and build computational models of the scientific knowledge and problem-solving heuristics used by biomedical scientists, while also developing and testing computational systems for clinical data processing and interpretation, and modeling clinical reasoning in ways that were to go beyond the logical, statistical, and pattern recognition models for medical decision-making which had become popular starting in the 1950's [1-22]. The outcome of a first phase of AI in medicine research came to fruition by the mid-1970's, when the SUMEX-AIM time-sharing resource at Stanford University [23], coupled with a series of AI in medicine workshops initiated at Rutgers University [24], capitalized on research directions in the USA that converged over the next decade on a knowledge engineering paradigm [25] for designing expert systems [26, 27]. This led to the widespread, worldwide development and adoption of heuristic problem-solving methods and rule-based systems in a wide range of fields beyond biomedicine, including the Japanese Fifth Generation Project [28]. Unfortunately, the excessive commercially-driven optimism that accompanied the premature generalization of knowledge-based systems, and the dramatic underestimation of the cost of developing, maintaining, keeping up-to-date, and ensuring reliable performance of expert knowledge bases, contributed to a second "AI Winter" by the mid-to-late 1980's [29]. The first AI Winter had followed the excessive enthusiasm for the initial generation of connectionist Artificial Neural Nets (ANNs) or Perceptrons, whose theoretical limitations were exposed by Minsky and Papert in their 1969 book of that title [30], and the underwhelming fulfillment of various promises of AI, including early automatic language translation systems, as critiqued in the UK Lighthill Report [31].
As the second AI Winter loomed in the 1980's, AI in medicine re-examined many of the statistical as well as heuristic models for machine learning, pattern recognition, and discovery, also emphasizing models of explanation and description as ways of teaching about the assumptions behind the first-generation knowledge- and rule-based systems [32, 33]. Statistical and heuristic classification and prediction approaches in turn contributed to data mining and knowledge discovery developments in AI starting in the 1990's [34]. A scholarly synthesis of AI around the design of "intelligent agents" was epitomized by the still-largely-current encyclopedic book of Russell and Norvig [35], which combines classic search and game-oriented AI with logical reasoning representations and inference methods, as well as critical discussions of the multitude of empirical heuristic problem-solving approaches that incorporate lessons from knowledge engineering for problems ranging from computer vision to speech recognition and textual analysis. Over the past two decades, a new "AI Boom" has developed, first with kernel-based machine learning methods such as Support Vector Machines (SVMs), and shortly afterwards centering on Deep Learning through a new generation of "deeper" multi-layered connectionist ANNs [36].
Models of the underlying knowledge for both application domains like medicine and computational process representations have led to the development of many medical computational ontologies, such as the Foundational Model of Anatomy [37], built with general ontology-building frameworks such as Protégé [38]. The reconciliation of user-centered knowledge engineering requirements with formal theories such as description logics for medicine, as in GALEN [39], raised many practical issues for their wider deployment and use in connecting with electronic health records and other clinical documentation [40]. The development of ontologies relied on long-term research and development in biomedical information retrieval, while indexing of the literature and the coding of documentation were early requirements for library automation. The pioneering work starting in the early 1960's at the National Library of Medicine (NLM) in the USA was essential in developing MEDLARS (Medical Literature Analysis and Retrieval System) [41], its online successor MEDLINE, and its web-based search engine PubMed [42], which accesses the world's largest repository of biomedical literature, PubMed Central. The NLM's support for developing a Unified Medical Language System (UMLS) to capture and computationally represent medical terminologies and vocabularies [43] was a major contributor to the success of these efforts starting in the 1980's. While not usually considered AI, the work of the NLM nevertheless provided the critical computational building blocks for augmenting intelligent discovery in biomedicine, and has been instrumental in accelerating biomedical research since that time. Meanwhile, on the AI side of scientific theory formation, recent proposals for largely Bayesian approaches to formalizing causal reasoning into a new causal science form the basis of a book which points out that current machine learning methods stand barely at the lowest rung of a ladder for discovering causality in nature, highlighting the need to ascend much higher by making active experimentation an integral part of the learning process, as it is in humans [44]. This would help generalize earlier AI efforts in theory formation [2, 45]. In the past two decades, the Human Genome Project has produced an abundance of scientific data that helps elucidate inheritance patterns of disease; it has in turn given rise to even more abundant multi-omic data sets, raising very considerable challenges about how to incorporate them into clinical practice as translational medicine begins to impact healthcare significantly [46-48].
2 First Generation of AI in Biomedicine
AI in Medicine (AIM) arose in the 1970's from new approaches for representing expert knowledge with computers, initially developed in the 1960's by biomedical researchers Joshua Lederberg and Carl Djerassi, and AI researchers Edward Feigenbaum and Bruce Buchanan, at Stanford University in the Heuristic Dendral Project [1, 2]. The Dendral team's work on the elucidation of molecular structures from mass spectra was originally motivated by Lederberg's interest in alien substances and species identification from the early space explorations of the time, and was directed towards scientific discovery and theory formation rather than clinical decision-making [49]. Earlier, starting in the 1950's and through the 1960's, however, there had been a parallel trend of studies in biomedical research inspired by Wiener's cybernetics [50] and McCulloch and Pitts' modeling of neural nets [51], leading to European initiatives and conferences on Cybernetic Medicine [52]. These studies, however, did not go very far, due to the largely theoretical and speculative nature of the models proposed for complex problems of feedback control in biology and for learning in humans, which turned out to be both technologically and scientifically premature. Instead, clinical documentation and medical systems developed in both Europe and the USA proved to be the first computer-based experimental software systems that showed promise for routine clinical application in recording and analyzing clinical data, as demonstrated at the first international conference in Elsinore, Denmark, in 1966 [53]. At that point, AI researchers were concentrating on issues of search and general means-ends problem solving, as in Newell's GPS [54], demonstrating how to successfully tackle game playing, as in Samuel's checkers program [55], while developing novel languages for problem-solving and list-processing such as IPL (Information Processing Language) and LISP. Such high-level logic approaches were not seen as usefully applicable to the more complex, highly ambiguous, and open-ended problems with imprecisely-defined categorizations for goals of decision-making under considerable risk and uncertainty, such as those arising in medical diagnosis and treatment. Instead, as mentioned above, statistical approaches were the norm for medical data analysis and decision modeling. After the Ledley and Lusted paper appeared in Science in 1959 [4], the Bayesian paradigm provided the main modeling approach to clinical reasoning. Nevertheless, the clinical work of the time illustrated the promise of practical systems for clinical data gathering and analysis [13], and for decision support, which a number of books shortly afterwards discussed and summarized [14, 19, 21]. All these, like the Elsinore presentations and papers, emphasized a combination of practical computer-based systems for information processing, formal probabilistic models for medical reasoning, or mixes of the two. Software for supporting scientific biomedical investigations, meanwhile, tended to consist of extensions and scaled-up versions of either statistical methods of analysis for population data sets or simulation models of biological mechanisms, often with medical applications for aiding in the interpretation of clinical data.
How AI came to be used for modeling medical problem solving originated from the notion that expertise and knowledge from specialists ought to be studied so as to model theory formation and problem solving with computational schemes. The clearest AI origins come from the work of Simon and Newell, whose combined backgrounds in economics, management, physics, and cognitive psychology led them to share a curiosity about how human problem-solving behavior could be both modeled and aided by computers. Simon coined the phrase "Sciences of the Artificial" to summarize and describe the emerging field of AI in his famous Compton lectures at MIT in the spring of 1968, which were subsequently collected and published [56]. Newell and Simon's collaborative contributions received the Turing Award in 1975, with their joint prize lecture representing a crisp distillation of their philosophy for AI [57]. In the 1960's, Feigenbaum studied with Simon, and edited and contributed to a pioneering book on Computers and Thought [58]. When Feigenbaum moved from Carnegie Tech to Stanford, it is not surprising that he found fertile cross-pollination of his ideas about introducing explicit representations of heuristic expert knowledge with those of the Nobel Prize winner Joshua Lederberg, who was also interested in biological theory formation and scientific discovery. Feigenbaum also happened to be a friend of Saul Amarel, who was then directing the AI Lab at RCA's Sarnoff Center in Princeton, and together they discussed and explored issues revolving around formalizations of human problem solving [59]. This intellectual rapport and friendship between Feigenbaum and Amarel proved to be a catalyst for discussions that came to the attention of, and stimulated the interest of, Bill Raub at the US National Institutes of Health's Division of Research Resources, who was seeking new directions for biomedical research support with computational methods, including AI. A pilot Research Resource on Computers in Biomedicine was funded at Rutgers University under Amarel's direction in 1971, and served to support research on problem-solving approaches in the life sciences and psychology, as well as pattern recognition models of clinical decisions [60]. Shortly afterwards, in 1973, an inter-university resource using a time-shared computer system based at Stanford, called SUMEX-AIM (Stanford University Medical Experimental – AI in Medicine), was funded, supporting the computational infrastructure that brought together researchers primarily from Stanford, Rutgers, Pittsburgh, and Tufts-Harvard-MIT on the clinical side, as well as others pursuing a range of biomedically-related research at other institutions [61, 62]. This led to a vibrant exchange of ideas about novel AI approaches to biomedical problem solving and clinical decision-making, which were debated in a series of AI in Medicine Workshops sponsored by the NIH, starting at Rutgers in 1975 [24]. The productive sharing and cross-fertilization of ideas between researchers in clinical medicine and AI were subsequently summarized in the book edited by Szolovits [63].
3 Clinical AI: Medical Consultation as the First Goal
The clinical decision-making orientation of AI work had been earlier foreseen and advanced by Dr. William Schwartz from Tufts when he wrote a visionary paper in the New England Journal of Medicine in 1970 entitled "Medicine and the Computer: The Promise and Problems of Change" [64]. In this paper, he said: "Computing science will probably exert its major effects by augmenting and, in some cases, largely replacing the intellectual functions of the physician. As the "intellectual" use of the computer influences in a fundamental fashion the problems of both physician manpower and quality of medical care, it will also inevitably exact important social costs — psychologic, organizational, legal, economic, and technical. Only through consideration of such potential costs will it be possible to introduce the new technology in an effective and acceptable manner. To accomplish this goal will require new interactions among medicine, the information sciences and the management sciences, and the development of new skills and attitudes on the part of policy-makers in the health-care system." Schwartz in this way anticipated many of the difficult social and professional issues that confronted the introduction of computers into medical practice, most especially for clinical decision-making. He was familiar with the work of his neighbors at Harvard and MIT – the collaboration between Octo Barnett and Tony Gorry, who were investigating the computational modeling of sequential medical decisions with decision-analytic utility theory [20]. At around this time, Bob Greenes was at Harvard pursuing a post-MD PhD in Barnett's Laboratory of Computer Science at Massachusetts General Hospital, where he serendipitously connected with the young physician Ted Shortliffe and supervised his Honors Thesis at Harvard on computer-based patient-physician interactions [65]. When Shortliffe moved to Stanford for his PhD studies, he met and worked with Bruce Buchanan, whose research on computational logic and modeling had been central to Dendral's rule-based representation of mass-spectrometry data and its interpretation. Together they sought a generalization of the expert rule-based approach of Dendral to clinical problems, in collaboration with Stanley Cohen, who was working on avoiding deleterious drug interactions; this dovetailed well with Shortliffe's medical background and expertise [66], and related to the NIH's interest in the medical impact of its funded research. These collaborations led to the development of the rule-based system MYCIN [67, 68] for advising on antimicrobial therapies for infectious diseases. It developed and used a highly original certainty-factor representation to measure clinical uncertainty [69]. While it was shown later that certainty factors could formally map into probability models, their psychological impact on the acceptance of MYCIN's consultations for infectious diseases was significant. MYCIN was the most influential expert system of its era, demonstrating the power of modularized rules for representing decision-making; its approach was later generalized into EMYCIN, a framework for developing rule-based systems.
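To give the flavor of this style of reasoning, the sketch below is a minimal, purely illustrative Python rule interpreter that accumulates evidence for a conclusion using the classic certainty-factor combination formula for positive evidence. It is not MYCIN's actual implementation (which was written in Lisp, with backward chaining and a substantial clinical knowledge base), and the rules, findings, and function names shown here are hypothetical.

# Minimal, illustrative sketch of certainty-factor (CF) style rule combination,
# loosely in the spirit of MYCIN/EMYCIN. The rules and findings below are
# hypothetical examples, not clinical knowledge.

def combine_cf(cf_old, cf_new):
    """Combine two positive certainty factors (0..1) for the same conclusion."""
    return cf_old + cf_new * (1.0 - cf_old)

# Each rule: (set of required findings, conclusion, CF attached to the rule)
RULES = [
    ({"gram_negative", "rod_shaped"}, "organism_is_enterobacteriaceae", 0.7),
    ({"hospital_acquired", "rod_shaped"}, "organism_is_enterobacteriaceae", 0.4),
]

def evaluate(findings):
    """Return combined CFs for every conclusion supported by the findings."""
    conclusions = {}
    for premises, conclusion, rule_cf in RULES:
        if premises <= findings:                    # all premises are present
            prior = conclusions.get(conclusion, 0.0)
            conclusions[conclusion] = combine_cf(prior, rule_cf)
    return conclusions

if __name__ == "__main__":
    case = {"gram_negative", "rod_shaped", "hospital_acquired"}
    print(evaluate(case))   # {'organism_is_enterobacteriaceae': 0.82}

The design point this toy example illustrates is modularity: each rule contributes its evidence independently, and the combination function accumulates support for a conclusion without any single rule needing to know about the others.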
At Rutgers, we were fortunate in enlisting the collaboration of Aran Safir of the Mount Sinai School of Medicine, an ophthalmologist and inventor of medical instruments. I had been working with Safir on analyzing the precision and accuracy of data from his Ophthalmetron – a pioneering digital tomographic refractometer – which was being tested on students in New York City [70]. Following my own dissertation on pattern recognition subspace methods for the diagnosis of thyroid dysfunction [22], I had joined Rutgers as a young assistant professor, and my first doctoral student, Sholom Weiss, worked with me to explore ways in which prior knowledge from the physician could be used to improve and explain results from computer decision models [71, 72]. In seeking to overcome the difficulties of explaining probabilistic reasoning, we sought out ways of understanding clinical decision processes and hit upon the notion of representing causal explanations of disease mechanisms that could computationally generate both the natural course and the treated course of diseases. Safir suggested that we try it out on the glaucomas – the group of eye diseases which lead to blindness as a result of excessive intra-ocular pressure restricting blood flow to the retina. After presenting a prototype at the Association for Research in Vision and Ophthalmology (ARVO) meeting in Sarasota in 1973, we were able to interest leading specialists in glaucoma, including Dr. Bernard Becker of Washington University in St. Louis and Dr. Irving Pollack at Johns Hopkins, in providing their expertise for the development of what became known as the CASNET (for CAusal ASsociational NETwork) consultation program for glaucoma [73, 74]. The program showed how causal explanations of disease could be combined with empirical knowledge of presumptive diagnoses, prognoses, and treatments to provide advice on glaucoma patient management. CASNET was tested successfully before a large audience at the Academy of Ophthalmology in Las Vegas in 1976 [75].
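To make the notion of causal-mechanism modeling slightly more concrete, the following minimal Python sketch propagates confidence forward along weighted causal links between pathophysiological states. It is only an illustration of the general idea, under assumed states, link strengths, a hypothetical propagate function, and an arbitrary threshold; it does not reproduce CASNET's actual glaucoma knowledge base or its inference scheme.

# Toy sketch of causal-associational style reasoning: confidence in an observed
# state is propagated to its downstream effects along weighted causal links.
# All states, strengths, and thresholds here are hypothetical examples.

# Directed causal links: cause -> list of (effect, causal strength in 0..1)
CAUSAL_LINKS = {
    "elevated_intraocular_pressure": [("optic_nerve_damage", 0.8)],
    "optic_nerve_damage": [("visual_field_loss", 0.9)],
}

def propagate(observed_confidences, threshold=0.5):
    """Propagate confidence forward along causal links from observed states."""
    confidences = dict(observed_confidences)
    frontier = list(observed_confidences)
    while frontier:
        state = frontier.pop()
        for effect, strength in CAUSAL_LINKS.get(state, []):
            inferred = confidences[state] * strength
            if inferred > confidences.get(effect, 0.0):
                confidences[effect] = inferred
                if inferred >= threshold:   # strong enough to propagate further
                    frontier.append(effect)
    return confidences

if __name__ == "__main__":
    # Suppose a measurement confirms elevated pressure with confidence 0.9
    print(propagate({"elevated_intraocular_pressure": 0.9}))
    # -> elevated_intraocular_pressure: 0.9, optic_nerve_damage: 0.72,
    #    visual_field_loss: 0.648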
At the University of Pittsburgh, another line of research in modeling the knowledge supporting clinical decision-making and the use of inference was underway through the collaboration between a leading internal medicine specialist, Dr. Jack Myers, and AI researcher Harry Pople. Pople had proposed an abductive model for clinical reasoning [76], and with Myers and Dr. Randall Miller, they developed a taxonomic and causal model of diseases [77] based primarily on Dr. Myers' knowledge and expertise in internal medicine. The model and a prototype program, tested initially on grand rounds clinical case descriptions from the New England Journal of Medicine (NEJM), were first called DIALOG, then INTERNIST, and later CADUCEUS. While MYCIN and CASNET covered related sets of diseases, INTERNIST covered all of internal medicine, and required many years of development to capture the wide scope of heterogeneous knowledge, heuristic measures of confidence, and importance of clinical findings as they related to a large number of potential diagnoses. It eventually evolved into the microcomputer-based internal medicine reference system QMR [78].
In New England, Dr. Steven Pauker from Tufts joined with Peter Szolovits of MIT (after Tony Gorry left for Rice) to develop the Present Illness Program, which explored how patient findings in a presenting illness are typically obtained by clinicians in a sequential question-answering process. They were inspired by both cognitive and decision-analytic models to develop an interactive consultation program using elements of categorical and probabilistic reasoning [79], which subsequently led to further studies of causality used in the modeling of disease processes [80].
Despite the development of very successful prototype systems, the AI in Medicine focus on investigating and modeling medical consultation showed that, however intellectually interesting these systems were, most physicians were not ready to use them in clinical practice. Contributing factors included the time pressures that left most clinicians with little opportunity to engage with a computer, and the great effort needed to keep the knowledge bases updated and current with the relevant science and clinical practices. As a result, most expert systems were largely used as explanatory tools for medical education [81].
4 Conclusion: History, Science, Technology, and Clinical Practice Related to AI and Biomedical and Health Informatics
From a historical perspective, it is somewhat premature to call what we are now writing "history", since many of us who contributed to the beginnings of AI in medicine are still active, and can at best offer personal reflections on the development of the field, as I do with this paper, rather than a more detached story and long-term assessment of how ideas, systems, and their impact have changed over the years. So, while uncovering longer-term patterns of human-technology symbiotic interactions in computing and related technological developments over the past 50 years may be unrealistic, these technologies have so dramatically revolutionized scientific discovery and medical practice that it is not unreasonable to suggest that a major paradigm shift à la Kuhn has occurred [82].
A major informatics factor driving biomedical advances is the world-wide dissemination and availability of biomedical information from the literature, largely collected, digitized, and made retrievable through the NLM's PubMed [42]. Biomedical AI has benefited considerably from these developments, and the now-universal availability of large corpora of biomedical texts and journals has made it imperative to develop better natural language processing (NLP) algorithms for automatically classifying and interpreting the vast amount of data and knowledge that is contained online and on the web. This, however, raises deep issues that have been central to AI but which have proven difficult to deal with. Understanding complete texts, as opposed to just "text mining" to find and retrieve articles by words or textual fragments (such as keywords or named topics), has made progress but remains an open problem for science, and for AI, because our perceptual, cognitive, linguistic, and mental models of what constitutes human understanding are still very inadequate. Cognitive science approaches have been investigated since the early 2000's [83, 84], but scientific insights into shared conscious understanding of our very faculties of understanding still await breakthroughs beyond current research in, for instance, the cognitive neuroscience of memory [85], since understanding also needs to be considered in the context of how languages, consciousness, and cultures are related [86]. Brain science is advancing, but entirely new paradigms are still needed to model and better understand, for instance, the functioning of the glial networks that are so significant in interacting with, and modulating, neural networks [87]. From a linguistic perspective, the role of perceived images and mental constructs of the sensed world, and their relation to beliefs in science, metaphorical expression, their mathematical modeling assumptions, and descriptions through shared logic and language, present very deep challenges [88, 89], and raise critical issues about the role of descriptions, visualizations, and narratives in reconstructing our memories and mental models of the world [90], as well as the shared foundations between artistic creativity and brain science [91].
Early AI models for clinical reasoning using rule-based, causal, hierarchical, and associational representations of clinical knowledge were so innovative that they inspired a whole school of heuristic knowledge-based AI and many other AI applications. Subsequent attempts to connect these early knowledge engineering approaches and relate them to approaches from exploratory statistical data analysis and inference, information retrieval, machine learning, and computer vision are ongoing, and have been transformed radically since the advent of the World Wide Web. The up-scaling of computer data acquisition related to multiple human senses (especially vision, sound, and touch) makes their interpretation by humans using machines and the interconnected web of the Internet of Things a central challenge at the interface between intelligent humans and the intelligent agents or artifacts invented through our artifice [92]. In biomedicine, novel instrumentation with automated data acquisition, from nanoscale to population-scale observations, increasingly leverages current methods of machine learning, computer vision, and other modalities to help transform scientific biomedical inquiry [93]. However, for translating biomedical insights into clinical practice, essential for the precision medicine of the future, serious challenges of personalization arise, related not only to the scientific complexities of genotype-phenotype mappings, but also, equally or more importantly, to the very different responsible professional roles of the "intelligent agents" involved in the treatment of patients. Deep underlying issues arise involving how AI can contribute responsibly and ethically to the personalization of healthcare, which presents very different human clinical problems when dealing with individual patient care, as work on narrative medicine illustrates [94, 95], compared to recommending directions for computational guidance of scientific inquiry and discovery at the center of biomedical research, or to adopting business strategies for healthcare enterprises primarily influenced by economic goals.
We all tell each other stories to describe and complain about our ailments, and it is not unreasonable to conjecture that this has been happening since well before the time of recorded history. The foundations of western medicine come to us from Ancient Greece, from the traditions of Asclepius and the works of Hippocrates, recommending "natural" treatments of physical exercise and nutrition to maintain the balance between the body and the environment in a preventive way [96]. The well-known admonition to treat all physical ailments and trauma in the most informed way, balancing active treatment against the avoidance of possible harmful effects, can be found in Hippocrates' work "Of the Epidemics", where he narrates how many patients developed illnesses and provides information and rationales for his treatments [97]. The Hippocratic Oath that physicians take is usually associated with the Latin phrase "Primum, Non Nocere", or "First, do no harm", but it is still the subject of much argument and debate as to whether this is really what Hippocrates meant, since another translation is cited as: "The physician must be able to tell the antecedents, know the present, and foretell the future — must mediate these things, and have two special objects in view with regard to disease, namely, to do good or to do no harm." [98]. A more detailed and nuanced discussion of the issues involved in requiring an acknowledgement of personal responsibility by a physician caring for an individual patient can be found in [99]. The Hippocratic Oath, which for over 2000 years has served as a long-standing criterion relating the practice of medicine to the ideals and principles for treating the suffering patient, now finds these principled criteria challenged by the uncertainties in the new roles of physicians and nurses within group practices, clinics, and hospitals, where shared and delegated responsibilities are frequently not clearly defined. EHR- and evidence-based medicine can additionally contribute to a diffusion of responsibilities from individuals to "systems", which can have extremely damaging effects on patients as a result of disruptions to clinical practice in the rapidly changing world of IT-influenced, transaction-oriented, and bureaucratized healthcare [100]. Models for the introduction of technologies in health care have been proposed [101], and the possibility that recurrent cycles of information technology improvements might help reduce the potentially harmful effects of IT disruptions has been discussed in the informatics literature [102].
Most recently, Coiera et al. have addressed these types of problems related to the introduction of AI specifically, in an opinion piece published in the British Medical Journal Opinion Online [103], where they state that: "We will need new principles and regulations to govern medical artificial intelligence", supported by a most compelling set of examples, such as one referring to end-of-life decisions, pointing out that: "The notion of "doing no harm" is stretched further when an AI must choose between patient and societal benefit. We thus need to develop broad principles to govern the design, creation, and use of AI in healthcare. These principles should encompass the three domains of technology, IT users, and the way in which both interact in the (socio-technical) health system." Later in the article they make a crucial point about the ethically and practically problematic issues of current dependence on artificial neural network models for machine learning in data-driven medical systems: "explanation is challenging for AIs based on current-generation neural networks, because knowledge is no longer explicit, but rather is non-transparently encoded in the connections between "neurons"." We can conjecture, then, that a possibly useful direction for new AI research in biomedicine could entail investigating how to combine the explanatory power of methods deployed in some of the early causal-mechanism and rule-based AI expert systems with the new computational architectures that have strong inferencing power, as promised by recent neuromorphic asynchronous spiking neural network (SNN) chips [104]. The detailed "empirical epistemology", that is, the AI methods implemented to synthesize top-down, model-to-data reasoning with the new and more powerful data-to-model inference and reasoning, will present more than enough challenges to be reconciled or made compatible with the exercise of individual responsibility following ethical principles and constraints.
Ensuring that AI amplifies, rather than replaces or distorts, human ethical judgment is a central conundrum facing medical AI researchers and practitioners. Discovering how to balance the "calculating brain" of humans driven by selfish and economic imperatives with the "altruistic brain" of those clinicians who want to keep honoring their Hippocratic Oath involves a wide range of hard choices, and requires insights that should keep biomedical informatics researchers busy, awake, and hyper-conscious of their deepest obligations to help patients and practitioners live up not only to the latter's Oath, but also to what the founder of cybernetics, Norbert Wiener, so presciently identified as the major challenge of complex human-machine systems in his book entitled "The Human Use of Human Beings" [105]. Whether a good, ethical human can work with an AI and remain ethical is a major open problem for all of us, one that will have to be confronted not only scientifically, but also in a socially acceptable and humanistic way in clinical informatics. "Cui Bono" suddenly takes on even more serious meanings than its usual ones, since AIs cannot be ascribed responsibility, and their likely embedding in complex human collaborative clinical practice groupings and IoTs will give rise to entirely novel evolutionary problems for people, especially for those who become suffering patients. It is not clear that anyone to date has ready answers to these problems – but, if we are to live up to our responsibilities as ethical human technologists, scientists, and especially as responsible practitioners of healthcare, we must try!