CC BY 4.0 · ACI open 2021; 05(02): e67-e79
DOI: 10.1055/s-0041-1735470
Review Article

A Scoping Review of Artificial Intelligence Algorithms in Clinical Decision Support Systems for Internal Medicine Subspecialties

Ploypun Narindrarangkura (1), Min Soon Kim (2), Suzanne A. Boren (2)
1  Institute for Data Science and Informatics, University of Missouri, Columbia, United States
2  Department of Health Management and Informatics, University of Missouri Institute for Data Science and Informatics, University of Missouri, Columbia, United States
Funding None.
 

Abstract

Objectives Artificial intelligence (AI)-based clinical decision support systems (CDSS) have been developed to solve medical problems and enhance health care management. We aimed to review the literature to identify trends and applications of AI algorithms in CDSS for internal medicine subspecialties.

Methods A scoping review was conducted in PubMed, IEEE Xplore, and Scopus to identify articles on CDSS using AI algorithms, including deep learning, machine learning, and pattern recognition. This review synthesized the main purposes of CDSS, the types of AI algorithms, and the overall accuracy of the algorithms. We searched for original research published in English between 2009 and 2019.

Results Given the volume of articles meeting the inclusion criteria, the results of 218 of the 3,467 screened articles were analyzed and presented in this review. These 218 articles addressed AI-based CDSS in three internal medicine subspecialties: neurocritical care (n = 89), cardiovascular disease (n = 79), and medical oncology (n = 50). We found that the main purposes of CDSS were prediction (48.4%) and diagnosis (47.1%). The five most common algorithms were support vector machine (20.9%), neural network (14.6%), random forest (10.5%), deep learning (9.2%), and decision tree (8.8%). The accuracy of the algorithms ranged from 61.8 to 100% in neurocritical care, 61.6 to 100% in cardiovascular disease, and 54 to 100% in medical oncology. Only 20.1% of the articles used explainable AI, that is, AI whose results humans can understand.

Conclusion AI algorithms are increasingly applied in CDSS and are important for improving clinical practice. Supervised learning still accounts for the majority of AI applications in internal medicine. This study identified four potential gaps: the need for AI explainability, the lack of ubiquity of CDSS, the narrow scope of target users of CDSS, and the need for reporting standards for AI in health care.



Background and Significance

Clinical Decision Support Systems

According to the Office of the National Coordinator for Health Information Technology, “clinical decision support (CDS) provides clinicians, staff, patients, or other individuals with knowledge and person-specific information, intelligently filtered or presented at appropriate times, to enhance health and health care.”[1] CDS can be delivered through a variety of tools and systems for clinical decision-making, including alerts, reminders, clinical guidelines, recommendations, condition-specific order sets, data reports, documentation templates, diagnostic support, and databases.[2] CDS systems (CDSS) are computerized tools that help clinicians make clinical decisions and manage information.[3] Examples of CDSS include automated laboratory alerting systems that help users focus on key messages, such as abnormal laboratory values,[4] and pharmacy information systems that alert users to drug allergies or interactions.[5] More advanced CDSS deliver more accurate information to clinicians, for instance, personalized drug dosage calculators, case-based recommendations, and disease-specific suggestions for laboratory testing. Because of the rapid growth of electronic health records (EHR), CDSS have increasingly been integrated into EHR systems and existing workflows so that clinicians can efficiently receive and act on system-generated recommendations.[6] To manage large amounts of clinical data and effectively transform health care systems, artificial intelligence (AI) and machine learning (ML) have been applied to computerized CDSS.[7] [8] [9]



Artificial Intelligence

In 1955, John McCarthy defined AI as “the science and engineering of making intelligent machines.” AI has been designed to resolve complex challenges, with the hope that it will someday be as intelligent as humans.[10] AI was first introduced in health care in the 1970s at Stanford University, California, where researchers developed MYCIN, a rule-based system that advised physicians on antimicrobial therapy. MYCIN suggested possible pathogens and recommended antibiotic dosages based on body weight.[11] [12]

ML is a subset of AI that Arthur Samuel defined as “the field of study that gives computers the ability to learn without being explicitly programmed.”[13] There are four types of ML algorithms: supervised learning, unsupervised learning, semisupervised learning, and reinforcement learning. Supervised learning algorithms build a model from data containing both inputs and target outcomes. Conversely, unsupervised learning algorithms use data that contain only inputs to find the structure or patterns in the data. Semisupervised learning combines the two approaches, using both labeled and unlabeled data to improve the accuracy of the model.[14] Reinforcement learning does not require input/output pairs; instead, it focuses on the tradeoff between exploration and exploitation.[15] ML models learn from training data in order to detect or predict outcomes. ML supports clinical work in prognosis, diagnosis, treatment, and clinical workflow.[14] For example, ML has been widely used in studies predicting hospital readmission, with the aim of reducing payments associated with patients readmitted within 30 days of discharge. The most frequently used algorithms in these studies were decision tree (DT)-based methods and support vector machine (SVM).[16]
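To make the supervised setting concrete, the sketch below shows k-nearest neighbors (kNN), one of the supervised algorithms listed in [Table 1]. The model is built from inputs paired with target outcomes and predicts a new case by majority vote of its closest labeled examples. This is a minimal illustration, not a method from any reviewed study. The function name `knn_predict` and the toy readmission data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest labeled neighbors.

    `train` is a list of (features, label) pairs: supervised learning needs
    both the inputs and the target outcomes.
    """
    # Sort labeled examples by Euclidean distance to the query point
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote among the k closest labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (age in decades, prior admissions) -> readmitted within 30 days?
train = [
    ((5.0, 0.0), "no"), ((6.0, 1.0), "no"), ((5.5, 0.0), "no"),
    ((8.0, 3.0), "yes"), ((7.5, 4.0), "yes"), ((8.5, 2.0), "yes"),
]
print(knn_predict(train, (7.8, 3.0)))  # -> yes
```

An unsupervised algorithm would receive only the feature tuples, without the "yes"/"no" labels.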

Deep learning (DL) is a subset of ML, inspired by the neural networks (NN) of the human brain, that uses layered sets of algorithms to progressively extract higher-level features from the raw input. Starting from the raw input, each layer's representation is fed into the next layer and transformed into a new representation, which enables the learning of highly complex functions.[17] DL works very well at discovering complex structures in high-dimensional medical data. For example, DL has been used to identify malignancy from pictures of skin lesions,[18] detect pneumonia from chest radiographs,[19] [20] and diagnose diabetic retinopathy from retinal photographs.[21] These studies suggest that combining advanced computational methods with CDSS may reduce medical errors and improve care processes.[6] [22] [23] [24]
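The layer-by-layer transformation described above can be illustrated with a minimal forward pass in plain Python. The sketch has two layers and hypothetical, hand-picked weights. It is not trained and does not reflect any architecture from the reviewed studies. Real DL models learn their weights from data and stack many more layers.

```python
def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: each output unit is activation(w . x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def forward(raw_input, layers):
    """Feed the raw input through each layer in turn; every layer transforms
    the previous layer's representation into a new one."""
    representation = raw_input
    for weights, biases, activation in layers:
        representation = dense_layer(representation, weights, biases, activation)
    return representation

# Hypothetical, untrained weights for a 3 -> 2 -> 1 network
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], relu),  # hidden layer
    ([[1.0, -1.0]], [0.2], identity),                          # output layer
]
print(forward([1.0, 2.0, 3.0], layers))
```

Here the hidden layer turns the three raw inputs into a two-unit intermediate representation, and the output layer maps that representation to a single score.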



Explainable Artificial Intelligence

Explainable AI (XAI) was defined by Matt Turek of the Defense Advanced Research Projects Agency (DARPA) XAI program. According to Turek, “XAI proposes creating a suite of ML techniques that (1) produces more explainable models while maintaining a high level of learning performance (prediction accuracy) and (2) enables human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.”[25] Many ML algorithms cannot explain how and why a specific decision was made, which raises the question of how ML algorithms can be made explainable. The European Union General Data Protection Regulation, which took effect in 2018, addressed the explanation of automated decisions. This prompted a debate among AI researchers regarding the “right to explanation,”[26] that is, the right to be given an explanation for the output of an algorithm. Because the outputs of many AI algorithms, such as deep NNs, are not easily explainable, XAI has become more important as a way to provide explanations for them. The explainability of AI could help enhance medical professionals' trust in AI-based systems.[27] Thus, AI-based CDSS require not only good performance but also explainability that is trustworthy, transparent, and interpretable.[28]

To analyze the explainability of AI-based CDSS, four perspectives can be considered in a multidisciplinary approach: technological, legal, medical, and patient.[29] The technological perspective considers the explainability of the model in terms of the characteristics of the algorithm. From the legal perspective, three issues need to be considered for explainability: (1) informed consent, (2) certification and approval as medical devices by the Food and Drug Administration (FDA),[30] and (3) liability. Because the FDA regulates the use of unexplainable AI algorithms in CDSS for medical purposes, this oversight is likely to shape future trends in the use of XAI and AI. From the medical perspective, AI-based CDSS involve two levels of explainability: understanding the output of the system and identifying feature importance. Last, from the patient perspective, explainability can provide personalized recommendations based on the patient's characteristics and risk factors. XAI-based CDSS could enhance patient engagement and provide an accurate risk perception.[31] [32]
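One model-agnostic way to address the "identifying feature importance" level is permutation feature importance. Each feature's values are shuffled across cases, and the resulting drop in accuracy is measured. This technique is our illustrative choice and is not reported as the method of the reviewed studies or of the cited framework. The risk model, feature names, and data below are hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == y for row, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, seed=0):
    """Accuracy drop when one feature column is shuffled across rows.

    A large drop suggests the model relies on that feature; zero means
    shuffling it changed nothing.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by its shuffled value
        permuted = [row[:j] + (value,) + row[j + 1:] for row, value in zip(rows, column)]
        importances.append(baseline - accuracy(model, permuted, labels))
    return importances

# Hypothetical risk model that uses only feature 0 (systolic blood pressure)
def risk_model(row):
    systolic_bp, _age = row
    return "high" if systolic_bp > 140 else "low"

rows = [(120, 70), (150, 45), (135, 80), (160, 60), (110, 55), (145, 75), (125, 65), (170, 50)]
labels = [risk_model(row) for row in rows]  # labels the model reproduces exactly
print(permutation_importance(risk_model, rows, labels))
```

Age receives an importance of exactly 0 because the model ignores it. A clinician could read this kind of result as "blood pressure drives this prediction."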



AI in Internal Medicine

In medicine, AI is widely used to understand medical conditions, predict diagnoses, process extensive health data, and aid physicians in making clinical decisions.[33] Examples of current systems include IBM's Watson Health solutions[34] for clinical medicine and MeVis Medical Solutions[35] for oncological radiology. Internal medicine is a medical specialty dealing with the diagnosis, treatment, and prevention of adult diseases.[36] The internal medicine specialty has 20 subspecialties[37] and the largest number of active physicians in the United States.[38]

In the field of neurovascular disorders, Murray et al[39] reviewed the literature on diagnosis-focused AI for acute stroke from 2014 to 2019 using the search terms “artificial intelligence” or “machine learning or deep learning” and “ischemic stroke” or “large vessel occlusion.” They identified 20 studies and found that random forest (RF) learning was used for the Alberta Stroke Program Early Computerized Tomography (CT) Score, whereas convolutional NNs were used to detect large vessel occlusions. The authors also identified platforms, including Brainomix, General Electric, iSchemaView, and Viz.ai. They suggested that AI improves stroke detection but that performance assessment needs to be standardized.

In the field of cardiovascular diseases, Kilic[40] reviewed articles on AI, ML, and cardiovascular health care published up to 2019. The author divided ML algorithms into two major types: supervised and unsupervised learning algorithms. Supervised learning algorithms include naïve Bayes (NB), k-nearest neighbors, SVM, RF, extreme gradient boosting, and DT. Unsupervised learning algorithms include k-means clustering, hierarchical clustering, principal component analysis, and singular value decomposition. The author summarized the potential applications of ML in cardiovascular health care in three groups: (1) automated imaging interpretation, (2) natural language processing of EHR data, and (3) predictive analytics. The author noted the challenges of implementing ML in clinical practice, including unexplainable results, privacy and ethical issues, validation and long-term evaluation, and the need for large amounts of data.
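As a contrast with the supervised example, k-means clustering, which Kilic lists among the unsupervised algorithms, uses inputs only. The sketch alternates two steps: it assigns each point to its nearest centroid, then moves each centroid to the mean of its assigned points. It is a minimal illustration under stated assumptions. The initial centroids are picked by hand, whereas practical implementations usually use random or k-means++ initialization. The two-feature data are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: no labels are used, only the input points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: attach each point to its nearest centroid
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (a centroid with no assigned points stays where it was)
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
# Hand-picked starting centroids (an assumption made for this sketch)
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
```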

As an example of applied AI in oncology, Jin et al[41] conducted a systematic review of AI in gastric cancer using the search terms “artificial intelligence” and “gastric cancer” and included 68 studies. The study reported that AI was used for omic data analyses, identification of Helicobacter pylori infection and chronic atrophic gastritis, endoscopic diagnosis of gastric cancer, invasion depth prediction, digital pathology, bleeding detection, surgery (preoperative, intraoperative, and postoperative procedures), metastasis and staging prediction, and prognosis prediction. The authors grouped these applications into detection, treatment, and prognosis. They suggested that large randomized controlled trials (RCTs) are required to validate the AI models. However, large RCTs are difficult to conduct in the rapidly changing environment of an EHR because of costs, interoperability, data quality, and privacy and data security considerations.[42] [43] [44]

After reviewing several systematic reviews of AI in medicine, we concluded that AI applications in medicine can be grouped into prognosis/prediction, diagnosis/detection, treatment, and clinical workflow. Current ML implementations in clinical practice lack explainability. Last, standardized validation of the clinical performance of AI applications is needed.



Objectives

Many systematic reviews have examined AI in medicine. However, few have reported the frequency and explainability of the AI algorithms used in CDSS. We aimed to extract key information to identify potential gaps for further study.

In this study, we conducted a scoping review of the literature from the past decade to analyze the implementation of applied AI in CDSS for subspecialties of internal medicine. In this study, subspecialties are the additional areas of internal medicine in which physicians train further to “subspecialize.”[37] We aimed to answer three research questions (RQs):

(RQ1) What is the frequency of applications regarding purposes of CDSS among prediction, diagnosis, treatment optimization, and clinical workflow optimization?

(RQ2) What is the frequency of applications regarding AI algorithms used in CDSS?

(RQ3) What is the overall accuracy of those algorithms?



Methods

Inclusion and Exclusion Criteria

Articles were included if they met the following criteria: (1) they addressed CDSS using AI algorithms; (2) the AI algorithms studied included DL, ML, or automated pattern recognition; (3) they were related to the internal medicine specialty; (4) they were published between January 1, 2009 and December 31, 2019; (5) they were published in English; and (6) they reported original research.

We excluded articles using natural language or text processing that did not use AI algorithms. We also excluded articles proposing a new platform of CDSS without reporting results, technical reports of new algorithms without applications in medical research, and review papers.



Search Strategy

We searched three databases (PubMed, IEEE Xplore, and Scopus) using the following combination of search terms: “Clinical Decision Support Systems” AND (“Artificial Intelligence” OR “Deep Learning” OR “Machine Learning” OR “Automated Pattern Recognition”), with results limited to January 1, 2009 through December 31, 2019. We included automated pattern recognition in our search terms because pattern recognition is used interchangeably with ML.[45] We started the search period in 2009 because the Meaningful Use program, introduced in the United States that year, promoted the electronic exchange of health information via certified EHR technology.[46] [47]
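The Boolean structure of the search (one required concept ANDed with an OR-group of method terms) can be assembled programmatically. This is a minimal sketch. The text does not give each database's field tags or date-filter syntax, so the sketch builds only the database-neutral query string. The 2009 to 2019 limit would be applied with each database's own filter. The function name `build_query` is hypothetical.

```python
def build_query(core_term, method_terms):
    """Combine a required concept with an OR-group of method terms:
    "A" AND ("B" OR "C" ...). Quoting keeps multiword phrases intact."""
    or_group = " OR ".join(f'"{t}"' for t in method_terms)
    return f'"{core_term}" AND ({or_group})'

query = build_query(
    "Clinical Decision Support Systems",
    ["Artificial Intelligence", "Deep Learning", "Machine Learning",
     "Automated Pattern Recognition"],
)
print(query)
# -> "Clinical Decision Support Systems" AND ("Artificial Intelligence" OR "Deep Learning" OR "Machine Learning" OR "Automated Pattern Recognition")
```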



Study Selection

First, we screened the titles and abstracts and classified each paper as relevant, not relevant, or unclear. Second, we revisited the unclear papers by reading the full text and recategorized each as relevant or not relevant. Third, we read the full-text articles and extracted key information; articles that met the inclusion criteria were included in the final set. Last, we categorized all included articles into internal medicine subspecialties: neurocritical care; cardiovascular disease; medical oncology; infectious disease; endocrinology, diabetes, and metabolism; critical care medicine; nephrology; gastroenterology; pulmonary disease; hematology; rheumatology; allergy and immunology; and geriatric medicine. We excluded articles related to other medical areas, including anesthesiology, dermatology, emergency medicine, obstetrics and gynecology, ophthalmology, orthopedic surgery, otolaryngology-head and neck surgery, pathology, pediatrics, physical medicine and rehabilitation, preventive medicine, psychiatry and neurology, radiology, surgery, thoracic surgery, urology, orthodontics, and pharmacology. Disagreements on inclusion, exclusion, and information extraction were resolved by consensus-based discussion among the three authors (P.N., M.S.K., and S.A.B.).



Data Extraction and Analysis

Key information was extracted from all articles by P.N. ([Appendix A]). The extracted characteristics included publication year, author, journal title, article title, study design (observational or experimental), purpose, decision, input data (type of data, number of cases, and study period), primary algorithms, comparison methods, balancing technique, explainability, accuracy, users, and ubiquity. The primary purposes of CDSS were categorized into four groups: prediction, diagnosis, treatment optimization, and clinical workflow optimization.[14] Explainability was assessed for each included article. If its methodology used an AI algorithm that maintained a high level of learning performance (prediction accuracy) and enabled human users to understand, appropriately trust, and effectively manage the emerging generation of AI partners,[25] we classified it as “explainable”; otherwise, it was classified as “unexplainable.” P.N. and M.S.K. categorized the AI algorithms into four types: supervised ML, semisupervised ML, unsupervised ML, and DL ([Table 1]).

Table 1

Categorization of AI algorithms

Types of AI algorithms

Algorithms

Supervised machine learning

• Support vector machine (SVM)

• Decision tree (DT)

• Various types of neural network (NN)

• Regression

• Random forest (RF)

• Classifiers

• k-Nearest neighbors (kNN)

• Bayesian network (BN)

• Naïve Bayes (NB)

• Gradient boosting machine (GBM)

• Fuzzy classifier

• Genetic algorithm

Semisupervised machine learning

• Spectral regression Kernel discriminant analysis with semisupervised learning

Unsupervised machine learning

• Clustering

• Text mining

• Knowledge discovery

• Rule-based reasoning (RBR)/case-based reasoning (CBR)/guideline-based

• Various types of neural networks (NN)

Deep learning

• Deep NN (DNN)

• Convolutional neural network (CNN)

• Deep CNN

• 3D CNN

• Convolutional U-net with a two-dimensional gated recurrent NN (RNN)

• Autoencoder network

• CNN long-short-term memory (LSTM)

• Hidden Markov Model, stacked denoising autoencoder, and Statistical Language Modeling
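A categorization like [Table 1] amounts to a lookup from algorithm name to AI type, which can then be tallied across studies. The sketch below shows that idea under stated assumptions. The mapping is partial and transcribed from Table 1. Neural networks appear under both supervised and unsupervised ML in Table 1, so they are deliberately left unmapped rather than guessed. The function `tally_types` and the sample list are hypothetical and do not describe the authors' actual extraction procedure.

```python
from collections import Counter

# Partial mapping transcribed from Table 1 (neural networks omitted because
# Table 1 lists them under both supervised and unsupervised ML)
ALGORITHM_TYPE = {
    "SVM": "Supervised ML", "DT": "Supervised ML", "RF": "Supervised ML",
    "kNN": "Supervised ML", "NB": "Supervised ML", "GBM": "Supervised ML",
    "Clustering": "Unsupervised ML", "Text mining": "Unsupervised ML",
    "CNN": "DL", "DNN": "DL",
}

def tally_types(algorithms_used):
    """Count algorithm applications by AI type; names not in the mapping
    are counted as 'Unmapped' so they can be reviewed by hand."""
    return Counter(ALGORITHM_TYPE.get(a, "Unmapped") for a in algorithms_used)

# Hypothetical list of algorithms extracted from a handful of articles
print(tally_types(["SVM", "RF", "CNN", "SVM", "Clustering", "NN"]))
```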



Results

Identification of Eligible Articles

Our systematic searches identified 4,101 articles, of which 634 were duplicates and were removed. The remaining 3,467 articles were screened against the inclusion criteria by title, abstract, and keywords. We excluded 1,973 articles based on the exclusion criteria, that is, articles proposing a new CDSS platform without reporting results, technical reports of new algorithms without applications in medical research, and review papers. The full texts of 1,261 articles were assessed for eligibility, and 820 articles that were not related to the internal medicine specialty were removed ([Fig. 1]). Of the 441 eligible articles, we focused on the three most frequent subspecialties, which together accounted for 49.4% of the internal medicine-related articles: neurocritical care (n = 89), cardiovascular disease (n = 79), and medical oncology (n = 50) ([Table 2]). These 218 articles were further analyzed, and information was extracted to answer our RQs.
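The counts reported in the selection process can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text. The variable names are illustrative.

```python
identified = 4101
duplicates = 634
screened = identified - duplicates           # screened by title, abstract, keywords
full_text_assessed = 1261
not_internal_medicine = 820
eligible = full_text_assessed - not_internal_medicine
top_three = {"Neurocritical care": 89, "Cardiovascular disease": 79, "Medical oncology": 50}
analyzed = sum(top_three.values())
share = analyzed / eligible                  # share of eligible articles analyzed
print(screened, eligible, analyzed, f"{share:.1%}")
# -> 3467 441 218 49.4%
```

The subtraction 3,467 − 1,973 gives 1,494, not the 1,261 full-text assessments reported. The text does not itemize the 233-record difference, so the sketch starts the full-text stage from the reported 1,261.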

Table 2

Number of studies in internal medicine subspecialties (total = 441)

Subspecialties (n = 13) | N
Neurocritical care | 89
Cardiovascular disease | 79
Medical oncology | 50
Infectious disease | 44
Endocrinology, diabetes, and metabolism | 41
Critical care medicine | 31
Nephrology | 27
Gastroenterology | 26
Pulmonary disease | 24
Hematology | 17
Rheumatology | 5
Allergy and immunology | 4
Geriatric medicine | 4

Fig. 1 Flow diagram of selecting studies for a scoping review.


Purposes, AI Algorithms, Explainability of AI, and Target Users of CDSS

We used [Table 3] to categorize CDSS into four groups based on the primary purpose of CDSS functions[14]: prediction, diagnosis, treatment optimization, and clinical workflow optimization. The two most common purposes were prediction (107, 48.4%) and diagnosis (104, 47.1%). In cardiovascular disease and medical oncology, prediction accounted for 57.5% and 58.8% of CDSS purposes, respectively. Conversely, CDSS in neurocritical care focused more on diagnosis (56, 62.2%).
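The purpose percentages in [Table 3] use the number of purpose counts as the denominator. For the total column that number is 221, which exceeds the 218 articles, so some articles were counted under more than one purpose. The following minimal sketch reproduces the reported percentages from the counts. The helper name `shares` is illustrative.

```python
def shares(counts):
    """Percent of each category, using the sum of category counts as the denominator."""
    total = sum(counts.values())
    return total, {k: round(100 * v / total, 1) for k, v in counts.items()}

purposes_total = {"Prediction": 107, "Diagnosis": 104,
                  "Treatment optimization": 7, "Clinical workflow optimization": 3}
print(shares(purposes_total))
# -> (221, {'Prediction': 48.4, 'Diagnosis': 47.1, 'Treatment optimization': 3.2, 'Clinical workflow optimization': 1.4})
```

Applied to the neurocritical care counts (31, 56, 3, 0), the same function gives a denominator of 90 and reproduces the column's 34.4%, 62.2%, 3.3%, and 0.0%.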

Table 3

Characteristics of the reviewed literature

Characteristic | Total (n = 218) | Neurocritical care (n = 89) | Cardiovascular disease (n = 79) | Medical oncology (n = 50)
Values are n (%).

Purposes (n = 221)
 Prediction | 107 (48.4%) | 31 (34.4%) | 46 (57.5%) | 30 (58.8%)
 Diagnosis | 104 (47.1%) | 56 (62.2%) | 30 (37.5%) | 18 (35.3%)
 Treatment optimization | 7 (3.2%) | 3 (3.3%) | 1 (1.3%) | 3 (5.9%)
 Clinical workflow optimization | 3 (1.4%) | 0 (0.0%) | 3 (3.8%) | 0 (0.0%)

Algorithms (n = 239)
 SVM | 50 (20.9%) | 23 (23.5%) | 16 (19.0%) | 11 (19.3%)
 NN | 35 (14.6%) | 10 (10.2%) | 13 (15.5%) | 12 (21.1%)
 RF | 25 (10.5%) | 11 (11.2%) | 11 (13.1%) | 3 (5.3%)
 DL | 22 (9.2%) | 12 (12.2%) | 1 (1.2%) | 9 (15.8%)
 DT | 21 (8.8%) | 10 (10.2%) | 8 (9.5%) | 3 (5.3%)
 Others | 17 (7.1%) | 7 (7.1%) | 9 (10.7%) | 1 (1.8%)
 Classifiers | 10 (4.2%) | 3 (3.1%) | 6 (7.1%) | 1 (1.8%)
 RBR/CBR/guideline-based | 10 (4.2%) | 3 (3.1%) | 2 (2.4%) | 5 (8.8%)
 kNN | 9 (3.8%) | 5 (5.1%) | 3 (3.6%) | 1 (1.8%)
 BN | 9 (3.8%) | 2 (2.0%) | 3 (3.6%) | 4 (7.0%)
 Regression | 8 (3.3%) | 5 (5.1%) | 2 (2.4%) | 1 (1.8%)
 NB | 8 (3.3%) | 1 (1.0%) | 3 (3.6%) | 4 (7.0%)
 Fuzzy | 6 (2.5%) | 3 (3.1%) | 2 (2.4%) | 1 (1.8%)
 Clustering | 4 (1.7%) | 2 (2.0%) | 2 (2.4%) | 0 (0.0%)
 GBM | 2 (0.8%) | 0 (0.0%) | 1 (1.2%) | 1 (1.8%)
 Text mining | 1 (0.4%) | 1 (1.0%) | 0 (0.0%) | 0 (0.0%)
 Genetic algorithm | 1 (0.4%) | 0 (0.0%) | 1 (1.2%) | 0 (0.0%)
 Knowledge discovery | 1 (0.4%) | 0 (0.0%) | 1 (1.2%) | 0 (0.0%)

Types of AI (n = 239)
 Supervised ML | 206 (86.2%) | 78 (79.6%) | 76 (91.6%) | 52 (91.2%)
 DL | 16 (6.7%) | 12 (12.2%) | 1 (1.2%) | 3 (5.3%)
 Unsupervised ML | 16 (6.7%) | 7 (7.1%) | 7 (8.4%) | 2 (3.5%)
 Semisupervised ML | 1 (0.4%) | 1 (1.0%) | 0 (0.0%) | 0 (0.0%)

Explainable AI
 Unexplainable | 174 (79.5%) | 69 (77.5%) | 69 (87.3%) | 36 (72.0%)
 Explainable | 44 (20.1%) | 20 (22.5%) | 10 (12.7%) | 14 (28.0%)

Users (n = 225)
 Physician | 218 (96.9%) | 89 (97.8%) | 79 (95.2%) | 50 (98.0%)
 Patient | 4 (1.8%) | 2 (2.2%) | 1 (1.2%) | 1 (2.0%)
 Nurse | 3 (1.3%) | 0 (0.0%) | 3 (3.6%) | 0 (0.0%)

Abbreviations: AI, artificial intelligence; BN, Bayesian network; CBR, case-based reasoning; DL, deep learning; DT, decision tree; GBM, gradient boosting machine; kNN, k-nearest neighbors; ML, machine learning; NB, naïve Bayes; NN, neural network; RBR, rule-based reasoning; RF, random forest; SVM, support vector machine.


For AI algorithms ([Table 3]), the five most common algorithms overall were SVM (50, 20.9%), NN (35, 14.6%), RF (25, 10.5%), DL (22, 9.2%), and DT (21, 8.8%). In neurocritical care, the five most common algorithms were SVM (23, 23.5%), DL (12, 12.2%), RF (11, 11.2%), NN (10, 10.2%), and DT (10, 10.2%). In cardiovascular disease, they were SVM (16, 19.0%), NN (13, 15.5%), RF (11, 13.1%), others (9, 10.7%), and DT (8, 9.5%). In medical oncology, the most common algorithms were NN (12, 21.1%), SVM (11, 19.3%), DL (9, 15.8%), RBR (rule-based reasoning)/CBR (case-based reasoning)/guideline-based (5, 8.8%), Bayesian network (4, 7.0%), and NB (4, 7.0%).

AI algorithms applied to CDSS for internal medicine subspecialties spanned supervised ML, semisupervised ML, unsupervised ML, and DL. Although DL is a subset of ML, we placed it in a separate category to compare the prevalence of DL applications with that of other types of algorithms. Of the 239 algorithm applications across the 18 algorithm categories in [Table 3], 86.2% were supervised ML, and 79.5% of the articles used unexplainable AI. The majority of CDSS were developed for physicians (218, 96.9%), followed by patients (4, 1.8%) and nurses (3, 1.3%).

The use of AI algorithms has changed over time, as shown in [Figs. 2] and [3]. [Fig. 2] shows the number of each type of AI algorithm by year, and [Fig. 3] shows the number of each of the five most common AI algorithms (SVM, NN, RF, DT, and DL) by year. The use of ML in CDSS increased from two articles in 2009 to 42 articles in 2019. In particular, DL, a newer technique, showed a sharp increase in published articles in 2018 (n = 8) and 2019 (n = 8). In neurocritical care, DL was adopted 2 years earlier than in cardiovascular disease and medical oncology.

Fig. 2 The number of types of AI algorithms by year. AI, artificial intelligence.
Fig. 3 The number of AI algorithms by year. AI, artificial intelligence.


Answers to Research Questions

After synthesizing findings from 218 included articles, we attempted to answer our RQs as follows:

  • RQ1: What is the frequency of applications regarding purposes of CDSS among prediction, diagnosis, treatment optimization, and clinical workflow optimization?

    We grouped the purposes of CDSS into four categories: prediction, diagnosis, treatment optimization, and clinical workflow optimization. This review showed that the majority of CDSS were developed for prediction (48.4%) and diagnosis (47.1%) purposes.

  • RQ2: What is the frequency of applications regarding AI algorithms used in CDSS?

    A wide range of AI algorithms was used in the reviewed studies. After categorization, we identified 18 different types of algorithms; the five most common across all three subspecialties were SVM (20.9%), NN (14.6%), RF (10.5%), DL (9.2%), and DT (8.8%).

    Each model has its advantages and disadvantages and may be suited to different subspecialties ([Table 4]). SVM and NN were common in all three subspecialties. One possible reason is that SVM can handle multiple-class classification and small datasets. Moreover, SVM and NN are easier to use for prediction or classification and are more stable than DT. However, the results from SVM and NN can be hard to explain. We also found that DL was more prevalent in neurocritical care and medical oncology than in cardiovascular disease.

    On further examination of the data modalities used in the original studies, we found that several data types frequently used in neurocritical care are suitable for DL, such as intracranial electroencephalogram,[48] facial video clips,[49] electroencephalogram,[50] [51] [52] [53] [54] [55] [56] [57] [58] and magnetic resonance imaging (MRI).[59] [60] [61] [62] [63] [64] [65] [66] [67] [68] [69] [70] Similarly, in medical oncology, DL methods were mostly applied to image data.[71] [72] This is reasonable: images are used more for diagnosis in these two subspecialties than in cardiovascular disease, and DL is well suited to analyzing image data such as MRI, CT, positron emission tomography scans, and ultrasound images.

  • RQ3: What is the overall accuracy of those algorithms?

    Accuracy is the percentage of correct predictions for the input data, calculated as the number of correct predictions divided by the total number of predictions made. Put simply, accuracy is the percentage of predictions the model got right.[73] The accuracy of CDSS should be tested because inaccurate recommendations can endanger the safety or well-being of patients.[9] It is challenging to report the average accuracy of AI algorithms because the articles used various metrics to measure performance. Among the articles reporting accuracy scores, the accuracy of AI algorithms ranged from 61.8 to 100% in neurocritical care, 61.6 to 100% in cardiovascular disease, and 54 to 100% in medical oncology.

    Because individual articles reported their results inconsistently, it is particularly challenging to synthesize the results of the included articles. To address this issue, Hernandez-Boussard et al[74] proposed MINimum Information for Medical AI Reporting (MINIMAR) to standardize reporting on AI in health care. A report should satisfy four essential requirements: (1) study population and setting, (2) patient demographic characteristics, (3) model architecture, and (4) model evaluation. The study population and setting include the population, study setting, data source, and cohort selection. The patient demographic characteristics are age, sex, race, ethnicity, and socioeconomic status. For the model architecture, researchers should report the model output, target user, data splitting, gold standard, model task, model architecture, features, and missingness. For the model evaluation, the report should include optimization, internal model validation, external validation, and transparency. Such a standard would help ensure accurate and responsible reporting on AI in health care.
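The accuracy definition given for RQ3 translates directly into code. The second function and the imbalanced toy example are added assumptions, not data from the reviewed studies. They show how a single accuracy figure can look high on an imbalanced dataset even when the model misses every positive case. This is one reason studies also report metrics such as sensitivity, which makes accuracy hard to compare across articles.

```python
def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the total number of predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def sensitivity(y_true, y_pred, positive=1):
    """Share of truly positive cases that the model predicts as positive."""
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predicted_for_positives:
        raise ValueError("no positive cases in y_true")
    return sum(p == positive for p in predicted_for_positives) / len(predicted_for_positives)

# Hypothetical imbalanced test set: 5 positive and 95 negative cases,
# scored by a trivial model that always predicts "negative" (0).
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(accuracy(y_true, y_pred))     # -> 0.95
print(sensitivity(y_true, y_pred))  # -> 0.0
```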

Table 4

Advantages and disadvantages of AI algorithms

Algorithm | Advantages | Disadvantages
Support vector machine[102] [103] [104] [105] [106] [107] | Binary or multiple-class classification; does not need a large dataset | Difficult to pick kernel function/parameters; difficult to explain; long training time on a large dataset
Neural network[108] [109] [110] | Able to do complex classification | Difficult to explain; difficult to tune parameters
Random forest[111] [112] [113] [114] | Able to deal with missing values; able to deal with large, high-dimensional datasets | Difficult to explain; prone to overfitting
Decision tree[115] [116] [117] | Able to be explained; able to deal with complex data | Unstable; accuracy not high
Deep learning[17] [118] [119] [120] | Able to do complex classification | Could be expensive (graphics processing unit needed); difficult to explain; needs a large dataset
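[Table 4] lists explainability as the main advantage of decision trees. A one-split tree, or "stump," shows why: the fitted model is itself a readable rule. This minimal sketch chooses the split with the fewest misclassifications. Full decision-tree learners grow many such splits and typically choose them with an impurity measure such as Gini impurity or entropy. The functions `fit_stump` and `explain` and the blood-pressure data are hypothetical.

```python
def fit_stump(values, labels):
    """Fit a one-split decision tree (a 'stump') on a single numeric feature.

    Tries each observed value as a threshold and keeps the split
    'value <= threshold -> left label, else right label' with the fewest errors.
    """
    classes = sorted(set(labels))
    best = None
    for threshold in sorted(set(values)):
        for left in classes:
            for right in classes:
                errors = sum(
                    (left if v <= threshold else right) != y
                    for v, y in zip(values, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, left, right)
    return best[1:]  # (threshold, left_label, right_label)

def explain(threshold, left, right, feature="feature"):
    """Render the fitted split as a human-readable rule."""
    return f"if {feature} <= {threshold}: predict {left!r}; otherwise predict {right!r}"

values = [110, 120, 135, 150, 160, 175]   # hypothetical systolic BP readings (mmHg)
labels = ["low", "low", "low", "high", "high", "high"]
threshold, left, right = fit_stump(values, labels)
print(explain(threshold, left, right, feature="systolic BP"))
# -> if systolic BP <= 135: predict 'low'; otherwise predict 'high'
```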



Discussion

We conducted a scoping review to find evidence of applied AI algorithms in CDSS for internal medicine subspecialties. Neurocritical care, cardiovascular disease, and medical oncology had the highest volume of studies applying AI algorithms in CDSS, together accounting for approximately 49% of the internal medicine-related articles. According to the World Health Organization and the Centers for Disease Control and Prevention, cardiovascular diseases were the leading cause of death and neurological disorders the second leading cause of death worldwide.[75] [76] [77] Cancer is a major health problem worldwide and was the second leading cause of death in the United States in 2019.[78] This review therefore covers CDSS using AI algorithms for internal medicine subspecialties that address major global health problems. The volume of AI algorithms applied to medical problems increased continuously from 2009 to 2019, with a substantial change in 2018 and 2019. We also observed a markedly growing number of articles involving DL from 2016 to 2019.

Explainability of AI Algorithms

This review shows that most articles used unexplainable algorithms (79.5%). The use of unexplainable AI models has been debated in many articles and remains controversial in current medical practice. We believe that researchers should move toward applying XAI algorithms, that is, AI algorithms whose results can be understood by human experts.[25]

Here, AI explainability is examined primarily from a clinical point of view, highlighting the ability of humans to understand which clinical characteristics drive a prediction. This matters because the main objective of clinical predictive modeling is the development of CDSS that assist health professionals in clinical decision-making by predicting diagnoses, risks, and outcomes.[27] [79] The requirements for a CDSS go far beyond model performance,[80] and a CDSS for the clinical environment must demonstrate proven safety and accuracy.[80] Explainability is crucial for understanding why AI systems do what they do and, more importantly, why and when they may not do what was intended. Such transparency is especially important given the growing awareness of potential biases in models that could lead to discrimination in health care. An XAI system is essential to provide: a safe interpretation and verification of the results obtained during development; better evaluation of the safety and fairness of medical products, especially concerning bias, during the regulatory process; and interpretation supported by domain knowledge, leading to increased confidence among doctors, other health professionals, and patients. In this way, explainability can increase medical professionals' confidence in future AI systems.
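Explainability can also be added after the fact to models that are not interpretable by design, showing which inputs drive the prediction. One illustration is permutation feature importance, a common model-agnostic technique that this review does not itself prescribe. Each input feature is shuffled in turn, and the resulting drop in accuracy indicates how strongly the model relies on that feature. The minimal sketch below uses plain Python. The `black_box` model, the feature names, and the data are hypothetical.

```python
import random
from typing import Callable, List

Row = List[float]


def accuracy(model: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[Row], int], X: List[Row], y: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def black_box(row: Row) -> int:
    # Hypothetical opaque model (in practice, e.g., an SVM or a neural network).
    # Unknown to the user, it reads only feature 0.
    return int(row[0] > 0.5)


# Hypothetical data: feature 0 is a "biomarker", feature 1 is irrelevant noise.
X = [[i / 20, ((i * 7) % 20) / 20] for i in range(20)]
y = [black_box(row) for row in X]

for name, score in zip(["biomarker", "noise"], permutation_importance(black_box, X, y)):
    print(f"{name}: {score:.3f}")
```

Shuffling the noise feature leaves accuracy unchanged because the model never reads it, so its importance is zero. Shuffling the biomarker lowers accuracy. Permutation importance can mislead when features are strongly correlated, so it complements clinical validation rather than replacing it.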



Ubiquity and Usability

We also examined the ubiquity of the developed CDSS, i.e., whether they were made available anytime and anywhere. Some articles delivered their CDSS as software, Web-based tools, or mobile apps, including the neuroQWERTY platform,[81] the Heart Failure Manager tool,[82] the Chest Pain Rule Out (CPRO) Calculator,[83] the HEARTFAID platform,[84] PaDEL-Survival,[85] OncoMortality,[86] PrediWeb,[87] and The-Optimal-Lymph-Flow (TOLF).[88] Most of the included articles did not report whether or how their models were deployed as applications.

In a CDSS, the system's outcome can depend directly on its user interface. A successful CDSS should offer clinicians an efficient user interface so that they obtain the most appropriate consultation results. Miller et al[89] described simplification as including only the elements that are most important for communication. Consistent terminology, concise and unambiguous language, and effective visualization improved usability and reduced information density. To improve usability, designers should consider appropriate font sizes, meaningful colors, acceptable contrast between text and background, and bold or larger icons. Space-filling techniques help to maximize the amount of information shown in the available display space. Visibility factors draw on human factors and cognitive computing. A user-centered design process should also be considered during CDSS development. User-centered design builds the system around user characteristics, using interdisciplinary approaches from cognitive science, psychology, and computer science.[90] [91] It also helps identify potential deficiencies of CDSS, such as substantial variability in their usability, efficacy, and safety.[92] [93] [94]
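The advice to ensure acceptable contrast between text and background can be checked numerically. The source does not name a criterion. The minimal sketch below assumes the WCAG 2.x contrast ratio, one widely used yardstick, which compares the relative luminance of two colors and asks for at least 4.5:1 for normal-size text at Level AA. Colors are given as 8-bit sRGB triples, and the function names are illustrative.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB channels, 0-255


def _linearize(channel: int) -> float:
    """Convert one gamma-encoded sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(ch) for ch in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """Ranges from 1:1 (identical colors) to 21:1 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_normal_text(fg: RGB, bg: RGB) -> bool:
    # WCAG 2.x Level AA asks for at least 4.5:1 for normal-size text.
    return contrast_ratio(fg, bg) >= 4.5


white = (255, 255, 255)
print(round(contrast_ratio((0, 0, 0), white), 2))    # 21.0
print(meets_aa_normal_text((119, 119, 119), white))  # False (about 4.48:1)
print(meets_aa_normal_text((118, 118, 118), white))  # True (about 4.54:1)
```

The two grays differ by one step per channel yet fall on opposite sides of the threshold. Checking contrast with a formula is more reliable than judging it by eye.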



Study Limitations

Our study has several limitations. First, we conducted a scoping review, which does not require an assessment of methodological limitations or risk of bias of the evidence[95]; however, we recorded the study design of each article ([Appendix A]), which can indicate the level of evidence of individual studies. Second, we excluded non-English papers, which may have introduced selection bias. Last, we limited the publication years based on the timeline of EHR implementation in the United States and associated applications of AI-based CDSS, which may have introduced publication bias. Nevertheless, we believe that the findings of our review answer our research question.



Conclusion

With the continued advancement of medical techniques and devices, the size, variety, and complexity of data also continue to increase. Many ML and data mining methods have been used in the medical field to help with disease diagnosis, prediction, and treatment optimization, suggesting that AI can provide more accurate diagnostic results. This study identified four potential research gaps. First, only 44 (20.1%) of the included articles used XAI algorithms; unexplainable models can lead to distrust among clinicians, who cannot assess the models' effectiveness and learning performance. We suggest that future CDSS increase the use of XAI algorithms, which can help enhance clinicians' trust and confidence in using the CDSS. Second, we found a lack of ubiquity among the reviewed articles. A CDSS should be available to users anytime and anywhere to support clinical decisions at the point of care; however, only 21 articles (9.6%) developed platforms (i.e., software, Web-based tools, and mobile apps) that clinicians and patients can access. Most of the articles did not report platform development or implementation. We suggest that future CDSS address not only model performance but also ubiquity, which will increase accessibility for clinicians and patients and lead to timely use of CDSS in clinical practice. Third, the majority of CDSS were developed for physician users (96.9%). Developers should consider expanding the scope of target users and enhancing engagement in shared decision-making between health care providers and patients to achieve the delivery of patient-centered care. Last, we observed a lack of standardized reporting structure in AI-based CDSS studies, which resulted in inconsistent data extraction. The reviewed articles did not report information according to the MINIMAR (MINimum Information for Medical AI Reporting) standards and failed to provide accurate, unbiased, and meaningful reports. We suggest that future articles on AI in health care report information following the MINIMAR standards.

Although many studies show the success of using CDSS in health care management, implementation remains a significant challenge because of unreliable EHR data and the inability to exchange them between systems, unfriendly user interfaces, limited implementation and workflow options, and technical issues.[96] [97] Moreover, in the real world, EHR data can be inaccurate, unreliable, transformed, and insufficient.[98] [99] [100] [101] Hence, data quality is an important challenge for applied AI in medicine.
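Simple automated checks can surface some of these data-quality problems before EHR data reach a model. The minimal sketch below flags missing and implausible values in each record and reports per-field completeness. The field names and plausibility ranges are hypothetical placeholders for illustration, not clinical reference values.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]

# Hypothetical plausibility ranges, for illustration only; not clinical reference values.
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "heart_rate": (20, 300),
    "systolic_bp": (40, 300),
    "temperature_c": (25, 45),
}


def audit_record(record: Record) -> List[str]:
    """List missing or out-of-range fields in one record."""
    issues = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is None:
            issues.append(f"{field}: missing")
        elif not low <= value <= high:
            issues.append(f"{field}: implausible value {value}")
    return issues


def completeness(records: List[Record]) -> Dict[str, float]:
    """Fraction of records with a non-missing value for each field."""
    return {
        field: sum(r.get(field) is not None for r in records) / len(records)
        for field in PLAUSIBLE_RANGES
    }


records: List[Record] = [
    {"heart_rate": 72, "systolic_bp": 118, "temperature_c": 36.8},
    {"heart_rate": 720, "systolic_bp": None, "temperature_c": 37.1},  # e.g., a typo for 72
]
for i, rec in enumerate(records):
    print(i, audit_record(rec))
print(completeness(records))
```

Range checks catch only gross errors. A wrong value that still falls inside its range, such as a mistyped reading that remains plausible, needs other safeguards.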



Clinical Relevance Statement

This scoping review showed trends in the use of AI algorithms in CDSS for internal medicine subspecialties between 2009 and 2019. Among internal medicine subspecialties, neurocritical care, cardiovascular disease, and medical oncology had the most articles on CDSS using AI algorithms. This review showed a substantial increase in the use of DL in articles published in 2018 and 2019. It also identified four potential gaps in CDSS development: the need for AI explainability, the lack of ubiquity of CDSS, the narrow scope of target users, and the need for reporting standards for AI in health care.



[Appendix A: Characteristics of the reviewed literature]



Conflict of interest

None declared.

Author Contributions

S.A.B. and M.S.K. contributed to the study design, critically revised the article, and approved the final version for publication. P.N. searched the literature, synthesized the included studies, and drafted the article.


Supplementary Material


Address for correspondence

Suzanne A. Boren, PhD, MHA
Department of Health Management and Informatics, University of Missouri School of Medicine
CE707 Clinical Support & Education Building, DC006.00, Columbia, MO 65212
United States   

Publication History

Received: 28 October 2020

Accepted: 13 July 2021

Publication Date:
14 September 2021 (online)

© 2021. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany


Fig. 1 Flow diagram of selecting studies for a scoping review.
Fig. 2 The number of types of AI algorithms by year. AI, artificial intelligence.
Fig. 3 The number of AI algorithms by year. AI, artificial intelligence.