CC BY-NC-ND 4.0 · Yearb Med Inform 2019; 28(01): 027-034
DOI: 10.1055/s-0039-1677899
Special Section: Artificial Intelligence in Health: New Opportunities, Challenges, and Practical Implications
Working Group Contributions
Georg Thieme Verlag KG Stuttgart

# The Interplay of Knowledge Representation with Various Fields of Artificial Intelligence in Medicine

A Contribution from the IMIA Working Group on Language and Meaning in BioMedicine

### Correspondence to

Dr. László Balkányi
Gyula utca 2
H-1016 Budapest
Hungary

### Publication History

Publication Date:
25 April 2019 (online)

### Summary

Introduction: Artificial intelligence (AI) is widespread in many areas, including medicine. However, it is unclear what exactly AI encompasses. This paper aims to provide an improved understanding of medical AI and its constituent fields, and their interplay with knowledge representation (KR).

Methods: We followed a Wittgensteinian approach (“meaning by usage”) applied to content metadata labels, using the Medical Subject Headings (MeSH) thesaurus to classify the field. To understand and characterize medical AI and the role of KR, we analyzed: (1) the proportion of papers in MEDLINE related to KR and various AI fields; (2) the interplay among KR and AI fields and overlaps among the AI fields; (3) interconnectedness of fields; and (4) phrase frequency and collocation based on a corpus of abstracts.

Results: Data from over eighty thousand papers showed a steep, six-fold surge in the last 30 years. This growth happened in an escalating and cascading way. A corpus of 246,308 total words containing 21,842 unique words showed several hundred occurrences of notions such as robotics, fuzzy logic, neural networks, machine learning and expert systems in the phrase frequency analysis. Collocation analysis shows that fuzzy logic seems to be the most often collocated notion. Neural networks and machine learning are also used in the conceptual neighborhood of KR. Robotics is more isolated.

Conclusions: Authors note an escalation of published AI studies in medicine. Knowledge representation is one of the smaller areas, but also the most interconnected, and provides a common cognitive layer for other areas.


### 1 Introduction

Artificial intelligence (AI) is becoming increasingly important, and its impact is, at least potentially, manifold, but its exact scope is unclear. In this paper we aim to increase understanding of the structure of medical AI as a field of applied science [1] by investigating the interaction of its constituent fields. The interplay among various fields is studied specifically from the point of view of knowledge representation [1]. Our first objective is to shed light on what exactly AI encompasses, as seen in the medical research literature. The second objective is to analyze how the notion of knowledge representation (KR) interacts with various fields of AI, i.e., how KR contributes to other fields of AI, and how these contribute to KR. Analyzing the relationships among these fields helps to understand the trends.


### 2 Background

When addressing AI from a knowledge-representation perspective, an obvious first task is to assess what exactly AI encompasses. The literature does not provide a widely accepted classification or structural model of (medical) AI and its constituents. Many definitions and descriptions of AI exist, well summarized for example by [2], but we think these might not add much to the concept of medical AI for an already interested reader. Similarly, there is no single authoritative reference classification of the constituent fields of AI. Library science tools, including catalogue classification systems such as DDC (Dewey Decimal Classification), UDC (Universal Decimal Classification), and LCC (Library of Congress Classification) [3], are of no avail as they do not provide subcategories. Within the medical and health domain of AI, the hierarchy of Medical Subject Headings (MeSH) provides a good, pragmatic classification of medical AI-related research papers [4], which is shown in [Figure 1]. The definitions of the MeSH terms are given in [Table 1].

Table 1

### MeSH terms and definitions for AI and related fields [4]

| MeSH term | MeSH Scope Note - Definition | Year (established in MeSH) |
| --- | --- | --- |
| Artificial Intelligence | Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language. | 1986 |
| Computer Heuristics | Trial-and-error methods of problem-solving used when an algorithmic approach is impractical. | 2016 |
| Expert Systems | Computer programs based on knowledge developed from consultation with experts on a problem, and the processing and/or formalizing of this knowledge using these programs in such a manner that the problems may be solved. | 1987 |
| Fuzzy Logic | Approximate, quantitative reasoning that is concerned with the linguistic ambiguity which exists in natural or synthetic language. At its core are variables such as good, bad, and young as well as modifiers such as more, less, and very. These ordinary terms represent fuzzy sets in a particular problem. Fuzzy logic plays a key role in many medical expert systems. | 1993 |
| Knowledge Bases | Collections of facts, assumptions, beliefs, and heuristics that are used in combination with databases to achieve desired results, such as a diagnosis, an interpretation, or a solution to a problem. | 2006 |
| Biological Ontologies | Structured vocabularies describing concepts from the fields of biology and relationships between concepts. | 2014 |
| Machine Learning | A type of ARTIFICIAL INTELLIGENCE that enables COMPUTERS to independently initiate and execute LEARNING when exposed to new data. | 2016 |
| Natural Language Processing | Computer processing of a language with rules that reflect and describe current usage rather than prescribed usage. | 1991 |
| Neural Networks (Computer) | A computer architecture, implementable in either hardware or software, modeled after biological neural networks. ... computerized neural networks, ..., consist of neuron-like units. A homogeneous group of units makes up a layer. They are adaptive, performing tasks by example, and thus are better for decision-making than are linear learning machines or cluster analysis. | 1992 |
| Robotics | The application of electronic, computerized, control systems to mechanical devices designed to perform human functions. Formerly restricted to industry, but nowadays applied to artificial organs controlled by bionic (bioelectronic) devices, like automated insulin pumps and other prostheses. | 1987 |


### 3 Materials and Methods

To achieve our goals, we used descriptive metadata, i.e., keywords assigned by the authors of papers published in this field and the Medical Subject Headings (MeSH) indexing terms assigned by MEDLINE indexers. We used MEDLINE to retrieve papers, as it consistently specifies MeSH headings. Our method followed these steps:

• Analysis of the proportion of papers in MEDLINE characterized by relevant content metadata for various fields of medical AI. We used PubMed-by-Year [5] to investigate publication frequencies of various medical AI fields over time. This tool is used to visualize the relative proportion of cited publications, tagged by the relevant MeSH index terms. It compares the results for each year to the database as a whole. By entering multiple searches, we may have the results displayed in parallel.

• Determining the interplay between KR and the respective fields of AI by checking the co-occurrence of content metadata, as well as detecting interplay among the various AI fields themselves. In this step we used PubVenn [6], a tool that lets PubMed convert search terms into a codified search. As content metadata, i.e., as search terms, we combined our keywords with a series of MeSH terms. PubVenn produces a Venn diagram showing the interaction between various AI fields, and allows the numbers and bibliographic data of the citations in the overlapping areas of the Venn diagram to be extracted.

• Visualization of the interconnectedness using NodeXL [7], an open-source template for Microsoft® Excel® that makes network graphs.

• Analysis of a limited corpus of medical AI abstracts. We established a corpus containing the abstracts of the first thousand most relevant papers, with relevance decided according to the PubMed “Best Match” ordering. The abstracts of all papers having MeSH-classified “artificial intelligence” keywords, and of those keyworded by their authors as “knowledge representation”, were added to the corpus. We used AntConc [8] to perform phrase frequency and collocation analysis of the content metadata labels, used as notion labels (words) in the corpus text, to better understand the interplay among fields in general and between fields and KR.
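The co-occurrence check in the second step can be read as a set intersection over citation identifiers. The following is a minimal sketch of that idea, assuming each field has already been resolved to a set of PubMed identifiers; the field names, the PMIDs, and the helper `pairwise_overlaps` are hypothetical and serve illustration only. It is not the implementation used by PubVenn.

```python
from itertools import combinations

# Hypothetical PMID sets per field (illustrative values only, not study data).
pmids_by_field = {
    "knowledge representation": {101, 102, 103, 104},
    "Fuzzy Logic": {103, 104, 105},
    "Robotics": {106, 107},
}


def pairwise_overlaps(pmids_by_field):
    """Return {(field_a, field_b): number of shared PMIDs} for every unordered pair."""
    overlaps = {}
    for a, b in combinations(sorted(pmids_by_field), 2):
        # The overlap of two fields is the size of the intersection of their PMID sets.
        overlaps[(a, b)] = len(pmids_by_field[a] & pmids_by_field[b])
    return overlaps


overlaps = pairwise_overlaps(pmids_by_field)
for (a, b), n in overlaps.items():
    print(f"{a} AND {b}: {n} shared citations")
```

Each value corresponds to one overlapping region of a two-set Venn diagram; a region with zero shared citations indicates that the two labels never co-occur in the sample.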

All search results are based on the numbers extracted from a snapshot of a search performed in October 2018. To retrieve papers covering both knowledge representation and the respective AI fields, we used a simple search construct: pairs consisting of the authors’ keyword ‘knowledge representation’ and the label of a MeSH index term pertaining to AI, as shown in [Figure 1]. In the same way, we retrieved the MeSH index term pairs, using the same simple search construct, e.g., “Biological Ontologies”[All] AND “Natural Language Processing”[All]. As keywords are limited and may not address all relevant aspects of indexed papers, we used text mining to gain insight into the frequencies of phrases that relate to the content descriptive metadata labels. Text mining on the abstracts of the papers followed a Wittgensteinian approach, interpreting “meaning by usage”: the usage of the content metadata notions as words referring to AI [9]. We analyzed occurrences (phrase frequencies and collocation) of content descriptive metadata element labels as words used in the text of papers. We think that this work provides a deeper understanding of the underlying conceptual structure of the research field. To this end, a corpus was created consisting of the title, keywords, and abstract of the first 1,000 articles according to the PubMed “Best Match” order.
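The pairwise search construct described above can be generated mechanically. The sketch below assumes that the labels are the MeSH terms of Table 1 (without the parent term “Artificial Intelligence”) plus the authors’ keyword ‘knowledge representation’, and that the `[All]` tag is applied to both members of each pair, as in the quoted example. The exact label strings submitted to PubMed are not given in the text, so the list and the helper `build_pair_queries` are assumptions.

```python
from itertools import combinations

# Assumed label list: authors' keyword plus the MeSH AI sub-areas of Table 1.
LABELS = [
    "knowledge representation",
    "Computer Heuristics",
    "Expert Systems",
    "Fuzzy Logic",
    "Knowledge Bases",
    "Biological Ontologies",
    "Machine Learning",
    "Natural Language Processing",
    "Neural Networks (Computer)",
    "Robotics",
]


def build_pair_queries(labels, tag="[All]"):
    """Build one boolean AND query per unordered pair of labels."""
    return [f'"{a}"{tag} AND "{b}"{tag}' for a, b in combinations(labels, 2)]


queries = build_pair_queries(LABELS)
print(len(queries))  # number of unordered pairs of labels
print(queries[0])
```

With ten labels, every area is paired with nine others, which matches the nine data elements per area mentioned for [Table 2].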


### 4 Results

[Figure 2] shows the growth of various MeSH-defined AI fields as proportions of MEDLINE-indexed publications. The data regarding “knowledge representation” in MEDLINE were collected with the same search query formalism as the queries for those AI fields for which MeSH terms exist. Research has intensified significantly over the last thirty years, with growth starting in the eighties. The ratio of AI-related research output to all MEDLINE-indexed publications is presently about six times what it was at the beginning of the eighties. Some areas, like (artificial) neural networks, started to grow almost exponentially in the nineties and seemingly levelled out after the year 2000. Other areas, like machine learning, show very steep growth in the last decade. There is steady growth in the area of expert systems. The area of knowledge bases (KB) research started to grow with more research on biological ontologies, understood by MeSH as a subcategory of KBs. More details and an updated version with the latest data are available at https://goo.gl/j4fvi4
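The six-fold figure is a ratio of two proportions rather than of two raw counts. As a minimal sketch of the arithmetic, the example below normalizes the yearly hits of a field by the total number of citations indexed in that year and then compares two years. The counts are hypothetical and chosen only so that the result is easy to follow; the per-100,000 scaling and the helper `per_100k` are assumptions.

```python
def per_100k(field_hits, total_citations):
    """Proportion of a field's papers among all indexed citations, per 100,000."""
    return 100_000 * field_hits / total_citations


# Hypothetical counts (NOT study data): an early year and a recent year.
early = per_100k(field_hits=50, total_citations=250_000)
recent = per_100k(field_hits=1_200, total_citations=1_000_000)

# Fold change of the normalized proportion between the two years.
fold_change = recent / early
print(f"early: {early:.1f}, recent: {recent:.1f}, fold change: {fold_change:.1f}")
```

Normalizing in this way separates growth of the field from the general growth of the database: in the toy numbers the raw count grows 24-fold while the proportion grows six-fold.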

The changes (and more specifically their relations to knowledge representation) are further analyzed in this paper by text mining the relevant literature. The results are presented below in two steps.

Step 1: Investigation of the overall interaction among KR and various fields of AI in biomedical literature

[Table 2] shows the extracted data. As described in the Methods section, the first level of the MeSH hierarchy classification is used together with ‘biological ontologies’ - even though ‘biological ontologies’ falls under the MeSH hierarchy ‘knowledge bases’. This is further addressed in the discussion section.

Table 2

### Number of publications indexed in MEDLINE with MeSH AI sub-areas, and their interplay (snapshot taken on 1 Oct. 2018).

The red and the blue numbers show the two areas (NeurNet and MachL) most cross-cited with all others. Sums of cross-citations and standard deviations (SD) are calculated from the vertical and horizontal numbers (nine data elements; see the red and blue numbers as examples) for each area. In the case of ‘heuristics’, most of the data elements are zero, which is why calculating a standard deviation is not meaningful. The standard deviation of these number series indicates how evenly a certain field is connected to the others: a higher SD means a less uniform distribution.
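The sum and SD per area can be illustrated on a reduced example. The sketch below uses four areas and invented overlap counts, which are not the values of [Table 2], and it assumes the sample standard deviation, since the text does not state whether the sample or the population formula was used. The helper `cross_citation_summary` is hypothetical.

```python
from statistics import stdev

# Hypothetical pairwise overlap counts (illustrative only, NOT the values of Table 2).
pair_counts = {
    ("KR", "FuzzyL"): 10,
    ("KR", "MachL"): 12,
    ("KR", "NeurNet"): 11,
    ("FuzzyL", "MachL"): 5,
    ("FuzzyL", "NeurNet"): 40,
    ("MachL", "NeurNet"): 90,
}


def cross_citation_summary(pair_counts):
    """Return {area: (sum of overlaps, sample SD of overlaps)} over all partner areas."""
    per_area = {}
    for (a, b), n in pair_counts.items():
        # Each overlap count belongs to the series of both areas in the pair.
        per_area.setdefault(a, []).append(n)
        per_area.setdefault(b, []).append(n)
    return {area: (sum(v), stdev(v)) for area, v in per_area.items()}


summary = cross_citation_summary(pair_counts)
most_even = min(summary, key=lambda area: summary[area][1])
for area, (total, sd) in summary.items():
    print(f"{area}: sum={total}, SD={sd:.1f}")
print("most evenly connected:", most_even)
```

In this toy data the area with the smallest sum of overlaps is also the one with the lowest SD, i.e., it is connected to all partners to a similar degree.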

For further visual analysis, overlapping citations among various AI areas are shown as a network diagram in [Figure 3]. Nodes represent AI areas by their MeSH designations. Edges represent the overlaps, the cross citations among the nodes. In the depicted network, the nodes are proportionally sized to the number of cited literature areas. The width and the style of the edges correspond to the overlap among them. Widths of edges grow with the magnitude of the overlap. This network visualization helps to see the interconnectedness between the areas and the role of Knowledge Representation in this interdisciplinary arena.
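The mapping from counts to visual attributes can be sketched independently of any graphing tool. The example below assumes linear scaling for both node sizes and edge widths and writes a plain edge list as CSV; the counts, the maximum sizes, and the column names `source`, `target`, and `width` are hypothetical and do not reproduce the NodeXL worksheet layout.

```python
import csv
import io

# Hypothetical publication counts and overlaps (illustrative only).
publications = {"KR": 1_000, "FuzzyL": 4_000, "NeurNet": 20_000}
overlaps = {("KR", "FuzzyL"): 50, ("KR", "NeurNet"): 25, ("FuzzyL", "NeurNet"): 500}

MAX_NODE_SIZE = 40.0   # assumed display size of the largest node
MAX_EDGE_WIDTH = 10.0  # assumed display width of the strongest edge


def scaled(values, max_out):
    """Scale values linearly so that the largest one maps to max_out."""
    top = max(values.values())
    return {key: max_out * value / top for key, value in values.items()}


node_size = scaled(publications, MAX_NODE_SIZE)
edge_width = scaled(overlaps, MAX_EDGE_WIDTH)

# Write the edge list as CSV text, one row per pair of areas.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["source", "target", "width"])
for (a, b), width in edge_width.items():
    writer.writerow([a, b, f"{width:.2f}"])
print(buffer.getvalue())
```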

Step 2: Phrase frequency and collocation analysis of extracted abstracts

The citation data studied above cover over eighty thousand citations. For a further, in-depth look, we established a limited corpus containing the abstracts of the first thousand most relevant papers. This corpus had 246,308 total words, of which 21,842 were unique word forms. A simple phrase frequency analysis [8] shows that five AI fields occur among the most frequent terms in the corpus (see [Table 3]).

These frequencies show the most researched areas of AI. However, they do not shed light on their interaction with the specific aspect of language and meaning, classically discussed as ‘knowledge representation’. The collocation of AI fields was measured in the same corpus as the phrase frequency, with both the left and the right window spans set to the maximum distance of 20 terms. In an earlier paper [10], authors realized that “... the central role of the term “concept” has been gradually abandoned ….”. The notion of ‘concept’ was central to what was called the field of research in ‘knowledge representation’. Therefore, in order to analyze the current corpus on AI, the notions ‘language’ and ‘meaning’ were brought into the collocation study in addition to the notion of ‘knowledge representation’. For the four most studied areas, the collocation data found in the corpus are shown in [Table 4].
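Window-based collocation counting can be sketched as follows. This is a simplified reading of the procedure, not AntConc's algorithm: for every occurrence of a node phrase, it counts how often each target phrase appears within a given number of tokens to the left or to the right. The tokenizer, the sample sentence, and the helpers `phrase_positions` and `collocation_counts` are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_positions(tokens, phrase):
    """Return the start indices at which the (possibly multi-word) phrase occurs."""
    words = phrase.lower().split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def collocation_counts(text, node, targets, span=20):
    """Count target phrases within `span` tokens left or right of each node occurrence."""
    tokens = tokenize(text)
    node_len = len(node.split())
    counts = Counter({target: 0 for target in targets})
    for i in phrase_positions(tokens, node):
        left = tokens[max(0, i - span):i]
        right = tokens[i + node_len:i + node_len + span]
        for target in targets:
            counts[target] += len(phrase_positions(left, target))
            counts[target] += len(phrase_positions(right, target))
    return counts


# Invented sample text (not taken from the corpus).
sample = (
    "Fuzzy logic supports knowledge representation when the meaning "
    "of clinical language is vague. Robotics was evaluated separately."
)
targets = ["knowledge representation", "language", "meaning"]
wide = collocation_counts(sample, "fuzzy logic", targets, span=20)
narrow = collocation_counts(sample, "fuzzy logic", targets, span=2)
print(dict(wide))
print(dict(narrow))
```

The comparison of the two spans shows how strongly the result depends on the window size: with a span of 20 all three targets fall inside the window, while with a span of 2 none of them does.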


### 5 Discussion

#### Principal Findings

[Figure 2] shows that over time the various fields related to medical AI follow a cascading and explicitly escalating evolution. ‘Expert systems’, studied in the eighties, were followed by ‘computer neural networks’, which were in the lead in the nineties and the beginning of the twenty-first century. This was followed by even more research focusing on ‘robotics’ and currently on ‘machine learning’. At the same time, research goes on steadily in the other depicted fields. The cascade character might show us how new fields, or new names for old fields, take over and might also incorporate the results of previous areas. However, it is not trivial to see whether ‘machine learning’ will also take on the “cube root” function characteristics of other research fields, levelling out over time.

[Table 2] and [Figure 3] show that although the research in medical AI has branched into a broad spectrum of fields, these fields are well interconnected. At the same time, the interconnectedness varies greatly. ‘Computer heuristics’ and ‘biological ontologies’ are somewhat less interconnected with other fields, whereas ‘machine learning’ and ‘computer-based neural networks’ are the fields most interconnected with all others. The term “knowledge representation” itself is not an AI field in the MeSH thesaurus, but it is used in three entry terms for AI: Knowledge Representation (Computer); Knowledge Representations (Computer); and Representation, Knowledge (Computer).

[Table 3] shows that the four areas ‘robotics’, ‘fuzzy logic’, ‘neural networks’, and ‘machine learning’ seem to be by far the most mentioned research areas, while ‘expert systems’, although above the limit of 50 citations, scores well below them. [Table 4] tells us that ‘fuzzy logic’ seems to be the notion most often collocated with the world of ‘knowledge representation’, ‘meaning’, and ‘language’. This shows some advantage of the fuzzy approach in representing and interpreting medical knowledge. ‘Neural networks’ and ‘machine learning’ are also used in the conceptual neighborhood of knowledge representation. At the same time ‘robotics’, while an important area in AI, seems to be somewhat isolated from the KR world.

These results from text mining show that the various AI fields are well interconnected. It is interesting to see that the lowest standard deviation (SD) of cross-citations to different areas occurs for our historically central concept, ‘knowledge representation’. This lowest SD shows that KR is the most “evenly” referred notion to this day. This finding provides a quantitative indicator suggesting that studying KR was (and is) at the origin of the widely spreading and branching fields of AI research. We will briefly highlight three interactions.

Table 3

| AI fields | Phrase examples | Count |
| --- | --- | --- |
| Robotics | the robot | 118 |
| | robotic surgery | 84 |
| | of robotic | 79 |
| | a robot | 53 |
| | sum: | 334 |
| Fuzzy logic | neuro fuzzy | 83 |
| | fuzzy neural | 81 |
| | fuzzy inference | 60 |
| | fuzzy logic | 80 |
| | sum: | 304 |
| Neural networks | neural networks | 284 |
| Machine learning | machine learning | 267 |
| Expert systems | expert system | 62 |

Table 4

### Collocation of AI-field-related phrases.

| KR notions/AI Areas | Knowledge representation | Language | Meaning | Total |
| --- | --- | --- | --- | --- |
| Robotics | 6 | 3 | 0 | 9 |
| Fuzzy logic | 79 | 9 | 5 | 93 |
| Neural networks | 24 | 7 | 2 | 33 |
| Machine learning | 15 | 9 | 0 | 24 |

#### Interaction between Knowledge Representation and Robotics

Knowledge representation plays a role in robotics, for example in categorizing emotions [11], teaching cognitive robots to count [12], and representing and formalizing knowledge about care [13]. These examples show how knowledge representation can be an integral part of improving the functioning of robots. It is apparently still too early to exploit the cognitive capacities of robots to contribute to knowledge representation, as no literature was found on this topic.


#### Interaction between Knowledge Representation and Machine Learning

Interaction between knowledge representation and machine learning is still limited, but needed. An early acknowledgement of this need, specifically for diagnostic image interpretation, is found in [14]. Already in 1988, it was stated that “Diagnostic image interpretation with learning capability demands a full model of the human expert’s competence, including a considerable variety of knowledge representation schemes and inference strategies, coordinated by a meta-process controller.” A recent approach is to combine graph data (represented in the Resource Description Framework and the Web Ontology Language) with neural networks to generate embeddings of nodes [15]. This combination results in embeddings that contain both explicit and implicit information. Machine learning can contribute to knowledge representation, e.g., by abstract feature selection, which has been applied for automated phenotyping in [16]. Finally, we note that natural language processing is among the domains to which machine learning and knowledge representation are applied. For example, MedTAS/P combines these three areas, as described in [17].


#### Interaction between Knowledge Representation and Fuzzy Logic

Not surprisingly, most of these overlapping studies focus on the fuzzy nature of our limited knowledge in explaining and understanding particular diseases, e.g., in cardiology (Economou et al. [18]) or in the field of oncology (D’Aquin et al., 2004 [19]). However, some interesting studies compare “fuzzy” thinking with different approaches, where “fuzziness” seems to be a connecting notion between the worlds of algorithmic and other approaches to interpreting medical data, e.g., Douali et al., in 2014 [20], on fuzzy cognitive maps and Bayesian networks, and Kwiatkowska et al., in 2007 [21], on creating prediction rules using typicality measures. Another typical area for overlapping studies is the high-level interpretation of medical knowledge, e.g., Bellamy, in 1997 [22], on “Medical diagnosis, diagnostic spaces, and fuzzy systems” and the work of Boegl et al., in 2004 [23], on knowledge acquisition in a fuzzy knowledge representation framework. In sum, the interaction between these two fields is quite broad and covers many different areas of medical information science.


#### Limitations

Various widely divergent approaches involving, among others, fuzzy set theory [24], Bayesian networks [25], and artificial neural networks [26] [27] have been applied to intelligent computing systems in healthcare. Papers concerning AI in the medical domain appear in many literature collections and research events, e.g., events by IEEE - Institute of Electrical and Electronics Engineers, AAAI - Conference on Artificial Intelligence, MLDM - International Conference on Machine Learning and Data Mining, or Intelligent Systems Conference, which may not be indexed in MEDLINE. However, we consider MEDLINE itself as a large enough “sample” of medical AI research to represent the fields and their interplay, so that any limitations of using only MEDLINE will not impact the results.

As mentioned, we found over 80,000 papers that were used in the field interplay analysis. However, the more detailed text mining of the corpus had to be limited to the first thousand “best match” papers because of the corpus size limitations of the analytic tools. With about 250,000 total words and over 20,000 unique word forms, the corpus size seems adequate for obtaining meaningful results in the phrase frequency and collocation analyses that followed.

For the phrase frequency study, we limited the analysis to phrases occurring at least 50 times. While the tool calculated all phrase frequencies, our opinion is that there has to be a limit in order to judge that a phrase occurs sufficiently frequently in the corpus to demonstrate interest in a research field. While the limit of 50 was chosen in a somewhat arbitrary way, we think there is not much difference among little-mentioned research fields, but there is a clear difference from the leading fields, which occur several hundred times. The tables and figures presented in the results give some insight into what AI encompasses in the health domain and how the various areas of AI research interact.
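The effect of such a frequency limit can be illustrated with a minimal n-gram count. The sketch below counts two-word phrases within each abstract and keeps only those reaching a minimum count; the toy abstracts, the tokenizer, and the helpers `phrase_counts` and `frequent_phrases` are assumptions, and the code is not the procedure implemented in the concordance tool.

```python
import re
from collections import Counter


def phrase_counts(abstracts, n=2):
    """Count n-word phrases, without letting a phrase span two abstracts."""
    counts = Counter()
    for abstract in abstracts:
        tokens = re.findall(r"[a-z]+", abstract.lower())
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def frequent_phrases(abstracts, n=2, min_count=50):
    """Keep only the phrases that occur at least min_count times in the corpus."""
    return {p: c for p, c in phrase_counts(abstracts, n).items() if c >= min_count}


# Toy corpus (invented): one frequent and one rare field label.
abstracts = ["Machine learning improves triage."] * 60 + ["An expert system was built."] * 3

counts = phrase_counts(abstracts)
frequent = frequent_phrases(abstracts, n=2, min_count=50)
print(frequent)
```

Note that generic word pairs can pass the threshold as well, so the surviving phrases still have to be matched against the field labels of interest.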


#### Definitions of AI in Related Literature

There is no common agreement on what exactly AI encompasses; thus AI can be considered a “fuzzy” term. In the field of medicine, MeSH provides a good basis for specifying the subdomains of (medical) AI. However, MeSH includes “knowledge representation” as an entry term for “artificial intelligence”, while “knowledge bases” is a subcategory of AI. Outside of the medical domain, attempts to define AI and its field have led to more philosophical answers. Larry Tesler, quoted in [28], provides a definition that may not be helpful in itself, but does highlight the hype that periodically surrounds AI, stating that “Artificial Intelligence is whatever hasn’t been done yet”. The common aspect of AI is that of computers mimicking intelligent human behavior. This is sometimes simplified as “thinking machines”, a metaphor whose inadequacy is captured by Edsger Dijkstra’s remark that “The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” [29].


### 6 Conclusion

The results of our analysis revealed that AI research in medicine evolves in a cascading and escalating way. While neural networks, robotics, and machine learning are the research areas with the largest numbers of indexed publications, they show the lowest relative interplay with other areas, whereas knowledge representation, with one of the smallest numbers of indexed publications, exhibits the highest interplay, of around 45%. This supports the idea that the notion of knowledge representation might play both a historical and a foundational role in the various areas, providing a common cognitive layer, a still needed context, even for domains such as machine learning, neural nets, fuzzy logic, and robotics.


1 The authors, chairing the IMIA WG 6, currently called “Language and Meaning in Biomedicine” and formerly “Medical Concept Representation”, are continuing the tradition of this WG of reaching out from time to time for a cross-disciplinary overview with other fields of biomedical information science, in this case with AI. See our WG site for more details: https://imiawg6lamb.wordpress.com/.
