Abstract
Background Differential diagnosis in radiology is a critical aspect of clinical decision-making.
Early-career radiologists may find it difficult to list differential diagnoses
from imaging patterns. In this context, the emergence of large language models
(LLMs) has introduced new opportunities as these models have the capacity to access
and contextualize extensive information from text-based input.
Objective This study explored the utility of four LLMs (ChatGPT3.5, Google
Bard, Microsoft Bing, and Perplexity) in providing the most important differential diagnoses
for cardiovascular and thoracic imaging patterns.
Methods We selected 15 unique cardiovascular (n = 5) and thoracic (n = 10) imaging patterns. We asked each model to generate the top five most important differential
diagnoses for every pattern. Concurrently, a panel of two cardiothoracic radiologists
independently identified the top five differentials for each case and reached consensus when
discrepancies occurred. We assessed the concordance and acceptance of the LLM-generated
differentials against the consensus differential diagnoses. Categorical variables were
compared using the binomial, chi-squared, or Fisher's exact test.
Results The 15 cases, with five differentials each, yielded 75 items to analyze.
The highest level of concordance was observed for diagnoses provided by Perplexity
(66.67%), followed by ChatGPT (65.33%) and Bing (62.67%). Bard had the lowest
concordance with expert consensus (45.33%). The acceptance rate was highest
for Perplexity (90.67%), followed by Bing (89.33%) and ChatGPT (85.33%). The lowest
acceptance rate was for Bard (69.33%).
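The reported percentages can be related back to item counts out of 75, and the between-model comparison can be illustrated with a Pearson chi-squared statistic. The following is a minimal sketch, not the study's actual analysis: the counts are back-calculated from the rounded percentages above (they are not stated in the text), the helper names `to_counts` and `chi_squared` are hypothetical, and treating the four models' items as independent samples is an assumption.

```python
# Sketch only: counts are back-calculated from the reported percentages,
# and the function names below are illustrative, not from the study.

N_ITEMS = 75  # 15 cases x 5 differentials

concordance_pct = {"Perplexity": 66.67, "ChatGPT": 65.33, "Bing": 62.67, "Bard": 45.33}
acceptance_pct = {"Perplexity": 90.67, "Bing": 89.33, "ChatGPT": 85.33, "Bard": 69.33}


def to_counts(pct_by_model, n=N_ITEMS):
    """Convert reported percentages to the nearest whole number of items."""
    return {model: round(pct / 100 * n) for model, pct in pct_by_model.items()}


def chi_squared(successes, n=N_ITEMS):
    """Pearson chi-squared statistic for a (models x 2) contingency table.

    Each model contributes one row: [successes, n - successes].
    Because every row has the same total n, the expected number of
    successes per row is simply the mean of the observed successes.
    Returns (statistic, degrees_of_freedom).
    """
    k = len(successes)
    expected_success = sum(successes.values()) / k
    expected_failure = n - expected_success
    stat = 0.0
    for s in successes.values():
        stat += (s - expected_success) ** 2 / expected_success
        stat += ((n - s) - expected_failure) ** 2 / expected_failure
    return stat, k - 1


if __name__ == "__main__":
    for label, pct in (("concordance", concordance_pct), ("acceptance", acceptance_pct)):
        counts = to_counts(pct)
        stat, df = chi_squared(counts)
        print(label, counts, f"chi2={stat:.2f}", f"df={df}")
```

A larger statistic at 3 degrees of freedom indicates stronger evidence that the rates differ across models; converting it to a p-value would require a chi-squared distribution function, for example from a statistics library.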
Conclusion The differential diagnoses generated by the four LLMs (ChatGPT3.5, Google Bard,
Microsoft Bing, and Perplexity) had a high level of acceptance but relatively lower concordance. There were
significant differences in acceptance and concordance among the LLMs. Hence, it is
important to carefully select a suitable model for use in patient care or medical
education.
Keywords
artificial intelligence - cardiothoracic - ChatGPT - Google Bard - Microsoft Bing
- Perplexity - differential diagnosis - radiologists