DOI: 10.1055/a-2628-8408
Comparing the performances of a fifty-four-year-old computer-based consultation to ChatGPT-4o

Objective: To evaluate and compare the diagnostic responses generated by two artificial intelligence models developed 54 years apart, and to encourage physicians to explore the use of large language models (LLMs) such as GPT-4o in clinical practice.

Methods: A clinical case of metabolic acidosis was presented to GPT-4o, and the model’s diagnostic reasoning, data interpretation, and management recommendations were recorded. These outputs were then compared with the responses of Schwartz’s 1970 AI model, built with a decision-tree algorithm using Conversational Algebraic Language (CAL). Both models were given the same patient data to ensure a fair comparison.

Results: GPT-4o generated an advanced analysis of the patient’s acid-base disturbance, correctly identifying likely causes and suggesting relevant diagnostic tests and treatments. It provided a detailed, narrative explanation of the metabolic acidosis. The 1970 CAL model, while correctly recognizing the metabolic acidosis and flagging implausible inputs, was constrained by its rule-based design. CAL offered only basic stepwise guidance and required sequential prompts for each data point, reflecting a limited capacity to handle complex or unanticipated information. GPT-4o, by contrast, integrated the data more holistically, although it occasionally ventured beyond the provided information.

Conclusion: This comparison illustrates substantial advances in AI capabilities over five decades. The rule-based CAL system, although innovative for its era and offering certain advantages over GPT-4o, had technical limitations. GPT-4o’s performance demonstrates the transformative potential of modern LLMs in clinical decision-making, showcasing an ability to synthesize complex data and assist diagnosis without specialized training, yet it requires further validation, rigorous clinical trials, and adaptation to clinical contexts. Rather than viewing one system as simply “better,” this study provides perspective on how far AI in medicine has progressed, while acknowledging that current AI tools remain supplements to, not replacements for, physician judgment.
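The stepwise, rule-based behavior attributed to CAL can be illustrated with a minimal sketch. This is a hypothetical toy classifier written for illustration only; the function name, thresholds, and branch order are assumed teaching values and do not reproduce the actual CAL rules.

```python
# Toy rule-based acid-base classifier in the spirit of a 1970s
# decision-tree consultation program. Hypothetical illustration only;
# thresholds are common textbook reference ranges, not CAL's rules.

def classify_acid_base(ph, hco3, pco2):
    """Classify a simple acid-base disturbance from pH, bicarbonate
    (mmol/L), and pCO2 (mmHg), checking input plausibility first."""
    # Plausibility checks, analogous to CAL flagging implausible inputs.
    if not (6.5 <= ph <= 8.0):
        return "implausible pH"
    if not (1 <= hco3 <= 60):
        return "implausible bicarbonate"
    if not (10 <= pco2 <= 120):
        return "implausible pCO2"

    # Fixed branching order: each rule fires in sequence, mirroring the
    # stepwise, one-data-point-at-a-time style of a decision-tree system.
    if ph < 7.35:
        if hco3 < 22:
            return "metabolic acidosis"
        if pco2 > 45:
            return "respiratory acidosis"
        return "acidemia, mixed or unclassified"
    if ph > 7.45:
        if hco3 > 26:
            return "metabolic alkalosis"
        if pco2 < 35:
            return "respiratory alkalosis"
        return "alkalemia, mixed or unclassified"
    return "no primary disturbance detected"

print(classify_acid_base(7.21, 12, 28))  # low pH with low HCO3 -> "metabolic acidosis"
```

The sketch makes the contrast concrete: such a system can only follow its pre-specified branches and reject out-of-range inputs, whereas an LLM can weigh all of the data at once, at the cost of occasionally venturing beyond it.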
Publication History
Submitted: 20 December 2024
Accepted after revision: 05 June 2025
Accepted Manuscript online: 06 June 2025
© Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany