
DOI: 10.1055/s-0044-1792040
Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions
Funding: None.
Abstract
Background Radiology is critical for diagnosis and patient care, relying heavily on accurate image interpretation. Recent advancements in artificial intelligence (AI) and natural language processing (NLP) have raised interest in the potential of AI models to support radiologists, although robust research on AI performance in this field is still emerging.
Objective This study aimed to assess the efficacy of ChatGPT-4 in answering radiological anatomy questions similar to those in the Fellowship of the Royal College of Radiologists (FRCR) Part 1 Anatomy examination.
Materials and Methods We used 100 mock radiological anatomy questions from a free website patterned after the FRCR Part 1 Anatomy examination. ChatGPT-4 was tested under two conditions: with and without context regarding the examination instructions and question format. The main query posed was: “Identify the structure indicated by the arrow(s).” Responses were evaluated against the correct answers, and two expert radiologists (with >5 and 30 years of experience, respectively, in diagnostic radiology and academics) rated the explanations accompanying the answers. We calculated four scores: correctness, sidedness, modality identification, and approximation. The approximation score grants partial credit when the identified structure is present in the image but is not the structure indicated by the arrow(s).
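The querying procedure can be sketched programmatically. The snippet below is a minimal illustration only, not the authors' workflow: the study queried ChatGPT-4 interactively, whereas this sketch assumes the OpenAI Python SDK's vision-capable chat endpoint and hypothetical local image files for the mock questions.

```python
# Illustrative sketch only: the study used ChatGPT-4 interactively; this
# assumes the OpenAI Python SDK and hypothetical local mock-question images.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical context preamble approximating the examination instructions.
CONTEXT = (
    "You are taking the FRCR Part 1 Anatomy examination. Each question "
    "shows one radiological image in which arrow(s) mark a structure."
)
QUERY = "Identify the structure indicated by the arrow(s)."


def ask(image_path: str, with_context: bool) -> str:
    """Pose the standard query for one mock question, with or without context."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    prompt = f"{CONTEXT} {QUERY}" if with_context else QUERY
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable GPT-4 model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```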
Results ChatGPT-4 underperformed in both testing conditions, with correctness scores of 4% without context and 7.5% with context. However, it identified the imaging modality with 100% accuracy. The model scored over 50% on the approximation metric, identifying structures that were present in the image but not indicated by the arrow. It struggled with laterality, identifying the correct side of the structure in only approximately 42% and 40% of questions in the no-context and with-context settings, respectively. Only 32% of the responses were similar across the two settings.
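As a rough illustration of how the reported percentages could be tallied, the hypothetical sketch below aggregates per-question judgments. The field names are invented; in the study, answer quality and cross-setting similarity were judged by the expert radiologists, not by string matching.

```python
# Hypothetical tally of the four scores over the 100 graded responses.
from typing import Dict, List


def score(records: List[Dict[str, bool]]) -> Dict[str, float]:
    """Return each metric as a percentage of the graded questions."""
    n = len(records)
    return {
        "correctness": 100 * sum(r["correct"] for r in records) / n,
        "sidedness": 100 * sum(r["correct_side"] for r in records) / n,
        "modality": 100 * sum(r["correct_modality"] for r in records) / n,
        # Approximation: a present-but-unmarked structure earns partial credit.
        "approximation": 100 * sum(r["structure_present"] for r in records) / n,
    }


def agreement(no_context: List[str], with_context: List[str]) -> float:
    """Crude proxy for cross-setting similarity (experts judged the real thing)."""
    same = sum(a == b for a, b in zip(no_context, with_context))
    return 100 * same / len(no_context)
```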
Conclusion Despite its ability to correctly recognize the imaging modality, ChatGPT-4 has significant limitations in interpreting normal radiological anatomy. This indicates that the model requires enhanced training in normal anatomy before it can be expected to interpret abnormal radiological images reliably. Identifying the correct side of structures in radiological images also remains a challenge for ChatGPT-4.
Keywords
artificial intelligence - ChatGPT-4 - large language model - radiology - FRCR - anatomy - fellowship
Data Availability Statement
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Publication History
Article published online: 04 November 2024
© 2024. Indian Radiological Association. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed, or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India