DOI: 10.1055/s-0045-1806108
Revolutionizing Diagnostics: Evaluating ChatGPT-4's Performance in Ulcerative Colitis Endoscopic Assessment
Aims: The Mayo Endoscopic Subscore (MES) is a widely used measure of endoscopic disease activity in ulcerative colitis (UC) [1]. This assessment plays a key role in clinical practice because endoscopic remission is a long-term therapeutic target [2]. Artificial intelligence has emerged as a promising tool for improving diagnostic precision and reducing inter-observer variability among endoscopists [3]. This study aims to evaluate the diagnostic accuracy of ChatGPT-4, a multimodal large language model (LLM), in grading endoscopic images of UC patients using the MES as the reference standard, without prior configuration or fine-tuning.
Methods: Real-world endoscopic images of UC patients were collected for severity assessment and reviewed by an expert consensus board, which classified each image by MES severity grade (0-3). Only images graded unanimously by the consensus board were then provided to three IBD specialists and to ChatGPT-4 in three separate sessions. The severity gradings of the IBD specialists and ChatGPT-4 were compared with those of the expert consensus board.
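The selection and comparison steps described in the Methods can be sketched as below. This is a minimal illustration, not the authors' pipeline: the abstract does not describe a data format, so the dictionary layout and the helper names `unanimous_reference` and `session_accuracy` are hypothetical, and the board size in the usage example is an assumption.

```python
from typing import Dict, List

VALID_MES = {0, 1, 2, 3}  # Mayo Endoscopic Subscore grades


def unanimous_reference(expert_grades: Dict[str, List[int]]) -> Dict[str, int]:
    """Keep only images on which every expert assigned the same MES grade.

    Hypothetical layout: image ID -> list of grades, one per board member.
    """
    reference = {}
    for image_id, grades in expert_grades.items():
        if not grades or any(g not in VALID_MES for g in grades):
            raise ValueError(f"invalid MES grades for {image_id}: {grades}")
        if len(set(grades)) == 1:  # complete agreement among experts
            reference[image_id] = grades[0]
    return reference


def session_accuracy(reference: Dict[str, int], ratings: Dict[str, int]) -> float:
    """Fraction of reference images whose rater grade equals the consensus grade."""
    correct = sum(1 for image_id, grade in reference.items()
                  if ratings.get(image_id) == grade)
    return correct / len(reference)
```

For example, with a hypothetical three-member board, `unanimous_reference({"img01": [2, 2, 2], "img02": [1, 2, 1], "img03": [0, 0, 0]})` keeps `img01` and `img03`, and a session that grades them 2 and 1 scores an accuracy of 0.5. Images missing from a session count as incorrect in this sketch.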
Results: Fifty endoscopic images were initially evaluated by the expert consensus board. Of these, 30 images (60%) were graded with complete MES agreement among the experts. Compared with the consensus board, ChatGPT-4's MES gradings were accurate in 26/30 (86.7%), 21/30 (70%) and 24/30 (80%) images, for a mean accuracy rate of 78.9%. The IBD specialists' gradings were accurate in 24/30 (80%), 24/30 (80%) and 25/30 (83.3%) images, for a mean accuracy rate of 81.1%. There was no statistically significant difference in mean accuracy rates between the two groups (p=0.71).
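The reported percentages can be checked directly from the counts in the Results. The sketch below recomputes the per-session and mean accuracy rates; the abstract does not state which statistical test produced p=0.71, so no attempt is made to reproduce it. The name `mean_accuracy` is an illustrative helper, not part of the study.

```python
from fractions import Fraction

# Correct MES gradings per session, out of the 30 unanimously graded images,
# as reported in the Results.
N_IMAGES = 30
chatgpt_correct = [26, 21, 24]
specialist_correct = [24, 24, 25]


def mean_accuracy(correct_counts, n_images=N_IMAGES):
    # Pooled accuracy over all sessions; because every session uses the same
    # 30 images, this equals the average of the per-session accuracies.
    return Fraction(sum(correct_counts), n_images * len(correct_counts))


for label, counts in (("ChatGPT-4", chatgpt_correct),
                      ("IBD specialists", specialist_correct)):
    per_session = [round(100 * c / N_IMAGES, 1) for c in counts]
    print(label, per_session, round(100 * float(mean_accuracy(counts)), 1))
```

Running it prints `ChatGPT-4 [86.7, 70.0, 80.0] 78.9` and `IBD specialists [80.0, 80.0, 83.3] 81.1`, i.e. 71/90 and 73/90 correct gradings overall, matching the reported means.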
Conclusions: ChatGPT-4 shows potential for assessing mucosal inflammation severity from endoscopic images of UC patients without prior configuration or fine-tuning, with accuracy rates comparable to those of IBD specialists. Further research and validation are needed to explore broader applications of LLMs and their integration into diagnostic workflows.
Publication History
Article published online:
27 March 2025
© 2025. European Society of Gastrointestinal Endoscopy. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
References
- 1 Osada T, Ohkusa T, Yokoyama T. et al. Comparison of several activity indices for the evaluation of endoscopic activity in UC: Inter- and intraobserver consistency. Inflamm Bowel Dis 2010; 16 (2): 192-197
- 2 Turner D, Ricciuto A, Lewis A. et al. STRIDE-II: An Update on the Selecting Therapeutic Targets in Inflammatory Bowel Disease (STRIDE) Initiative of the International Organization for the Study of IBD (IOIBD): Determining Therapeutic Goals for Treat-to-Target strategies in IBD. 2021; 1570-1583
- 3 Schroeder K, Tremaine W, Ilstrup D. et al. Coated oral 5-aminosalicylic acid therapy for mildly to moderately active ulcerative colitis. A randomized study. N Engl J Med 1987; 317 (26): 1625-1629