DOI: 10.1055/a-2586-5912
Comparing ChatGPT3.5 and Bard in recommending colonoscopy intervals: bridging the gap in healthcare settings

Background and study aims: Colorectal cancer is a leading cause of cancer-related death, and screening and surveillance colonoscopy play a crucial role in early detection. This study examined the efficacy of two freely available large language models (LLMs), GPT3.5 and Bard, in recommending colonoscopy intervals in diverse healthcare settings.

Patients and methods: A cross-sectional study was conducted using data from routine colonoscopies at a large safety-net hospital and a private tertiary hospital. GPT3.5 and Bard were tasked with recommending screening intervals based on colonoscopy reports and pathology data, and their accuracy and inter-rater reliability were compared with those of a guideline-directed endoscopist panel.

Results: Of 549 colonoscopies analyzed (N=268 at the safety-net hospital and N=281 at the private hospital), GPT3.5 showed better concordance with guideline recommendations (GPT3.5: 60.4% vs. Bard: 50.0%, p<0.001). In the safety-net hospital, GPT3.5 had a 60.5% concordance rate with the panel compared with Bard's 45.7% (p<0.001). In the private hospital, concordance was 60.3% for GPT3.5 and 54.3% for Bard (p=0.13). Overall, GPT3.5 showed fair agreement with the panel (kappa=0.324), whereas Bard displayed lower agreement (kappa=0.219). In the safety-net hospital, GPT3.5 showed fair agreement with the panel (kappa=0.340), while Bard showed only slight agreement (kappa=0.148). In the private hospital, both GPT3.5 and Bard demonstrated fair agreement with the panel (kappa=0.295 and 0.282, respectively).

Conclusions: This study highlights the limitations of freely available LLMs in assisting with colonoscopy screening recommendations. While the potential of freely available LLMs to offer uniformity is significant, their low accuracy precludes their use as the sole agent in providing recommendations.
Publication history
Submitted: 19 August 2024
Accepted after revision: 07 April 2025
Accepted Manuscript online: 14 April 2025
© The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/).
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
Maziar Amini, Patrick W. Chang, Rio O. Davis, Denis D. Nguyen, Jennifer L. Dodge, Jennifer Phan, James Buxbaum, Ara Sahakian. Comparing ChatGPT3.5 and Bard in recommending colonoscopy intervals: bridging the gap in healthcare settings. Endosc Int Open ; 0: a25865912.
DOI: 10.1055/a-2586-5912