Aims Recent studies have shown that large language models (LLMs) can enhance understanding
of colorectal cancer (CRC) screening, potentially increasing participation rates. However, a limitation
of these studies is that the questions posed to the LLMs were generated by experts. This study
aims to investigate the effectiveness of ChatGPT-4o in answering CRC screening queries
generated directly by patients.
Methods Ten patients formulated questions across four CRC screening scenarios, which were
posed to ChatGPT-4o in two separate sessions. Responses were assessed by five experts
and ten patients.
Results Experts rated the responses with mean scores of 4.1±1.0 for accuracy, 4.2±1.0 for
completeness, and 4.3±1.0 for comprehensibility. Patients rated the responses as complete in 97.5%, understandable in 95%, and trustworthy in 100% of cases. Finally, we evaluated the text similarity between each pair of responses
obtained from ChatGPT-4o in the first and second sessions. The results showed an average
similarity of 86.8±2.7% (range 82%-93%), indicating good consistency of outputs over
time.
Conclusions Despite variability in both questions and answers, ChatGPT-4o showed good performance
in answering CRC screening queries, even when used directly by patients.