Endoscopy
DOI: 10.1055/a-2566-9576
Editorial

Getting the best out of artificial intelligence in endoscopy

Referring to Jong MR et al. doi: 10.1055/a-2537-3510
Laurence B. Lovat
1   Division of Surgery and Interventional Science, University College London, London, United Kingdom of Great Britain and Northern Ireland

Artificial intelligence (AI) algorithms are entering mainstream practice. Computer-aided detection (CADe) of colorectal polyps in screening colonoscopy is being rolled out around the world and, overall, brings significant improvements in quality metrics such as adenoma detection rates [1]. There are, however, some ongoing issues. Not everyone benefits from using AI and the quality of some endoscopic examinations is reduced. The reasons for this are still unclear, but there is no doubt that the early adopters still need to convince their less-willing colleagues to use the technology [2]. To this end, quality and accuracy are key.

“Only frames that passed the quality threshold would be then included in the CADe analysis; thus, the overall detection accuracy was expected to improve. The authors achieved their aims, and by excluding poor-quality frames (around 40%), detection accuracy increased from 87.2% to 91.4%.”

So, how is a clinically valuable AI system created? This currently requires intensive human input. Individual endoscopy images are manually annotated for the presence and location of polyps. At 30 frames per second and with an average procedure duration of around 20 minutes, that equates to 36 000 frames to annotate per complete procedure. Most will be normal, so the focus will be on those frames that contain polyps. Nonetheless, millions of frames from thousands of individual procedures need to be annotated to create a robust AI model, even with various tricks such as pre-training and semi-automated annotation. Annotation “farms” have sprung up around the world to support this effort. They employ many people whose full-time job is to annotate image data sent to them, based on pre-defined criteria. The best of these operations employ domain experts, such as endoscopists, to ensure annotation is accurate. It is now big business.
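The annotation burden described above follows from simple arithmetic. A minimal sketch, using the frame rate and procedure duration quoted in the text (the corpus size is a purely hypothetical illustration):

```python
# Back-of-envelope estimate of the annotation burden.
# The frame rate (30 fps) and procedure duration (~20 minutes) come from
# the text; the number of procedures is a hypothetical illustration.
FPS = 30
PROCEDURE_MINUTES = 20

frames_per_procedure = FPS * 60 * PROCEDURE_MINUTES
print(frames_per_procedure)  # 36000 frames per complete procedure

# Even a modest hypothetical corpus of 1000 procedures implies tens of
# millions of frames to review.
procedures = 1000
total_frames = frames_per_procedure * procedures
print(f"{total_frames:,}")  # 36,000,000
```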

Another important consideration is procedure quality. The bowel must be clean, with bubbles and fluid removed. Inspection time should be adequate. However, even in a well-performed colonoscopy, around half the frames will be blurred. Multiple groups have found that training colonic polyp CADe algorithms on substandard data leads to poorly functioning models.

The problem is more acute in the upper gastrointestinal tract when detecting dysplasia arising in Barrett’s esophagus. Dysplasia is even more subtle than flat colonic polyps and in around 20% of cases is invisible to the human eye, even with the best optical visualization of the latest endoscopes. But it is also rare, and even experienced endoscopists may miss subtle changes arising in Barrett’s mucosa [3]. Several groups have shown that AI systems can detect dysplasia in Barrett’s and, although formal trials have not yet been completed, there is a feeling that AI is likely to enhance the average endoscopist’s detection.

The European Society of Gastrointestinal Endoscopy has advocated for metrics of high-quality endoscopy, including adequate inspection time and proper mucosal cleaning [4]. Cue the BONS-AI consortium, an international collaboration led by the Amsterdam University Medical Center. This group has led the world in imaging and treating Barrett’s neoplasia, and now turns its hand to the problem of generating high-quality CADe for patients with Barrett’s esophagus [5]. The group developed and evaluated a new computer-aided quality (CAQ) model in an ex vivo setting. The model identifies whether the esophageal mucosa is adequately visualized, based on an assessment of mucosal cleanliness, esophageal expansion, and image clarity. The model integrates with their existing CADe system for dysplasia detection in Barrett’s esophagus. Although the group pre-trained the CAQ system on 5 million images, the final training set for this new model consisted of only 7463 images and video frames from 359 patients in 13 centers. Both objective and subjective labels were applied to each image by three research fellows under supervision, loosely in line with the new Gastroscope RAte of Cleanliness Evaluation score, which evaluates endoscopic cleanliness. The model was tested in two other datasets. The first, testing the accuracy of a stand-alone CAQ system, comprised 647 nondysplastic images from 51 patients. The second, testing the integration of CAQ into their existing CADe system, used 956 nondysplastic and 557 dysplastic images from 97 patients. Three neural networks were then created to assess objective image quality, esophageal expansion, and cleaning adequacy. Thanks to the speed at which computers now operate, adding the new models to their existing CADe system could still generate real-time feedback for endoscopists. Only frames that passed the quality threshold would then be included in the CADe analysis; thus, the overall detection accuracy was expected to improve.
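The quality-gating idea described above can be sketched in a few lines: score each frame for the three quality dimensions, and pass only frames above a threshold to the detection model. This is an illustrative sketch only; the class, function names, scoring rule, and threshold are all hypothetical stand-ins, not the BONS-AI implementation.

```python
# Minimal sketch of a quality-gated CADe pipeline: frames are scored for
# quality first, and only frames passing the threshold reach the detector.
# All names, scores, and thresholds here are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class Frame:
    image_id: int
    clarity: float      # stand-ins for the three CAQ sub-scores
    expansion: float
    cleanliness: float

def caq_score(frame: Frame) -> float:
    """Combine the three quality sub-scores (illustrative: simple mean)."""
    return (frame.clarity + frame.expansion + frame.cleanliness) / 3

def run_cade(frame: Frame) -> bool:
    """Dummy stand-in for the dysplasia-detection model."""
    return frame.clarity > 0.8  # hypothetical placeholder logic

def quality_gated_cade(frames, threshold=0.6):
    """Yield detection results only for frames that pass the quality gate."""
    for frame in frames:
        if caq_score(frame) >= threshold:
            yield frame.image_id, run_cade(frame)
        # low-quality frames are dropped before detection, analogous to
        # the ~40% of frames excluded in the study
```

The key design point is that the gate sits in front of the detector, so the CADe model is never asked to classify frames it was not trained to handle.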

The authors achieved their aims, and by excluding poor-quality frames (around 40%), detection accuracy increased from 87.2% to 91.4%. This relatively small increase is, nonetheless, very important. First, it demonstrates once again the adage of “garbage in, garbage out.” The concept applies just as well to computer algorithms as it does to food, speech, and thought (particularly well demonstrated by those who overly engage with social media). Second, the work was done by expert endoscopists. In nonexpert settings, standard detection accuracy is likely to be much lower [6]. We already know that most humans are uncomfortable relinquishing their often-incorrect decision making to computers [7]. Improved detection algorithms will enhance endoscopists’ trust in AI and speed the transition to acceptance of a computer-based diagnostic tool that contradicts their own flawed perception (particularly if the only impact is to take an extra targeted biopsy). The system could also offer feedback to clinicians about how well they are cleaning the mucosa or inflating the esophagus. This might improve practice, although it remains to be tested. The other interesting point highlighted by this work is that only a small number of additional frames are needed to train a new AI model, once a baseline pre-training threshold is reached [8].

One issue remains unknown but important. As generative AI models advance rapidly, might it be that soon, human annotation of images will no longer be needed? Generative AI can already create synthetic data to reduce training data requirements. Might it even become possible to iteratively improve detection without any pre-training? This idea is not as fanciful as it was only 2 years ago. However, generative AI can still hallucinate, and patients are still likely to prefer our flawed human reasoning over super-intelligent hallucinating computers, at least for the time being. We are not out of a job just yet.



Publication History

Article published online:
23 April 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany

 
  • References

  • 1 Hassan C, Spadaccini M, Mori Y. et al. Real-time computer-aided detection of colorectal neoplasia during colonoscopy: a systematic review and meta-analysis. Ann Intern Med 2023; 176: 1209-1220
  • 2 Djinbachian R, Rex DK, von Renteln D. Optical polyp diagnosis in the era of artificial intelligence. Am J Gastroenterol 2024;
  • 3 Visrodia K, Singh S, Krishnamoorthi R. et al. Magnitude of missed esophageal adenocarcinoma after Barrett’s esophagus diagnosis: a systematic review and meta-analysis. Gastroenterology 2016; 150: 599-e15
  • 4 Messmann H, Bisschops R, Antonelli G. et al. Expected value of artificial intelligence in gastrointestinal endoscopy: European Society of Gastrointestinal Endoscopy (ESGE) Position Statement. Endoscopy 2022; 54: 1211-1231
  • 5 Jong MR, Jaspers TJM, van Eijck van Heslinga RAH. et al. The development and ex vivo evaluation of a computer-aided quality control system for Barrett’s esophagus endoscopy. Endoscopy 2025; 57
  • 6 Schölvinck DW, Van Der Meulen K, Bergman JJGHM. et al. Detection of lesions in dysplastic Barrett’s esophagus by community and expert endoscopists. Endoscopy 2017; 49: 113-120
  • 7 Korteling JEH, Paradies GL, Sassen-van Meer JP. Cognitive bias and how to improve sustainable decision making. Front Psychol 2023; 14: 1129835
  • 8 Davidson T, Denain J-S, Villalobos P. et al. AI capabilities can be significantly improved without expensive retraining. arXiv 2023;