DOI: 10.1055/a-1820-7113

The paradox of artificial intelligence diversification in endoscopy: creating blind spots by exposing them

Referring to Dong Z et al. 10.1055/a-1731-9535
Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands

Rarely have technological innovations had such broad impact across different scientific disciplines as the recent progress in artificial intelligence (AI) technology, which has enabled breakthroughs from cosmology [1] to protein folding [2]. In the field of gastrointestinal (GI) endoscopy, AI has also made its mark, with impressive results reported for a variety of applications [3] [4] [5]. While the pace of this development is remarkable, its focus has been rather narrow, as most studies have concentrated on disease detection and/or staging. Only recently have the broader applications of AI within GI endoscopy started to be seriously explored.

In this issue of Endoscopy, Dong and colleagues take a step towards such a novel application by presenting a system for the automated generation of visual endoscopy reports [6]. The authors cleverly exploit progress achieved on several independent AI tasks, such as lesion detection and informative frame identification, and integrate these into a single interconnected AI system. This system, named the “endoscopic automatic image reporting system” (EAIRS), comprises a total of seven convolutional neural networks (CNNs), which were independently trained and evaluated before being incorporated into a single system architecture. The authors also thoroughly evaluated the complete AI system, both retrospectively and prospectively, and compared its performance with that of nine experienced endoscopists, using a set of six metrics to capture the completeness of a report (demonstrated in a visually appealing fashion by the radar charts in Figure 7s). By generating reports that were at least as complete as those generated by the endoscopists, EAIRS convincingly establishes itself among the steadily expanding set of AI tools that could support endoscopists in their day-to-day routine.

That said, besides the obvious clinical limitations, such as the need for a multicenter study in a more diverse patient population, a number of technical issues leave room for follow-up studies. The use of seven independently trained, yet interconnected, CNNs is not optimal and could lead to undesired behavior, as errors can propagate through the system and cause a snowball effect. A rather trivial example occurs when the algorithm that discards non-informative frames wrongly rejects frames that show lesions: the lesion-detection algorithm never receives those frames and therefore cannot identify the lesions. Technically, it would be more elegant to pursue a single architecture and, ideally, to train it for all tasks simultaneously. Such an approach tends to yield more efficient and robust AI systems that can exploit the full context of the interconnected tasks. A first step in this direction could be an architecture with a common CNN that extracts information relevant to all tasks, followed by separate network heads that each specialize in one of the tasks. This relates to a broader issue: how AI systems are connected, and how they affect each other’s performance, are becoming increasingly important considerations as such systems are co-integrated for various tasks in modern endoscopy equipment.
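The snowball effect described above can be made concrete with a little arithmetic. The following is a minimal sketch, not a model of EAIRS itself: it assumes a purely sequential two-stage chain whose stages err independently, and the two performance figures are hypothetical values chosen for illustration rather than results from the study.

```python
from math import prod


def end_to_end_sensitivity(stage_sensitivities):
    """Return the chance that a lesion passes every stage of a sequential chain.

    Assumption (for illustration only): the stages make their errors
    independently, so the per-stage pass rates simply multiply.
    """
    return prod(stage_sensitivities)


# Hypothetical figures, not taken from the EAIRS study.
FRAME_FILTER_RETENTION = 0.95       # share of lesion frames kept as "informative"
LESION_DETECTOR_SENSITIVITY = 0.90  # sensitivity on the frames it actually receives

combined = end_to_end_sensitivity(
    [FRAME_FILTER_RETENTION, LESION_DETECTOR_SENSITIVITY]
)
print(f"end-to-end sensitivity: {combined:.3f}")
```

Under these assumed figures, the chain as a whole detects fewer lesions than the lesion detector's own sensitivity would suggest, which is why evaluating each network in isolation can overstate the performance of the complete system.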

AI can also positively affect the performance of other AI systems, for example by ensuring the quality of the input imagery of a computer-aided detection (CADe) system. Such quality control algorithms represent yet another application broadening the use of AI within endoscopy and in fact arrived on the scene relatively early, for example with the work of Prof. de Groen and colleagues [7]. The group that developed EAIRS has also recently proposed two such quality improvement systems, ENDOANGEL [8] and WISENSE [9], to monitor withdrawal time and blind spots. While these systems were aimed at improving human performance, they could clearly also be exploited to improve AI performance. Extending EAIRS with such functionality could be a valuable addition.

In a broader context, such quality algorithms might even be a necessary condition for AI to work in clinical practice. Virtually all supportive AI-based CAD systems have been developed together with academic hospitals that have extensive experience and expertise in imaging and state-of-the-art endoscopic equipment. The gastroenterologists collecting the data used to train these CAD algorithms have mastered the capture of clean images and stable videos of the highest quality over years of performing scientific imaging studies with the best available endoscopes. Paradoxically, the majority of surveillance endoscopies are performed in community hospitals, where the levels of experience, expertise, and equipment vary considerably, leading to much more heterogeneous data than were used for the development and training of the CAD algorithms. This so-called “domain gap” is especially concerning for cancer screening applications. Doctors at community centers may, for example, be under the false impression that an additional AI-based expert is looking over their shoulders, catching the lesions that they might overlook, while in reality the advertised AI performance is subject to degradation, because the input data may be very different from the unrealistically homogeneous data with which the system was trained and validated. In this case, application of the much-celebrated AI-based CAD systems could lead to worse cancer screening performance.

Admittedly, this example might be exaggerated, but as AI applications diversify and find different ways to support the endoscopist and improve quality of care, it becomes increasingly important that we keep our eyes open for their potential blind spots. These might be very explicit, such as parts of an organ that are not properly imaged, and may be addressed using methods to monitor and quantify the visualized epithelium [10]. But blind spots can also be more subtle, originating from the way algorithms were trained or interact with each other. Because they are much less obvious and easily overlooked, these subtle blind spots are also more dangerous, and avoiding them requires careful technical design and holistic performance evaluation of the entire AI chain.

Publication History

Article published online:
12 May 2022

© 2022. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany