Introduction
Esophagogastroduodenoscopy (EGD) is widely used to examine upper gastrointestinal
lesions [1] [2]. White-light imaging (WLI) endoscopy is the standard protocol for examining gastric
lesions; however, the performance of endoscopists varies greatly, leading to a miss
rate of 20 %–40 % for early gastric cancer (EGC) [3]. Endoscopic diagnosis is subjective, operator dependent, and varies widely with experience
[4], which reduces the detection rate of EGC and precursor lesions [5]. There is an urgent need to improve endoscopy quality and reliability.
To achieve such improvement, numerous guidelines have been issued and expert consensus
has been reached in specific areas [6]. Safety and quality indicators for EGD have been proposed by the American Society
for Gastrointestinal Endoscopy and the American College of Gastroenterology [7]. The first evidence-based indicator of EGD performance was proposed by the European
Society of Gastrointestinal Endoscopy (ESGE) in 2015 [1]. The standard procedure is to examine all parts of the stomach during EGD, with
a recommended examination time of 7 minutes [8] [9] [10]. However, owing to the lack of monitoring and available tools, adherence to protocols
is often low [11]. A practical and workable approach should be established to implement guidelines
in routine endoscopy.
In the past few years, deep learning has made remarkable progress in the field of
medical image recognition [12]. Most studies are dedicated to computer-aided diagnosis of lesions [13] [14]; however, whether deep convolutional neural networks (CNNs) can be used to monitor
the quality of routine endoscopies has rarely been explored. In a previous study,
our group developed a novel artificial intelligence (AI) system, named WISENSE, based
on deep reinforcement learning (DRL) and CNN. WISENSE demonstrated the ability to
monitor blind spots (gastric areas overlooked during EGD) and generate photodocumentation
in real time during EGD [15] [16]. In the present study, we updated the WISENSE system by integrating a previously
trained real-time EGC detection model [15], and named the updated system “ENDOANGEL.” We then carried out a multicenter randomized
controlled trial (RCT) to verify the ability of ENDOANGEL to improve EGD quality in
five hospitals, and to describe its performance in detecting EGC in the clinical setting.
Methods
Development of the AI system
Three models were involved in ENDOANGEL: model 1 for image qualification, model 2 for gastric cancer prediction,
and model 3 for gastric site classification. Models 1
and 3 were trained as described in our previous single-center clinical trial [16], and model 2 was trained as described in our previous technical work [15]. Briefly, for model 1, a VGG-16 CNN was trained using transfer learning [17] with 12 220 in vitro images, 25 222 in vivo images, and 16 760 unqualified images,
so that frames could be filtered to retain only clear in vivo frames; the model achieved an accuracy
of 97.6 % in 3000 still images. For model 2, VGG-16 and ResNet-50 were each trained
using transfer learning with 2204 EGC, 326 advanced gastric cancer, and 4791 noncancerous
images; when three VGG-16 and two ResNet-50 models were combined, an accuracy of 92.5 % was achieved
for predicting EGC in 200 still images. For model 3, VGG-16 was trained using transfer
learning with 34 513 labeled EGD images of 26 different EGD sites, and DRL was trained
using virtual EGD videos and 30 stored videos in order to achieve human logicality;
VGG-16 combined with DRL achieved an accuracy of 90.0 % for predicting the gastric
site in 107 real videos. Before images were fed to the CNNs, they were first
stripped of black borders and then resized to 224 × 224 pixels to suit the original
input dimensions of the CNN models. A CNN algorithm was used for the detection of EGC,
whereas both CNN and DRL were implemented for the monitoring of blind spots.
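The preprocessing step described above (removal of black borders followed by resizing to 224 × 224 pixels) can be illustrated with a minimal sketch. The source does not say how borders were detected or which interpolation was used, so the brightness threshold, the nearest-neighbour resize, and the function names below are assumptions made purely for illustration, using a small grayscale frame stored as nested lists.

```python
def strip_black_borders(frame, threshold=10):
    """Crop outer rows/columns whose pixels are all <= threshold (assumed rule)."""
    rows = [r for r, row in enumerate(frame) if any(v > threshold for v in row)]
    cols = [c for c in range(len(frame[0]))
            if any(row[c] > threshold for row in frame)]
    if not rows or not cols:
        return frame  # entirely dark frame: leave it unchanged
    return [row[cols[0]:cols[-1] + 1] for row in frame[rows[0]:rows[-1] + 1]]


def resize_nearest(frame, size=224):
    """Nearest-neighbour resize to size x size (interpolation is an assumption)."""
    h, w = len(frame), len(frame[0])
    return [[frame[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def preprocess(frame, size=224, threshold=10):
    """Strip black borders first, then resize, in the order given in the text."""
    return resize_nearest(strip_black_borders(frame, threshold), size)


# Toy 6 x 8 frame: a bright 4 x 4 region surrounded by a black border.
frame = [[0] * 8 for _ in range(6)]
for r in range(1, 5):
    for c in range(2, 6):
        frame[r][c] = 200

out = preprocess(frame)
```

In practice an image library would handle colour channels and interpolation; the sketch only shows the order of operations.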
A few modifications were made to model 2 when it was integrated into the AI system,
as described in the supplementary methods (see the online-only Supplementary material).
The three models were integrated as illustrated in [Fig. 1], and frame-wise prediction was applied in a clinical setting using client–server
interaction [15]. As tested in our previous work, the mean total time to
output a prediction using all three models for each frame was 230 (standard deviation [SD] 60) milliseconds.
Therefore, ENDOANGEL was set to process EGD videos at 2 frames per second in real
time.
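The relationship between the reported per-frame latency and the chosen processing rate can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (230 ms mean, 60 ms SD, 2 frames per second); the variable names are illustrative, and expressing the margin in SD units is our own framing, not a criterion stated by the authors.

```python
mean_ms, sd_ms = 230.0, 60.0  # reported per-frame latency for all three models
target_fps = 2                # processing rate chosen in the study

budget_ms = 1000.0 / target_fps              # time available per frame at 2 fps
max_fps_at_mean = 1000.0 / mean_ms           # rate if every frame took the mean time
headroom_sd = (budget_ms - mean_ms) / sd_ms  # distance of the budget from the mean, in SDs

print(f"budget per frame: {budget_ms:.0f} ms")
print(f"throughput at mean latency: {max_fps_at_mean:.2f} fps")
print(f"headroom: {headroom_sd:.1f} SD above the mean latency")
```

At 2 frames per second each frame has a 500 ms budget, which leaves a wide margin over the mean latency.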
Fig. 1 Illustration diagram for integrating convolutional neural network (CNN) models into
the ENDOANGEL system. Consecutive frames during esophagogastroduodenoscopy were first
stripped of black borders and then resized to 224 × 224 pixels to suit the original
dimensions of the CNN models. Then, the fitted frames were inputted into CNN1 for
image qualification (Model 1), from which the blurry and in vitro images were discarded,
and in vivo clear images were sent to Model 2 for gastric cancer (GC) detection and
Model 3 for gastric site classification. The final output contains predictions of
observed sites and a box localizing gastric cancer.
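The frame flow in the figure can be sketched as a simple gating pipeline. This is a simplified illustration only: the three model callables are hypothetical stand-ins supplied by the caller, the box format is assumed, and the sequence-aware DRL component of model 3 is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) format


@dataclass
class FrameResult:
    qualified: bool
    site: Optional[str] = None
    cancer_box: Optional[Box] = None


def process_frame(frame, is_clear_in_vivo, detect_cancer, classify_site):
    """Model 1 gates the frame; models 2 and 3 run only on qualified frames."""
    if not is_clear_in_vivo(frame):
        return FrameResult(qualified=False)  # blurry or in vitro: discarded
    return FrameResult(True, classify_site(frame), detect_cancer(frame))


def summarize(frames, is_clear_in_vivo, detect_cancer, classify_site):
    """Accumulate observed sites and flagged frames over a frame sequence."""
    observed, flagged = set(), []
    for idx, frame in enumerate(frames):
        result = process_frame(frame, is_clear_in_vivo, detect_cancer, classify_site)
        if not result.qualified:
            continue
        observed.add(result.site)
        if result.cancer_box is not None:
            flagged.append((idx, result.cancer_box))
    return observed, flagged


# Stub "models" that read precomputed answers from toy frame records.
frames = [
    {"clear": False, "site": "antrum (G)", "box": None},
    {"clear": True, "site": "antrum (G)", "box": None},
    {"clear": True, "site": "angulus (L)", "box": (40, 60, 32, 32)},
]
observed, flagged = summarize(
    frames,
    is_clear_in_vivo=lambda f: f["clear"],
    detect_cancer=lambda f: f["box"],
    classify_site=lambda f: f["site"],
)
```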
The equipment used in this trial is described in the Supplementary material.
RCT trial design
This was a prospective, multicenter, single-blind, randomized, parallel-group study,
approved by the Ethics Committee of Renmin Hospital of Wuhan University.
Patients
From October 2018 to January 2019, patients undergoing routine EGD examinations at
the endoscopy centers of five tertiary hospitals were enrolled in the study. An introduction to the five hospitals
and details of the patient enrollment period at each are presented
in the Supplementary material. The RCT was approved by the institutional review boards
of each participating hospital and performed according to the Declaration of Helsinki.
Inclusion criteria were: 1) age 18 years or above; 2) American Society of Anesthesiologists
physical status score of 1, 2, or 3; 3) informed consent provided.
Exclusion criteria were: 1) patients with absolute contraindications to EGD examination;
2) history of previous gastric surgery; 3) pregnancy; 4) previous medical history
of allergic reaction to anesthetics; 5) unsuitability for participation in the trial
at the investigator’s discretion. Withdrawal criteria were: 1) EGD examination not completed
due to esophageal stenosis, obstruction, large space-occupying lesions, or ulcers
in the duodenal bulb; 2) premature termination of the EGD due to rapid changes in
the patient’s heart rate or respiratory rate.
The patient population was not limited to specific indications, as most patients with
EGC are asymptomatic [18].
Before the trial, 14 enrolled endoscopists studied the ENDOANGEL user interface and
the Japanese systematic screening protocol for the stomach [10]. The participating endoscopists at the five hospitals included six from Renmin Hospital
of Wuhan University, two from Tongji Hospital, two from Central Hospital of Wuhan,
two from Yichang Central People’s Hospital, and two from the First People’s Hospital
of Yichang. The participating endoscopists had 3–5 years of EGD experience, had performed
2000–5000 EGD examinations, had diagnosed < 200 EGC cases, and had the ability to
evaluate gastric lesions with magnifying image-enhanced endoscopy (M-IEE).
Interventions
Patients undergoing EGD examination were randomly assigned to a procedure with ENDOANGEL
assistance or no assistance (control). The examination protocol consisted of WLI observation,
M-IEE observation, and biopsy of suspicious lesions. In both groups, endoscopists
first screened the upper gastrointestinal tract using WLI. A biopsy was taken if the
endoscopist predicted that a lesion had a risk of gastric cancer. When endoscopists
could not determine the risk of the lesion using WLI, M-IEE was used to make further
observations and take targeted biopsies. In addition to the original video, four additional
pieces of information were provided to the endoscopists in the ENDOANGEL group: 1)
a virtual stomach model monitoring blind spots; 2) procedure time and duration; 3)
red or green frames indicating cancerous and noncancerous lesions predicted by ENDOANGEL;
4) scoring and grading. The score was positively correlated with the number of observed
sites; scores of 80, 90, and 100 corresponded to “good,” “excellent,” and “perfect,”
respectively. No additional information was provided to the endoscopists in the control
group. A working example of the ENDOANGEL system is shown in Fig. 1 s, [Video 1], and [Video 2], and a representative image of ENDOANGEL detecting EGC is shown in Fig. 2 s.
Fig. 2 Trial flow diagram.
Video 1 Representative video of the use of ENDOANGEL for monitoring blind spots and detecting
noncancerous lesions. The system presented the covered gastric sites synchronized
with the process of endoscopy to verify that the entire stomach was mapped. A cartoon
gastric icon was set to be transparent before the examination. As soon as the scope
was inserted into the stomach, the observed sites were colored in the corresponding
part of the icon. Any transparent area indicated that the corresponding sites had
not been observed (i.e., the blind spots). Meanwhile, ENDOANGEL successfully detected
the gastric polyp and recognized it as a noncancerous lesion (green box).
Video 2 Representative video of the use of ENDOANGEL for detecting cancerous lesions. A pathologically
confirmed early gastric cancer is shown in the video. ENDOANGEL successfully detected
the lesion and recognized it as a suspicious cancerous lesion (red box).
Outcomes
The main outcome of the study was the number of blind spots (out of 26 per patient)
for both the ENDOANGEL and control groups. Blind spots were defined as the sites unobserved
during EGD, indicated as transparent areas in the gastric icon, as shown in [Video 1] and [Video 2].
The secondary outcomes were: 1) inspection duration; 2) the percentage of patients
with missing observations (i.e., blind spots) at each site; 3) performance of ENDOANGEL
in predicting EGC in a clinical setting.
The number of blind spots, inspection duration, and the percentage of patients with
blind spots at each site were analyzed in both groups, whereas the performance of
ENDOANGEL in predicting gastric cancer in the clinical setting was analyzed only in
the ENDOANGEL group.
The performance of ENDOANGEL in detecting gastric cancer was analyzed using accuracy,
sensitivity, and specificity. Accuracy was calculated as the number of true predictions
divided by the total number of lesions; sensitivity was calculated as the number of
correctly predicted gastric cancers divided by the total number of gastric cancers;
specificity was calculated as the number of correctly predicted noncancerous lesions
divided by the total number of noncancerous lesions. Noncancerous lesions included
adenoma, low grade neoplasia, intestinal metaplasia, atrophic gastritis, nonatrophic
gastritis, benign ulcer, polyps, and xanthoma, among others.
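The three definitions above translate directly into code. The following sketch uses hypothetical counts chosen only to illustrate the formulas; they are not trial data, and the function name is our own.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Per-lesion metrics as defined in the text.

    tp: cancers predicted as cancer      fn: cancers predicted as noncancerous
    tn: noncancerous predicted as such   fp: noncancerous predicted as cancer
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,    # true predictions / all lesions
        "sensitivity": tp / (tp + fn),    # correct cancers / all cancers
        "specificity": tn / (tn + fp),    # correct noncancerous / all noncancerous
    }


# Hypothetical example: 10 cancers and 90 noncancerous lesions.
m = diagnostic_metrics(tp=9, fn=1, tn=80, fp=10)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

With few cancers in a sample, sensitivity rests on a small denominator, which is why the confidence interval reported alongside it matters.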
Two medical students reviewed the EGD data and recorded the start and end times of
each EGD examination. Three experts with more than 10 years of EGD experience independently
reviewed the EGD data from the trial patients and recorded the blind spots. A site
was labeled as observed in a patient only when two or more experts reached an agreement.
When the expert whose label was discarded had objections, the three experts would
discuss the data together and reach a consensus. Endoscopists who performed the EGD
examination did not participate in data evaluation.
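The two-of-three agreement rule for labeling a site as observed can be written as a short majority vote. This is a minimal sketch with made-up labels; it covers only the voting rule, not the subsequent discussion step used when a dissenting expert objected, and the function names are illustrative.

```python
def consensus_observed(votes):
    """votes: three booleans, one per expert (True = site observed)."""
    return sum(votes) >= 2


def count_blind_spots(expert_labels):
    """expert_labels maps site name -> (bool, bool, bool) for one patient."""
    return sum(1 for votes in expert_labels.values()
               if not consensus_observed(votes))


# Made-up labels for three of the 26 sites in one patient.
labels = {
    "antrum (G)": (True, True, True),     # unanimous: observed
    "fundus (L)": (True, False, True),    # two of three: observed
    "angulus (P)": (False, True, False),  # one of three: blind spot
}
n_blind = count_blind_spots(labels)
```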
Sample size
As the number of blind spots is a discrete variable, the sample size was calculated
using the method for two-sample superiority tests. The mean numbers of blind spots in
the control and ENDOANGEL groups were estimated as 10 and 5, respectively, with
an overall SD of 4.3. With a power of 0.90, bilateral significance level of 0.05,
and superiority margin of 4.2, 495 patients would be needed in each group. Assuming
a dropout rate of 5 %, the target sample size for each group was 521.
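The stated sample size can be reproduced with a commonly used textbook formula for a two-sample superiority comparison of means, n = 2(z_α + z_β)²σ² / (Δ − δ)², where Δ is the expected difference and δ the superiority margin. The source does not give the formula, so this is an assumption. The figure of 495 is recovered when the normal quantile at 1 − 0.05 is used for z_α; using the quantile at 1 − 0.025 would give a larger number. This is an observation about the arithmetic, not a statement of how the authors computed it.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mu_control, mu_test, sd, margin, alpha, power):
    """Per-group n for a superiority test of two means (assumed textbook formula).

    alpha is used here as a one-sided level, i.e. z_alpha = quantile(1 - alpha).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    effect = (mu_control - mu_test) - margin  # expected difference beyond the margin
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / effect ** 2)


n = n_per_group(mu_control=10, mu_test=5, sd=4.3, margin=4.2, alpha=0.05, power=0.90)
target = round(n / (1 - 0.05))  # inflate for an assumed 5 % dropout
print(n, target)
```

Dividing by 0.95 and rounding to the nearest integer gives the stated target per group; rounding up instead would give one more patient.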
Randomization and blinding
A computer-generated random numerical series was used to generate a random allocation
sequence, with the ENDOANGEL group encoded as “0” and the control group as “1.” Stratified
randomization based on endoscopists was conducted in blocks of four in a 1:1 ratio.
Endoscopists and statisticians were unblinded, whereas patients were blinded to group allocation and all image data
evaluations were performed blindly.
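Stratified block randomization of the kind described above can be sketched as follows. The coding (0 = ENDOANGEL, 1 = control), the block size of four, and the 1:1 ratio follow the text; the seeding scheme, the function names, and the example patient counts are assumptions for illustration and do not reproduce the trial's actual sequence.

```python
import random


def block_randomize(n_patients, block_size=4, seed=None):
    """Allocation sequence built from shuffled blocks with equal 0s and 1s."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_patients:
        block = [0] * (block_size // 2) + [1] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_patients]


def stratified_allocation(patients_per_endoscopist, seed=0):
    """One independent block-randomized sequence per endoscopist (stratum)."""
    master = random.Random(seed)
    return {
        name: block_randomize(n, seed=master.getrandbits(32))
        for name, n in patients_per_endoscopist.items()
    }


# Hypothetical strata; 0 = ENDOANGEL group, 1 = control group.
alloc = stratified_allocation({"endoscopist_A": 8, "endoscopist_B": 6}, seed=1)
```

Within each endoscopist, every complete block of four contains two patients per arm, which keeps the groups balanced over time.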
Statistical analysis
A chi-squared test was used to compare the ENDOANGEL and control groups in terms of
baseline characteristics and the percentage of patients with blind spots at each site.
The Mann–Whitney U test with a two-sided significance level of 0.05 was used to compare the other main
and secondary outcomes between the two groups. The 95 % confidence intervals (CIs)
for accuracy, sensitivity, and specificity were calculated using the Wilson
procedure with a correction for continuity. The receiver operating characteristic
(ROC) curve was used to evaluate the performance of the CNN model for detecting EGC.
The ROC curve was constructed by plotting the sensitivity against the false-positive
rate (i.e., 1 − specificity) at varying prediction thresholds (Fig. 3 s). Statistical analysis was performed using StatsDirect version 3.1.20 (StatsDirect
Ltd., Birkenhead, UK).
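The continuity-corrected Wilson interval can be sketched in a few lines. The formula below is the standard published form, with the bound set to 0 or 1 when the observed proportion is 0 or 1; whether StatsDirect uses exactly this variant is an assumption. The lesion counts in the example (5/5, 166/196, 161/191) are back-calculated from the percentages reported in the Results rather than stated in the source.

```python
from math import sqrt
from statistics import NormalDist


def wilson_cc(successes, n, level=0.95):
    """Wilson score interval with continuity correction for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z * z)
    if successes == 0:
        lower = 0.0
    else:
        root = sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))
        lower = max(0.0, (2 * n * p + z * z - 1 - z * root) / denom)
    if successes == n:
        upper = 1.0
    else:
        root = sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))
        upper = min(1.0, (2 * n * p + z * z + 1 + z * root) / denom)
    return lower, upper


# Counts back-calculated from the reported percentages (assumption).
for label, x, n in [("sensitivity", 5, 5), ("accuracy", 166, 196), ("specificity", 161, 191)]:
    lo, hi = wilson_cc(x, n)
    print(f"{label}: {x}/{n} = {x / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Under these assumptions the intervals agree with the reported ones to one decimal place, including the wide interval for sensitivity that follows from only five cancers.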
Results
Recruitment
A total of 1239 patients were invited to participate in the trial, 189 of whom were
excluded because they were ineligible (n = 127) or declined to participate (n = 62);
therefore 1050 patients were recruited and randomized ([Fig. 2]). A total of 498 patients in the ENDOANGEL group and 504 in the control group were
included in the final analysis of number of blind spots and other outcomes. Patient
characteristics were comparable in both groups ([Table 1]).
Table 1
Baseline characteristics.
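Characteristics | ENDOANGEL (n = 498) | Control (n = 504)
Age, mean (SD), years | 51.5 (13.2) | 51.6 (13.1)
Female, n (%) | 273 (54.8) | 277 (55.0)
Indications for EGD, n (%)
[label missing] | 359 (72.1) | 366 (72.6)
[label missing] | 36 (7.2) | 33 (6.5)
[label missing] | 2 (0.4) | 5 (1.0)
[label missing] | 4 (0.8) | 4 (0.8)
[label missing] | 6 (1.2) | 2 (0.4)
[label missing] | 3 (0.6) | 3 (0.6)
[label missing] | 8 (1.6) | 11 (2.2)
[label missing] | 13 (2.6) | 11 (2.2)
[label missing] | 3 (0.6) | 4 (0.8)
[label missing] | 3 (0.6) | 4 (0.8)
[label missing] | 19 (3.8) | 18 (3.6)
[label missing] | 2 (0.4) | 2 (0.4)
[label missing] | 18 (3.6) | 20 (4.0)
[label missing] | 14 (2.8) | 17 (3.4)
[label missing] | 8 (1.6) | 4 (0.8)
Recruitment, n (%)
[label missing] | 165 (33.1) | 175 (34.7)
[label missing] | 333 (66.9) | 329 (65.3)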
EGD, esophagogastroduodenoscopy; GI, gastrointestinal.
Blind spots and inspection duration
In the ENDOANGEL group, the mean number of blind spots was lower than that in the control
group (5.38 [SD 4.32] vs. 9.82 [SD 4.98]; P < 0.001) ([Table 2]). The mean inspection time of the EGD procedure was longer in the ENDOANGEL group than
in the control group (5.40 [SD 3.82] minutes vs. 4.38 [SD 3.91] minutes; P < 0.001) ([Table 2]).
Table 2
Primary and secondary outcomes for all patients compared with results from our previous
single-center trial.
End point | ENDOANGEL (n = 498), mean (SD) | Control (n = 504), mean (SD) | P value | WISENSE (n = 153), mean (SD) | Control (n = 150), mean (SD) | P value
Primary end point: number of blind spots* | 5.38 (4.32) | 9.82 (4.98) | < 0.001 | 1.52 (1.79) | 5.84 (3.73) | < 0.001
Secondary end point: inspection time, minutes | 5.40 (3.82) | 4.38 (3.91) | < 0.001 | 5.03 (2.95) | 4.24 (3.82) | < 0.001
SD, standard deviation.
* The number of blind spots per patient out of a total of 26 gastric sites. Results
for WISENSE were cited from our previous single-center clinical trial [16]. It should
be noted that the primary outcome in the previous study was “blind spot rate,” and
the numbers of blind spots shown in the table (1.52 and 5.84) were converted from the
rates of blind spots (5.86 % and 22.46 %) by multiplying by 26.
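The conversion described in the footnote is simple arithmetic and can be checked directly. The sketch below uses only the rates and the site count given in the text; the dictionary keys are illustrative labels.

```python
SITES = 26  # gastric sites assessed per patient

# Blind spot rates (%) reported in the previous single-center trial.
rates_percent = {"WISENSE": 5.86, "Control": 22.46}

# Mean number of blind spots per patient = rate x number of sites.
counts = {group: round(rate / 100 * SITES, 2) for group, rate in rates_percent.items()}
print(counts)
```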
The median percentage of patients with blind spots at each site was 21.0 % (range
1.6 %–40.2 %) in the ENDOANGEL group and 38.9 % (range 0.8 %–68.3 %) in the control
group. For 88.5 % of gastric sites (23/26), the percentage of patients in whom the
site was overlooked was significantly lower in the ENDOANGEL group than in the control
group ([Table 3]).
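The summary figures in this paragraph can be recomputed from the per-site percentages and P values listed in Table 3. The sketch below transcribes those values in table order; treating entries written as "< 0.001" as below the 0.05 threshold is the only interpretive step, and the variable names are illustrative.

```python
from statistics import median

# Percentage of patients with a blind spot at each of the 26 sites (Table 3 order).
endoangel = [1.6, 6.2, 9.0, 21.1, 23.3, 16.1, 4.8, 1.6, 8.2, 26.9, 20.9, 18.7, 14.1,
             35.3, 40.2, 38.8, 5.8, 17.1, 21.3, 36.3, 34.7, 32.1, 33.5, 33.1, 27.5, 9.8]
control = [0.8, 9.3, 14.1, 39.3, 37.5, 31.7, 7.1, 5.4, 21.6, 56.2, 43.3, 43.5, 20.4,
           60.7, 68.3, 64.3, 12.1, 35.7, 38.5, 63.1, 58.7, 57.7, 50.0, 63.7, 56.7, 22.4]
p_values = ["0.237", "0.067", "0.012", "<0.001", "<0.001", "<0.001", "0.121", "0.001",
            "<0.001", "<0.001", "<0.001", "<0.001", "0.008", "<0.001", "<0.001",
            "<0.001", "0.001", "<0.001", "<0.001", "<0.001", "<0.001", "<0.001",
            "<0.001", "<0.001", "<0.001", "<0.001"]

for name, values in [("ENDOANGEL", endoangel), ("Control", control)]:
    print(f"{name}: median {median(values):.1f} %, range {min(values)}-{max(values)} %")

# Sites with P < 0.05; entries given as "<0.001" count as significant.
significant = sum(1 for p in p_values if p.startswith("<") or float(p) < 0.05)
print(f"{significant}/{len(p_values)} sites = {significant / len(p_values):.1%}")
```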
Table 3
The percentage of patients with blind spots at each site compared with results
from our previous single-center trial.
Overlooked sites | ENDOANGEL (n = 498) | Control (n = 504) | P value | WISENSE (n = 153) | Control (n = 150)
Esophagus | 1.6 | 0.8 | 0.237 | 0 | 0
Squamocolumnar junction | 6.2 | 9.3 | 0.067 | 0 | 1.33
Antrum (G) | 9.0 | 14.1 | 0.012 | 0 | 3.33
Antrum (P) | 21.1 | 39.3 | < 0.001 | 2.61 | 10.00
Antrum (A) | 23.3 | 37.5 | < 0.001 | 2.61 | 6.67
Antrum (L) | 16.1 | 31.7 | < 0.001 | 3.92 | 9.33
Duodenal bulb | 4.8 | 7.1 | 0.121 | 0.65 | 4.00
Duodenal descending | 1.6 | 5.4 | 0.001 | 0 | 6.00
Lower body (G) | 8.2 | 21.6 | < 0.001 | 2.61 | 17.33
Lower body (P) | 26.9 | 56.2 | < 0.001 | 13.07 | 29.33
Lower body (A) | 20.9 | 43.3 | < 0.001 | 7.19 | 18.67
Lower body (L) | 18.7 | 43.5 | < 0.001 | 5.23 | 30.00
Middle-upper body (F, G) | 14.1 | 20.4 | 0.008 | 2.61 | 5.33
Middle-upper body (F, P) | 35.3 | 60.7 | < 0.001 | 13.07 | 34.67
Middle-upper body (F, A) | 40.2 | 68.3 | < 0.001 | 13.07 | 42.67
Middle-upper body (F, L) | 38.8 | 64.3 | < 0.001 | 8.50 | 56.00
Fundus (G) | 5.8 | 12.1 | 0.001 | 2.61 | 8.67
Fundus (P) | 17.1 | 35.7 | < 0.001 | 8.50 | 21.33
Fundus (A) | 21.3 | 38.5 | < 0.001 | 14.38 | 17.33
Fundus (L) | 36.3 | 63.1 | < 0.001 | 18.95 | 40.67
Middle-upper body (R, P) | 34.7 | 58.7 | < 0.001 | 6.54 | 17.33
Middle-upper body (R, A) | 32.1 | 57.7 | < 0.001 | 19.61 | 40.67
Middle-upper body (R, L) | 33.5 | 50.0 | < 0.001 | 13.73 | 24.00
Angulus (P) | 33.1 | 63.7 | < 0.001 | 27.45 | 64.00
Angulus (A) | 27.5 | 56.7 | < 0.001 | 12.42 | 53.33
Angulus (L) | 9.8 | 22.4 | < 0.001 | 3.27 | 19.33
A, anterior wall; G, greater curvature; F, forward view; L, lesser curvature; P, posterior
wall; R, retroflex view.
The number of blind spots, with or without AI, was compared among the 14 endoscopists
(Fig. 4 s, Table 1 s). With the assistance of ENDOANGEL, the number of blind spots decreased significantly for 11 endoscopists,
whereas no significant change was seen for the other 3 endoscopists.
Gastric cancer detection
Lesion characteristics in the ENDOANGEL group
In the 498 patients in the ENDOANGEL group, 819 lesions were reported by endoscopists.
Of these lesions, 210 (25.6 %) had biopsy samples taken (196 gastric, 12 esophageal,
and 2 duodenal lesions). The remaining 609 lesions, without biopsies, included 437
gastric, 90 esophageal, and 82 duodenal lesions. Lesion characteristics are described
in Table 2 s. The median number of images used per patient was 600 (interquartile range [IQR] 369–710)
in the ENDOANGEL group and 485 (IQR 247–626) in the control group.
Real-time performance of ENDOANGEL in predicting gastric cancer in clinical practice
The two advanced gastric cancers and three EGCs confirmed by pathology in the ENDOANGEL
group were positively predicted by ENDOANGEL. Among 302 692 EGD frames from 498 patients
in the ENDOANGEL group, 2107 (0.7 %) red boxes indicating suspicious gastric cancer
were included in the ENDOANGEL outputs. Of these, 357 (16.9 %) were diagnosed by endoscopists
as containing lesions, and the remaining 1750 red boxes contained “noise,” including reflections,
foam, mucus, and folds, as summarized in Table 3 s. For the 196 gastric lesions with pathological results, ENDOANGEL correctly predicted
all 5 gastric cancers (2 advanced gastric cancers, 1 mucosal carcinoma, and 2 high
grade neoplasias), with a per-lesion accuracy of 84.7 % (95 %CI 78.7 %–89.3 %), sensitivity
of 100 % (95 %CI 46.3 %–100 %), and specificity of 84.3 % (95 %CI 78.2 %–89.0 %).
Of the 437 gastric lesions with no pathological results, 31 (7.1 %) were positively predicted
by ENDOANGEL, with the highest positive prediction rates shown for hemorrhagic gastritis
(16.7 % [1/6]), protruding lesions (15.8 % [3/19]), and erosive gastritis (11.7 %
[19/163]).
Discussion
Gastric cancer is the third leading cause of cancer death worldwide
[19]. Early detection is the key strategy to improve patient survival. However, the quality
of endoscopy varies significantly, impairing the health outcomes of patients. Technically,
complete observation is an essential prerequisite for detecting EGC; however, although
protocols for mapping the entire stomach have been widely proposed, they are often
not followed closely in clinical practice. Cognitively, EGC lesions are difficult
to recognize because the mucosal changes are often very subtle, requiring endoscopists
to have thorough knowledge and extensive experience [4] [7]. In the current study, we developed ENDOANGEL, a real-time AI assistance system
for monitoring blind spots and detecting EGC, to specifically address these two problems.
In this multicenter RCT of blind spot monitoring, we validated the effectiveness and robustness
of ENDOANGEL in improving EGD quality; in addition, we prospectively evaluated the
performance and feasibility of ENDOANGEL for the detection of EGC in clinical practice.
Gastric cancer may occur in every part of the gastric cavity [20]. Endoscopist competence is an essential prerequisite for the detection of EGC lesions
during EGD. In our previous work, we developed an AI system to classify different
gastric sites and monitor blind spots in real time during EGD, and verified the effectiveness
of the system in improving EGD quality in a single-center RCT [15] [16]. Results from this single-center study showed that the number of blind spots dropped
from 5.84 to 1.52 with the assistance of AI. In the current multicenter RCT, we further
verified the effect of improving EGD quality in five different hospitals, and the
number of blind spots dropped from 9.82 to 5.38 with the assistance of ENDOANGEL.
The findings of the two studies are consistent; however, in the present study, the
number of blind spots was higher in both the ENDOANGEL and control groups compared
with that in the previous single-center study, possibly as a result of variability
in the operation quality across the hospitals. The effect of AI on endoscopist practice
may be influenced by endoscopists’ experience with AI systems and their personal views
and acceptance of AI technology, according to a previous report [21]. To limit possible center effects, several measures were implemented in
the present study. First, the five hospitals included were all tertiary hospitals
and the participating endoscopists were senior endoscopists with
3–5 years of EGD experience and EGD volumes of 2000–5000 examinations. Second, to unify the endoscopic
observation procedures, the 14 participating endoscopists were trained to use the Japanese
systematic screening protocol for the stomach before the trial. More importantly,
results from each hospital, including the number of blind spots, were evaluated and
analyzed by the same data analysis team; for cases in which the endoscopic results
were inconsistent with the pathological results, data were reviewed by a single expert
pathologist, to reduce differences within and between centers.
Recently, several studies have tried to use deep learning for EGC recognition. Hirasawa
et al. developed a CNN to detect gastric cancer, which achieved a sensitivity of 92.2 %
[14]. Li et al. achieved a sensitivity of 91.2 % for detection of EGC in 341 still images
[22]. Our group also developed a CNN model for detecting EGC, which achieved a sensitivity
of 94.0 % in 200 still images [15]. However, the images chosen for testing in the previous studies were retrospectively
selected, and the types of noncancerous lesions were limited. In the real world, plenty
of lesions are difficult to distinguish from EGC, such as erosive gastritis and ulcers,
and such lesions were uncommon in the testing datasets in the previous work. Such
selection of lesions may lead to a bias in accuracy in favor of CNN models. In addition,
there is a great deal of “noise” during endoscopy in real clinical settings, such as reflections,
blurring, and foam, whereas most retrospective images are of good quality. Therefore,
we prospectively applied our previously trained EGC detection model in a multicenter
clinical trial, evaluated its performance in complex clinical environments, and provided
suggestions for further work in the development of EGC detection models.
Our results revealed two prominent problems when applying EGC detection models to
clinical practice. First, “noise” greatly impacts the accuracy of the model and is
bothersome to endoscopists. Images showing “noise” from consecutive frames in videos
could be collected to train the model to recognize and filter out “noise.” Methods
including localization [23] and segmentation [24] could be explored to solve this problem by targeting and shielding the image “noise.”
Second, some endoscopists argue that it is almost impossible to accurately predict
EGC in white-light view because other lesions such as erosive gastritis and ulcers
share similar characteristics with EGC. The same point is also presented in the guidelines
from ESGE [25]. Our results showed that a small proportion of benign lesions such as erosive gastritis,
ulcers, and polyps were incorrectly diagnosed as gastric cancer, and they are difficult
to distinguish from EGC even for experienced endoscopists. The quantity and diversity
of training datasets could be further increased to improve the performance of the
CNN model in order to extend the limits of human visualization and interpretation.
In addition, the aim of the AI model could be reconsidered and shifted from detection
of EGC to recognition of abnormal lesions in WLI that need further observation with
IEE. Further studies should be conducted to explore the proposed solutions and to
improve the EGC detection model.
In the past few years, the performance of CNN models has generally been improved by
increasing depth and the number of fitted parameters [24] [26]. In 2016, He et al. proposed the concept of residuals, which proved easier
to optimize and achieved better performance with fewer parameters [27]. Nowadays, deeper models with smaller kernels are preferred over single layers with
larger kernels [28]. Liu et al. compared different CNN models in detail, with or without transfer
learning, for classifying EGC and gastritis, and found that ResNet-50 achieved a top
accuracy of 95 % when using transfer learning [29]. In the present study, we trained both ResNet-50 and VGG-16 using transfer learning
for predicting EGC, and their combined results achieved an accuracy of 92.5 % in still
images. The results of the two types of CNN were combined to reduce the rate of mis-selection
by a single classifier [30]; however, in clinical practice, although all EGC lesions were successfully diagnosed,
the false-positive rate increased. Some scholars have explored 3D-CNNs [31], segmentation, and long short-term memory networks combined with CNNs to improve prediction
results [23]. These experiences are valuable for further research.
There are some limitations to our study. First, we only conducted a feasibility analysis
on real-time detection of gastric cancer based on deep learning in a clinical setting.
Whether the AI system can achieve a good performance in gastric cancer detection and
help improve the detection rate of EGC remains to be investigated in larger multicenter
studies. Second, the enrolled patients did not undergo long-term follow-up, so
lesions missed by endoscopists (false negatives) may have gone unrecorded, and the diagnostic ability
of the endoscopists may have affected the evaluation of ENDOANGEL performance.
To avoid this bias, further studies in which all patients are followed up or biopsied
should be conducted in order to evaluate the precision of ENDOANGEL in detecting gastric
cancer in a clinical setting. Third, patients were blinded and all image data evaluations were
performed blindly in this trial, whereas statisticians were not blinded. Unblinded
statisticians may introduce potential bias in analysis; more attention should be paid
to this issue in our future research. Fourth, in addition to IEE, chromoendoscopy
is also one of the major tools used for tumor detection and characterization. In our
previous work, a deep learning method was developed to delineate EGC margin under
chromoendoscopy [32]; use of AI to detect and diagnose EGC under chromoendoscopy is still a valuable
direction that could be tried in the future.
In conclusion, ENDOANGEL, a system for improving endoscopy quality based on deep learning,
achieved real-time monitoring of endoscopic blind spots, timing, and EGC detection
during EGD. ENDOANGEL greatly improved the quality of EGD in this multicenter study,
and showed potential for detecting EGC in real clinical settings.
Acknowledgments
We thank our endoscopists and machine-learning engineers for their hard work. We express
our gratitude to all patients and hospital staff for their support of our trial.