
DOI: 10.1055/a-2721-6552
A rural-to-center artificial intelligence model for diagnosing Helicobacter pylori infection and premalignant gastric conditions using endoscopy images captured in routine practice
Authors
Supported by: National Taiwan University Hospital 114-E0009
Supported by: National Science and Technology Council 111-2314-B-002-136-MY3 and 114-2314-B-002-245-MY3
Clinical Trial:
Registration number (trial ID): NCT05762991, Trial registry: ClinicalTrials.gov (http://www.clinicaltrials.gov/), Type of Study: Prospective

Abstract
Background
Diagnosing Helicobacter pylori infection and premalignant gastric conditions typically requires 13C urea breath testing or histological assessment, which are often unavailable in remote areas. A rural-to-center artificial intelligence (AI) model was developed and implemented to automatically evaluate upper endoscopy images from routine clinical practice.
Methods
Endoscopic images were collected from a rural hospital on the Matsu Islands and a tertiary center across the Taiwan Strait. During model development (2020–2022), AI algorithms were trained, validated, and tested to exclude low-quality and non-gastric images, segment gastric regions, and enhance mucosal features for detecting H. pylori infection and premalignant conditions. During model implementation (2023–2024), endoscopic images from the rural hospital were transmitted to the medical center for AI analyses, with results promptly returned.
Results
In the development phase, diagnostic accuracies were 92.8% (95%CI 88.9%–96.6%) for H. pylori, 88.6% (95%CI 87.2%–90.0%) for atrophic gastritis, and 88.0% (95%CI 86.5%–89.5%) for intestinal metaplasia. In the implementation phase, 3518 rural residents underwent 13C urea breath testing or pepsinogen testing; 421 with positive results underwent endoscopy. No significant differences were observed between AI-predicted and clinically observed prevalence: H. pylori (13.9% vs. 12.9%; P = 0.55), atrophic gastritis (15.7% vs. 11.9%; P = 0.34), and intestinal metaplasia (27.6% vs. 22.4%; P = 0.32). Implementation-phase diagnostic accuracies were 91.3% (95%CI 88.0%–94.6%), 79.9% (95%CI 72.1%–86.3%), and 63.4% (95%CI 54.7%–71.6%), respectively.
Conclusions
AI enabled physicians in resource-limited settings to rapidly assess gastric health using routinely captured endoscopic images, bridging gaps in access and expertise.
Introduction
Gastric cancer remains a global health concern, with 968 784 new cases and 660 175 deaths reported in 2022, ranking fifth in both incidence and mortality [1]. The primary cause is Helicobacter pylori infection [2], which can be treated with a short course of antibiotics to reduce cancer risk [3] [4]. If left untreated, H. pylori can cause chronic inflammation and lead to premalignant gastric conditions that increase cancer risk. Individuals with these conditions may benefit from endoscopic surveillance for early cancer detection. However, such assessments are not routinely conducted due to time and resource demands, interobserver variability, specialist dependence, and difficulty determining biopsy number and sites [5].
To reduce the threat of gastric cancer, primary and secondary prevention strategies can work together [6]. In recent years, a rapidly growing body of research has investigated the application of artificial intelligence (AI) in supporting the diagnosis of H. pylori infection and premalignant conditions, using either static images or real-time analyses with heatmaps or bounding box visualization (see Table 1s in the online-only Supplementary Material) [7] [8] [9] [10]. However, in resource-limited settings, factors such as high clinical workloads, large image volumes, need for image enhancement, and limited computational infrastructure may place an additional burden on endoscopists. Using routinely captured images to directly generate a global evaluation may offer a more efficient alternative. Implementing such an approach, however, may require a series of AI models that first preprocess and select relevant images automatically and then replicate the diagnostic reasoning of expert endoscopists and pathologists, a process that involves multiple steps rather than a single evaluation.
This novel concept was tested in a rural community on the Matsu Islands, located approximately 206 km across the Taiwan Strait from Taiwan’s main island. This rural area, with limited health care resources, served as a pilot site for initiating H. pylori screening and eradication in 2004 [11] [12], which was later extended using the colorectal cancer screening platform in 2014 [4], expanded to indigenous communities in 2018 [13], and ultimately scaled up to a population-wide implementation across Taiwan in 2026 [14]. Although H. pylori screening and eradication have reduced the incidence of gastric cancer, cases may still occur among individuals who remain infected or have harbored premalignant conditions [15]. A cloud-based computation system was evaluated using a rural-to-center AI model to enable comprehensive yet efficient assessment of H. pylori infection and premalignant gastric conditions from endoscopic images captured during routine clinical practice.
Methods
Design of a rural-to-center AI model
The study involved both model development and implementation, using upper endoscopy images stored in the picture archiving and communication system (PACS) as input data. Clinical information, including 13C urea breath testing and gastric histological evaluations, was used as the reference standard. During model development, AI models were trained, validated, and tested to exclude blurred and non-gastric images, segment gastric regions, and enhance mucosal details for diagnosing active H. pylori infection and premalignant conditions. During model implementation, upper endoscopy images from screening programs were transmitted from the frontend of a rural hospital to the backend of a medical center for computation. The AI-generated results were immediately returned to the rural hospital via a mobile PACS platform (EBM Technologies Inc., Taipei, Taiwan) ([Fig. 1]). An online video demonstration shows how the system functions (https://youtu.be/bLOS1rBJLUw). The study followed the quality assessment of AI preclinical studies in diagnostic endoscopy (QUAIDE) checklist [16].


Data acquisition
Data were collected from two sources: a rural hospital (Lienchiang County Hospital, Matsu Islands) and a medical center (National Taiwan University Hospital, Taipei). Fig. 1s illustrates the geographic locations. The medical center’s centralized data warehouse contained electronic medical records, including chart records, laboratory data, examination reports, pathological results, and medical images, gathered from 10 affiliated hospitals [17]. Data for model training and validation were primarily randomly selected from this warehouse. Data from the rural hospital were mainly used for validation and testing.
Model development
The model development process was designed to mirror the diagnostic workflow of expert endoscopists and pathologists ([Fig. 2]). The upper endoscopy examination was conducted in accordance with the systematic screening protocol [18]. Each endoscopic image was selected and classified by experienced endoscopists (Drs. Chiang TH and Lee YC), each with over 20 years of experience, based on the purpose and desired outcome of each step. Prior to machine learning, the images were preprocessed to remove irrelevant elements (Fig. 2s). The Laplacian method was used to remove blurred images, with a threshold set at a Laplacian score of 800 (Fig. 3s) [19]. Images were divided into training, validation, and testing datasets. Each image was further segmented into patches, and the image was classified based on the proportion of patches diagnosed with a specific outcome, following the general principle of convolutional neural networks (Supplementary methods). A cutoff value, determined during the training phases, was used to assign image-level diagnoses. The validation process incorporated an iterative upgrade module that analyzed misclassified cases to enable continuous model optimization. Model performance was subsequently evaluated using the testing dataset. The development workflow is summarized in [Table 1], with additional details provided in Fig. 4s, which demonstrates this process using the histological model as an example. The process is described step by step below.
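The blur-filtering rule described above can be illustrated with a minimal sketch. The text reports only the cutoff (a Laplacian score of 800); taking the score to be the variance of a 4-neighbour Laplacian response on a grayscale image, and treating scores below the cutoff as blurred, are assumptions made here for illustration. The function names are hypothetical.

```python
from typing import List, Sequence

# Cutoff reported in the text; treating lower scores as "blurred" is an assumption.
BLUR_THRESHOLD = 800.0


def laplacian_score(gray: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses: List[float] = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurred(gray: Sequence[Sequence[float]],
               threshold: float = BLUR_THRESHOLD) -> bool:
    """Flag an image whose Laplacian score falls below the cutoff."""
    return laplacian_score(gray) < threshold


# A uniform image has no edges and is flagged; a checkerboard is not.
flat = [[128] * 6 for _ in range(6)]
checker = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]
print(is_blurred(flat), is_blurred(checker))
```

A sharp image has strong local intensity changes, so its Laplacian response varies widely; a defocused image yields a nearly flat response and therefore a low score.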


Step 1: Remove organic lesions and enhanced images
The study focused on background gastric mucosa. Images of visible organic lesions, such as tumors, polyps, and ulcers, were excluded, as these would already prompt clinical management and were outside the intended focus of this study. Enhanced images, such as narrow-band images, were also excluded because they are interpreted differently (Fig. 5s).
Step 2: Remove non-gastric images
Upper endoscopy examinations also covered the hypopharynx, esophagus, and duodenum; images of these sites were irrelevant to the study’s purpose and were removed (Fig. 6s).
Step 3: Classify different regions of the stomach
Different anatomical locations of the stomach may show distinct mucosal manifestations for both H. pylori infection and premalignant conditions. Additionally, the prospective histological assessment was limited to the corpus and antrum. Therefore, it was essential to differentiate between these anatomical locations (Fig. 7s).
Step 4: Classify the presence of active H. pylori infection
As the population of the Matsu Islands had previously undergone mass eradication of H. pylori [14], data for evaluating H. pylori infection were obtained from the medical center. The prevalence rate of active H. pylori infection was found to be 55.9% using 13C urea breath testing as the reference standard. This step involved three AI models for the three different locations (antrum, corpus, cardia/fundus), as they may show different mucosal patterns of H. pylori infection (Fig. 8s) [20]. Endoscopists (Drs. Chiang TH and Lee YC) reviewed and selected both infected and non-infected images to match the corresponding breath test results. To achieve sufficient model performance, the classification process began with image enhancement using white balance adjustments (Fig. 9s).
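The text does not specify which white balance adjustment was applied. As one possible illustration only, the sketch below uses the gray-world method, a common generic technique that rescales each color channel so that all three channel means become equal. The choice of gray-world and the function name are assumptions, not the authors' stated method.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), each in 0..255


def gray_world_balance(pixels: List[Pixel]) -> List[Pixel]:
    """Gray-world white balance: equalize the mean of the R, G, and B channels."""
    if not pixels:
        raise ValueError("no pixels given")
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    # A channel with zero mean cannot be rescaled; leave it unchanged.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [
        tuple(min(255.0, p[c] * gains[c]) for c in range(3))
        for p in pixels
    ]


# Two pixels with a blue cast; after balancing, the channel means coincide.
balanced = gray_world_balance([(100.0, 50.0, 150.0), (200.0, 100.0, 250.0)])
print(balanced)
```

Values are kept as floats for clarity; a real pipeline would round to integers and operate on full image arrays.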
Step 5: Classify the presence of premalignant gastric conditions
Histological diagnoses of premalignant gastric conditions were not routinely available in standard upper endoscopy practice. In the rural hospital, these data were prospectively collected from a population-based H. pylori screen-and-treat program on the Matsu Islands [11] [12]. At the medical center, these data were collected as part of randomized clinical trials [21] [22]. The prevalence and severity of premalignant conditions using the modified Sydney protocol are shown in Table 2s. In brief, gastric mucosa biopsy specimens were obtained from the antrum (2–3 cm from the pylorus along the greater and lesser curvatures) and corpus (one each from the lesser and greater curvatures at the middle corpus) [23]. Senior histopathologists, unaware of participants’ clinical status, performed all histological assessments (Drs. Chen YC, Chiang H, and Shun CT). The specimens were graded as acute inflammation (polymorphonuclear infiltrates), chronic inflammation (lymphoplasmacytic infiltrates), atrophic gastritis (loss of glandular tissue and fibrous replacement), or intestinal metaplasia (presence of goblet cells and absorptive cells). The severity of each category was rated as none, mild, moderate, or marked, enabling classification of the severity of premalignant conditions using the Operative Link for Gastritis Assessment of Atrophic Gastritis (OLGA) and Operative Link for Gastritis Assessment of Intestinal Metaplasia (OLGIM) criteria, ranging from stage 0 to stage 4 [24] [25]. The weighted kappa values for gastric atrophy and intestinal metaplasia were 0.62 and 0.74, respectively, among the pathologists [11] [12]. The histological diagnoses served as the reference standard. As high-stage diseases were rare, the classification was dichotomized based on the presence or absence of premalignant conditions in the antrum and corpus (stages 0 and ≥1). Given the known patchy distribution of premalignant conditions, stored images were reviewed and selected by endoscopists (Drs. Chiang TH and Lee YC) to ensure alignment with corresponding histological diagnoses. To achieve sufficient model performance, this step began with image enhancement using contrast-limited adaptive histogram equalization [26], following the principles of image-enhanced endoscopy (Fig. 10s).
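The stepwise workflow described in steps 1 to 5 can be sketched as a simple cascade. This is an outline under stated assumptions rather than the study's implementation: every classifier is passed in as a hypothetical callable, the enhancement steps are omitted, atrophic gastritis and intestinal metaplasia are collapsed into a single "premalignant" flag for brevity, and no premalignant call is made for the cardia/fundus because the histological reference was limited to the antrum and corpus.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Classifier = Callable[[Any], bool]  # hypothetical stand-in for a trained model


@dataclass
class ImageCall:
    image_id: str
    region: str                   # "antrum", "corpus", or "cardia/fundus"
    h_pylori: bool
    premalignant: Optional[bool]  # None where no histological reference exists


def run_cascade(
    images: Iterable[Tuple[str, Any]],
    is_blurred: Classifier,
    is_lesion_or_enhanced: Classifier,
    is_gastric: Classifier,
    region_of: Callable[[Any], str],
    h_pylori_models: Dict[str, Classifier],
    premalignant_model: Classifier,
) -> List[ImageCall]:
    """Route each image through preprocessing and steps 1 to 5 in order."""
    calls: List[ImageCall] = []
    for image_id, image in images:
        if is_blurred(image):             # preprocessing: drop blurred images
            continue
        if is_lesion_or_enhanced(image):  # step 1
            continue
        if not is_gastric(image):         # step 2
            continue
        region = region_of(image)         # step 3
        hp = h_pylori_models[region](image)  # step 4: one model per region
        pm = premalignant_model(image) if region in ("antrum", "corpus") else None  # step 5
        calls.append(ImageCall(image_id, region, hp, pm))
    return calls
```

Because each stage only sees images that passed the previous one, an error early in the cascade propagates downstream, which is one reason per-step performance is reported separately.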
A per-patient assessment
For implementation, analyses needed to convert the per-image basis to a per-patient diagnosis for both H. pylori infection and premalignant conditions. As images captured from the same anatomical location (antrum, corpus, or fundus/cardia) could yield varying interpretations, some indicating H. pylori infection or premalignant changes and others not, a voting procedure was applied. A patient was classified as having a specific outcome if the percentage of positive images among all gastric images exceeded a predefined cutoff. This cutoff was calibrated to align with the observed prevalence rate [11] [12] [17] [21] [22].
Interpretability analyses
Given that the AI model may function as a black box, interpretability analyses were conducted using two complementary approaches to ensure transparency and clinical relevance, including the per-image approach and the comprehensive approach. First, gradient-weighted class activation mapping was applied to generate heatmaps for each image, highlighting regions of model focus. This enabled case-by-case evaluation of whether the AI attention aligned with clinical judgment at each diagnostic step, as visualized through the mobile PACS (Fig. 11s).
Second, a comprehensive interpretability analysis was performed by assessing whether the model’s high-attention classifications corresponded with established associations among H. pylori infection, premalignant conditions, and gastric cancer. Raw endoscopic images were obtained from an independent dataset comprising patients diagnosed with gastric cancer, identified through a data warehouse linked to the Taiwan Cancer Registry (2004–2022). Cases included patients who had undergone upper endoscopy at least 180 days prior to their cancer diagnosis and had archived images, representing individuals with either undetected gastric cancer or a precancerous condition. Controls were randomly selected from patients without a gastric cancer diagnosis during the same period. Archived endoscopic images from both groups were processed through steps 1 to 5 of the AI models. Classification results for the cancer and non-cancer groups were then analyzed to determine whether the AI interpretability aligned with established clinical risk factors. The results were assessed using Shapley additive explanations values, which measured the change in model predictability when the risk factor was present versus absent [27].
Model implementation
Based on the population registry, a community-based screening program invited residents of the Matsu Islands aged 30 years or older to undergo 13C urea breath testing or pepsinogen testing on a biennial basis in alternating years [11] [12] [15]. Individuals who tested positive for active H. pylori infection were referred for eradication therapy, while those with abnormal pepsinogen results were referred for upper endoscopy examination and histological evaluation according to the random biopsy protocol, as detailed above (Sydney protocol) [23]. Routinely captured upper endoscopy images from participants were transmitted to the AI inference backend at the medical center, and the AI-generated results were instantly relayed back to the rural hospital.
Ethical approval
The study was approved by the Ethics Committee of National Taiwan University Hospital (No. 201402061RINA), and all participants provided written informed consent.
Statistical analyses
Patients’ baseline characteristics were summarized as percentages for categorical variables and as means with SD for continuous variables. During model development, the discriminative performance of the AI models was assessed using sensitivity, specificity, and diagnostic accuracy, with the corresponding 95%CIs to evaluate statistical significance. The model with the highest accuracy at each step was selected and integrated into the system. During the implementation phase, McNemar’s test was used to compare the prevalence rates between AI-predicted and observed outcomes in paired data from the same rural participants. Diagnostic accuracies were compared with those from the development phase using a two-sample proportion test, treating the rural hospital and medical center populations as independent. As the study was exploratory, adjustments for multiple comparisons were not applied to avoid potential false-negative findings, aligning with the goal of generating new insights.
For computation, Python with the TensorFlow 2.8 framework was used on the NVIDIA DGX A100 GPU (40 GB; NVIDIA Corporation, Santa Clara, California, USA). A 2-sided P value of <0.05 was considered statistically significant for all outcomes.
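The two comparisons named above can be sketched as follows. The text does not state which variants of the tests were used; the sketch assumes the uncorrected chi-square form of McNemar's test and a pooled two-sample z-test for proportions. All counts in the usage lines are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def mcnemar_test(b: int, c: int) -> Tuple[float, float]:
    """McNemar's test on paired data (uncorrected chi-square form, 1 df).

    b: pairs positive by AI but negative by the reference standard
    c: pairs negative by AI but positive by the reference standard
    Returns (chi-square statistic, two-sided P value).
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def two_proportion_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sample z-test for independent proportions x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts for illustration only.
print(mcnemar_test(b=15, c=5))
print(two_proportion_test(x1=63, n1=100, x2=880, n2=1000))
```

McNemar's test uses only the discordant pairs, which is why it suits a comparison of AI-predicted and observed status in the same participants, whereas the two-proportion test assumes the two samples are independent.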
Cost-effectiveness analysis
The AI-assisted approach may also increase the medical burden of H. pylori testing and endoscopic surveillance, as it generates additional information that may prompt further work-up. A cost-effectiveness analysis was conducted with the primary end point of life-years gained, estimated by translating screening-related mortality reductions into life-years gained (Supplementary methods). The structure of the Markov model and the data inputs are presented in Fig. 12s, Fig. 13s, and Table 3s. The incremental cost-effectiveness ratio was calculated as the difference in costs divided by the difference in life-years between the AI-assisted strategy and routine practice.
The analyses were performed using TreeAge Pro 2024 (TreeAge Software, Inc., Williamstown, Massachusetts, USA).
Results
Model development
The best-performing models for each step are shown in [Table 1]. Details of the model selection process are provided in Fig. 14s. In step 1, all seven deep learning models demonstrated strong performance. The best-performing model, based on DenseNet121, achieved sensitivity, specificity, and accuracy of 95.1% (95%CI 92.3%–97.9%), 91.2% (95%CI 88.3%–94.1%), and 97.0% (95%CI 94.8%–99.2%), respectively, on the testing set for excluding organic lesions and enhanced images ([Table 2]). For step 2, all six models showed excellent performance, with the best-performing DenseNet201 model achieving sensitivity, specificity, and accuracy of 99.6% (95%CI 99.2%–100%), 100%, and 99.9% (95%CI 99.9%–100%), respectively, for excluding non-stomach images. For step 3, the performance was slightly lower compared with the first two steps, as the cardia/fundus, corpus, and antrum are continuous structures and there were borderline areas. The best-performing DenseNet121 model achieved sensitivity, specificity, and accuracy values of 90.5% (95%CI 89.5%–91.5%), 95.0% (95%CI 94.6%–95.4%), and 98.2% (95%CI 98.0%–98.4%), respectively, for differentiating between these three locations.
For step 4, using InceptionResNetV2 for the cardia/fundus, ResNet50 for the corpus, and DenseNet121 for the antrum, the overall sensitivity, specificity, and accuracy were 95.0% (95%CI 92.9%–97.2%), 91.2% (95%CI 85.6%–96.6%), and 92.8% (95%CI 88.9%–96.6%), respectively, for detecting active H. pylori infection. For step 5, the best-performing Vision Transformer model achieved sensitivity, specificity, and accuracy of 79.4% (95%CI 75.3%–83.5%), 74.7% (95%CI 72.8%–76.6%), and 88.6% (95%CI 87.2%–90.0%), respectively, for differentiating atrophic from non-atrophic gastric mucosae. The same Vision Transformer model achieved sensitivity, specificity, and accuracy of 78.2% (95%CI 74.7%–81.7%), 71.3% (95%CI 69.1%–73.5%), and 88.0% (95%CI 86.5%–89.5%), respectively, for differentiating between the presence and absence of intestinal metaplasia.
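For readers who wish to reproduce this kind of summary, sensitivity, specificity, and accuracy with 95%CIs can be computed from a confusion matrix as sketched below. The text does not state which interval method was used; the normal-approximation (Wald) interval here is an assumption, and the counts in the usage line are hypothetical.

```python
import math
from typing import Dict, Tuple

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

Estimate = Tuple[float, Tuple[float, float]]  # (point estimate, (lower, upper))


def wald_ci(p: float, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion, clipped to [0, 1]."""
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Estimate]:
    """Sensitivity, specificity, and accuracy, each with a Wald 95% interval."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": (sens, wald_ci(sens, tp + fn)),
        "specificity": (spec, wald_ci(spec, tn + fp)),
        "accuracy": (acc, wald_ci(acc, tp + fp + tn + fn)),
    }


# Hypothetical confusion matrix for illustration only.
for name, (value, (low, high)) in diagnostic_metrics(tp=90, fp=20, tn=80, fn=10).items():
    print(f"{name}: {value:.1%} (95%CI {low:.1%} to {high:.1%})")
```

The Wald interval is symmetric and can be inaccurate for small samples or proportions near 0 or 1; exact or Wilson intervals are common alternatives in those cases.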
The per-patient assessment
For the per-patient assessment, the diagnosis of active H. pylori infection was determined by a majority vote (i.e. when the number of positive images divided by the total number of images was ≥50%). Atrophic gastritis was considered positive when the proportion of positive images exceeded 20%. Intestinal metaplasia was considered positive when the proportion of positive images exceeded 7%.
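The voting rule above can be written out as a minimal sketch. The cutoffs are those reported in the text (at least 50% for H. pylori, more than 20% for atrophic gastritis, more than 7% for intestinal metaplasia); the function and dictionary names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# (cutoff, inclusive) per outcome, taken from the text above.
CUTOFFS: Dict[str, Tuple[float, bool]] = {
    "h_pylori": (0.50, True),                # positive when fraction >= 50%
    "atrophic_gastritis": (0.20, False),     # positive when fraction > 20%
    "intestinal_metaplasia": (0.07, False),  # positive when fraction > 7%
}


def per_patient_call(image_calls: Sequence[bool], outcome: str) -> bool:
    """Convert per-image calls for one patient into a per-patient diagnosis."""
    if not image_calls:
        raise ValueError("patient has no usable gastric images")
    cutoff, inclusive = CUTOFFS[outcome]
    fraction = sum(image_calls) / len(image_calls)
    return fraction >= cutoff if inclusive else fraction > cutoff


# One positive image out of ten is enough for intestinal metaplasia,
# but not for atrophic gastritis or H. pylori infection.
calls = [True] + [False] * 9
print({name: per_patient_call(calls, name) for name in CUTOFFS})
```

The lower cutoff for intestinal metaplasia means that a small number of positive images can trigger a positive per-patient call, which fits a condition with a patchy distribution.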
Interpretability analysis
For the per-image approach, gradient-weighted class activation mapping generated heatmaps for the regions of model focus for the representative images (Figs. 5s–7s, 9s–11s), and these generally aligned with clinical judgement. For the comprehensive approach, a total of 326 patients (35 670 images) with subsequent gastric cancer and 6369 patients (168 116 images) without subsequent gastric cancer were enrolled. The mean time between gastric cancer diagnosis and the last upper endoscopy was 4.1 years (SD 3.5). The raw upper endoscopy images were preprocessed and analyzed by AI through steps 1 to 5, using the best-performing model selected for each step. The interpretation was finally transformed to a per-patient basis for the diagnosis of active H. pylori infection and premalignant conditions. Two AI models, a logistic regression model and a support vector machine, demonstrated consistent results on the testing sets (Table 4s), with sensitivity, specificity, and accuracy of 89.8% (95%CI 83.9%–95.7%), 81.6% (95%CI 74.3%–88.9%), and 90.2% (95%CI 84.4%–96.0%), respectively, which were significantly better than those of the model solely based on age and sex, with sensitivity, specificity, and accuracy of 75.5% (95%CI 67.5%–83.5%), 59.4% (95%CI 50.9%–68.0%), and 73.0% (95%CI 64.8%–81.2%), respectively (all P < 0.001). Evaluation of variable importance indicated that active H. pylori infection and the presence of intestinal metaplasia, as generated by the AI models, were highly influential (Fig. 15s).
Model implementation
Between March 3, 2023, and April 30, 2024, a community-based screening program was conducted on the Matsu Islands. Of the 3518 eligible individuals aged 30 years or older who were invited, 2651 (mean age 54.0 years [SD 13.4]) participated in pepsinogen testing in 2023, while 2855 (mean age 54.4 years [SD 13.6]) underwent 13C urea breath testing in 2024 (Table 5s). A total of 166 individuals (6.3%) tested positive for pepsinogen, and 264 individuals (9.2%) tested positive for H. pylori infection ([Fig. 3]). Among the pepsinogen-positive individuals, 134 underwent endoscopic examination and histological evaluation. Additionally, 287 individuals who received H. pylori testing had upper endoscopy images stored in the rural hospital PACS.


As shown in [Fig. 4], AI-predicted vs. observed prevalence rates were: H. pylori (13.9% vs. 12.9%; P = 0.55), atrophic gastritis (15.7% vs. 11.9%; P = 0.34), and intestinal metaplasia (27.6% vs. 22.4%; P = 0.32), at an individual patient level. No significant differences were observed across all comparisons. Implementation-phase diagnostic accuracies were 91.3% (95%CI 88.0%–94.6%) for H. pylori, 79.9% (95%CI 72.1%–86.3%) for atrophic gastritis, and 63.4% (95%CI 54.7%–71.6%) for intestinal metaplasia. Compared with the development-phase accuracies, the results were not significantly different for H. pylori infection and atrophic gastritis, whereas a significant difference was shown for intestinal metaplasia. In the community, five individuals with a history of gastric cancer had endoscopic imaging archived in the PACS at the rural hospital, prior to gastric cancer diagnosis. The AI model diagnosed H. pylori infection and intestinal metaplasia in all cases, while atrophic gastritis was identified in one.


Cost-effectiveness analysis
An incremental gain of 2.02 life-years and a cost reduction of USD 93.7 were observed with the AI-assisted strategy compared with routine practice (Fig. 16s), resulting in a negative incremental cost-effectiveness ratio of –46.4, which indicates that the strategy was cost-saving. The benefits of H. pylori eradication and early gastric cancer detection triggered by AI may outweigh the burden associated with advanced cancer treatments and reduced life expectancy under the traditional approach.
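As a check on the arithmetic, the reported ratio follows directly from the definition given in the Methods (difference in costs divided by difference in life-years). The symbols ΔC and ΔE are notation introduced here for illustration:

```latex
\mathrm{ICER}
  = \frac{\Delta C}{\Delta E}
  = \frac{C_{\text{AI}} - C_{\text{routine}}}{E_{\text{AI}} - E_{\text{routine}}}
  = \frac{-93.7\ \text{USD}}{2.02\ \text{life-years}}
  \approx -46.4\ \text{USD per life-year gained}
```

A negative ratio arising from lower cost together with a gain in life-years means that the AI-assisted strategy dominates routine practice, being both less costly and more effective.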
Discussion
This study presents a novel AI-based approach that helps diagnose H. pylori infection and premalignant gastric conditions using routinely captured images. Unlike traditional methods with longer turnaround times, it delivers results within minutes. Validated in both well-resourced and underserved settings, it offers a practical tool to support frontline physicians by bridging expertise gaps and geographic barriers.
Previous AI-based studies have shown promising results, with a pooled accuracy of 80%–96% for diagnosing active H. pylori infection and 90%–96% for diagnosing premalignant gastric conditions [7] [8] [9] [10]. Previous studies mainly used highly selected images from well-resourced settings and focused on the proof-of-concept stage. In contrast, this study integrated a series of AI models in an end-to-end manner to simulate the full diagnostic process, offering several unique insights. First, blurred endoscopic images must be excluded from stepwise classification to avoid misdiagnosis. Second, due to the varied mucosal patterns of H. pylori-related gastritis and premalignant lesions across different stomach regions, images must be segmented by region to enable accurate classification. Third, image enhancement is necessary in order to reveal mucosal details, enabling reliable differentiation between active infection and premalignant conditions. Fourth, a per-patient assessment based on a voting strategy efficiently correlates with gastric cancer risk and is practical for first-line application.
Virtual chromoendoscopy staging scores, such as the Endoscopic Grading of Gastric Intestinal Metaplasia (EGGIM) [28], which assesses intestinal metaplasia using narrow-band imaging or linked color imaging, can enhance interpretation and have been shown to be applicable with AI [29]. However, routine application of EGGIM may be challenging for less experienced endoscopists, particularly among general practitioners outside specialized centers who primarily focus on detecting organic lesions in the esophagus, stomach, and duodenum based on clinical symptoms. Although the rapid urease test can accurately detect H. pylori when infection is suspected, some cases may still be missed because evaluation of normal-appearing gastric mucosa is often overlooked in routine practice. AI has the potential to extract additional information from standard endoscopic images and prompt appropriate clinical management.
This study’s strength is its efficient end-to-end approach, from meticulous data collection through model development to real-world evaluation. Generalizability was shown through training, validation, and testing, and implementation in both a large medical center and a rural hospital. However, several limitations should be acknowledged. First, H. pylori assessment and histological grading, tasks requiring the identification of subtle mucosal and vascular patterns, were sensitive to image resolution. Although self-attention-based image enhancement improved interpretability, further robustness could be achieved using higher-resolution images and optimized lighting conditions when the images are taken. Virtual chromoendoscopy-based AI interpretation is a valuable endeavor but requires a higher level of expert annotation and sufficient training data [29]. Second, images not meeting quality standards were excluded, which may render some cases uninterpretable. However, this limitation underscores the model’s potential to drive quality improvement in upper endoscopic imaging. Third, the AI system provided histological predictions for all antrum and body images, far exceeding the limited sampling in the OLGA and OLGIM biopsy protocols [24] [25]. While AI-predicted and observed prevalence rates for H. pylori infection and premalignant conditions were similar during the implementation phase, the accuracy for detecting intestinal metaplasia was lower than that in the development phase, where AI models were trained on selected images (akin to targeted biopsies). In contrast, unselected images were directly input into the system during implementation. The reduced accuracy was attributed to the complexity of the multistep pipeline, which may increase the risk of misclassification; incorporating an interactive module could help identify and correct erroneous cases. 
It was also related to the patchy distribution of intestinal metaplasia, unlike the more diffuse changes seen in atrophic gastritis and H. pylori infection, particularly when biopsies were taken from normal-appearing mucosae under a standardized protocol. Fourth, the screening program followed organized screening principles; thus, only a subset of individuals with positive noninvasive tests were eligible for endoscopy. The favorable cost-effectiveness results relied on assumptions regarding the magnitude of gastric cancer prevention and early gastric cancer detection. Continued cohort follow-up over a longer period is needed to further support this hypothesis. Fifth, real-time AI analyses, similarly to polyp detection in colonoscopy, hold promise for enhancing diagnostic accuracy during upper endoscopy but are still in early development and mainly focused on organic lesion detection [30] [31] [32] [33]. As shown in the image heatmaps, diffuse mucosal changes and the patchy nature of premalignant lesions complicated real-time interpretation and increased the workload of semantic segmentation. A global assessment based on routinely archived images may offer a simpler and more efficient alternative. While the research advances early screening capabilities, challenges remain in interpreting AI-generated results. Since current clinical guidelines are based on the diagnostic expertise of human physicians developed over years of experience, it is crucial to evaluate whether AI outputs align with these standards, which may require the development of new follow-up and treatment plans. Sixth, the low proportion and case numbers limited the ability to categorize lesions using the original OLGA/OLGIM system, in which advanced stages requiring surveillance are primarily defined as stage III–IV, according to MAPS III [34] and American College of Gastroenterology [35] guidelines. 
Nonetheless, in the interpretability analyses, this dichotomization remained consistent with established risk levels for premalignant conditions, particularly intestinal metaplasia.
In conclusion, this study demonstrates that step-by-step AI models can automatically extract valuable insights from routinely captured upper endoscopy images to evaluate H. pylori infection and premalignant gastric conditions, providing a novel approach to extending AI technologies to rural areas and reducing disparities in stomach health management.
Contributorsʼ Statement
Tsung-Hsien Chiang: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Yen-Ning Hsu: Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - original draft. Min-Han Chen: Data curation, Formal analysis, Methodology, Resources, Software, Validation, Visualization. Yi-Ru Chen: Data curation, Project administration, Resources, Visualization. Hsui-Chi Cheng: Conceptualization, Investigation, Writing - review & editing. Mei-Jin Chen: Data curation, Project administration, Resources, Supervision. Fu-Jen Lee: Data curation, Investigation, Validation, Writing - review & editing. Chi-Yang Chang: Data curation, Investigation, Resources, Writing - review & editing. Chun-Chao Chang: Data curation, Project administration, Resources, Supervision. Ming-Jong Bair: Data curation, Investigation, Resources, Supervision. Jyh-Ming Liou: Data curation, Investigation, Resources, Writing - review & editing. Chiuan-Jung Chen: Conceptualization, Data curation, Investigation, Methodology, Project administration, Resources, Software, Writing - review & editing. Yen-Chung Chen: Data curation, Investigation, Validation, Writing - review & editing. Hung Chiang: Data curation, Investigation, Validation, Writing - review & editing. Chia-Tung Shun: Data curation, Investigation, Validation, Writing - review & editing. Jui-Hsuan Liu: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - review & editing. Han-Mo Chiu: Data curation, Investigation, Resources, Writing - review & editing. Ming-Shiang Wu: Conceptualization, Investigation, Supervision, Writing - review & editing. Jiun-Yu Yu: Conceptualization, Investigation, Supervision, Writing - review & editing. 
Ruey-Shan Guo: Conceptualization, Investigation, Supervision, Writing - review & editing. Jaw-Town Lin: Conceptualization, Investigation, Supervision, Writing - review & editing. Yi-Chia Lee: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Chu-Song Chen: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing - review & editing.
Conflict of Interest
The authors declare that they have no conflict of interest.
Acknowledgement
The authors would like to extend their special thanks to the staff of the National Taiwan University Hospital – Integrative Medical Database Center for providing the data and to the National Taiwan University Hospital – Center of Intelligent Healthcare for their technological support. Additionally, the authors are grateful for the administrative support from the staff at the Health Bureau of Lienchiang County, Matsu Islands, Taiwan. Figure 1 and the Graphical Abstract were created by the National Taiwan University Hospital Medical Visualization Center (NTUH MedVis). Other figures and tables were created by the authors.
References
- 1 Ferlay J, Ervik M, Lam F. et al. Global Cancer Observatory: Cancer Today. Lyon, France: International Agency for Research on Cancer; 2024. Accessed December 31, 2024 at: https://gco.iarc.who.int/today
- 2 Ford AC, Yuan Y, Park JY. et al. Eradication therapy to prevent gastric cancer in H. pylori-positive individuals: systematic review and meta-analysis of randomized controlled trials and observational studies. Gastroenterology 2025; 169: 261-276
- 3 Park JY. Population-based Helicobacter pylori screen-and-treat strategies for gastric cancer prevention: guidance on implementation. Lyon, France: International Agency for Research on Cancer; 2025. Accessed April 28, 2025 at: https://publications.iarc.who.int/648
- 4 Lee YC, Chiang TH, Chiu HM. et al. Collaborators of Taiwan Community-based Integrated Screening Group. Screening for Helicobacter pylori to prevent gastric cancer: a pragmatic randomized clinical trial. JAMA 2024; 332: 1642-1651
- 5 Dinis-Ribeiro M, Shah S, El-Serag H. et al. The road to a world-unified approach to the management of patients with gastric intestinal metaplasia: a review of current guidelines. Gut 2024; 73: 1607-1617. Erratum in: Gut 2024; 73: e1
- 6 Huang RJ, Laszkowska M, In H. et al. Controlling gastric cancer in a world of heterogeneous risk. Gastroenterology 2023; 164: 736-751
- 7 Dilaghi E, Lahner E, Annibale B. et al. Systematic review and meta-analysis: artificial intelligence for the diagnosis of gastric precancerous lesions and Helicobacter pylori infection. Dig Liver Dis 2022; 54: 1630-1638
- 8 Shi Y, Wei N, Wang K. et al. Diagnostic value of artificial intelligence-assisted endoscopy for chronic atrophic gastritis: a systematic review and meta-analysis. Front Med (Lausanne) 2023; 10: 1134980
- 9 Li N, Yang J, Li X. et al. Accuracy of artificial intelligence-assisted endoscopy in the diagnosis of gastric intestinal metaplasia: a systematic review and meta-analysis. PLoS One 2024; 19: e0303421
- 10 Jiang Y, Yan H, Cui J. et al. Artificial intelligence in endoscopy for predicting Helicobacter pylori infection: a systematic review and meta-analysis. Helicobacter 2025; 30: e70026
- 11 Lee YC, Chen TH, Chiu HM. et al. The benefit of mass eradication of Helicobacter pylori infection: a community-based study of gastric cancer prevention. Gut 2013; 62: 676-682
- 12 Chiang TH, Chang WJ, Chen SL. et al. Mass eradication of Helicobacter pylori to reduce gastric cancer incidence and mortality: a long-term cohort study on Matsu Islands. Gut 2021; 70: 243-250
- 13 Lei WY, Lee JY, Chuang SL. et al. Eradicating Helicobacter pylori via 13C-urea breath screening to prevent gastric cancer in indigenous communities: a population-based study and development of a family index-case method. Gut 2023; 72: 2231-2240
- 14 Lee YC. Population-based Helicobacter pylori screen-and-treat strategy to prevent gastric cancer in the Matsu Islands. In: Park JY, ed. Population-based Helicobacter pylori screen-and-treat strategies for gastric cancer prevention: guidance on implementation (IARC Working Group Reports No. 12). Lyon, France: International Agency for Research on Cancer; 2025. Accessed April 28, 2025 at: https://publications.iarc.who.int/648
- 15 Chiang TH, Maeda M, Yamada H. et al. Risk stratification for gastric cancer after Helicobacter pylori eradication: a population-based study on Matsu Islands. J Gastroenterol Hepatol 2021; 36: 671-679
- 16 Antonelli G, Libanio D, De Groof AJ. et al. QUAIDE – quality assessment of AI preclinical studies in diagnostic endoscopy. Gut 2024; 74: 153-161
- 17 Lee YC, Chao YT, Lin PJ. et al. Quality assurance of integrative big data for medical research within a multihospital system. J Formos Med Assoc 2022; 121: 1728-1738
- 18 Veitch AM, Uedo N, Yao K. et al. Optimizing early upper gastrointestinal cancer detection at endoscopy. Nat Rev Gastroenterol Hepatol 2015; 12: 660-667
- 19 Selvaraju RR, Cogswell M, Das A. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22–29. Venice, Italy:
- 20 Sugimoto M, Murata M, Murakami K. et al. Characteristic endoscopic findings in Helicobacter pylori diagnosis in clinical practice. Expert Rev Gastroenterol Hepatol 2024; 18: 457-472
- 21 Liou JM, Chen CC, Chen MJ. et al. Taiwan Helicobacter Consortium. Sequential versus triple therapy for the first-line treatment of Helicobacter pylori: a multicentre, open-label, randomised trial. Lancet 2013; 381: 205-213
- 22 Liou JM, Fang YJ, Chen CC. et al. Taiwan Gastrointestinal Disease and Helicobacter Consortium. Concomitant, bismuth quadruple, and 14-day triple therapy in the first-line treatment of Helicobacter pylori: a multicentre, open-label, randomised trial. Lancet 2016; 388: 2355-2365
- 23 Dixon MF, Genta RM, Yardley JH. et al. Classification and grading of gastritis. The updated Sydney System. International Workshop on the Histopathology of Gastritis, Houston 1994. Am J Surg Pathol 1996; 20: 1161-1181
- 24 Rugge M, Meggio A, Pennelli G. et al. Gastritis staging in clinical practice: the OLGA staging system. Gut 2007; 56: 631-636
- 25 Capelle LG, de Vries AC, Haringsma J. et al. The staging of gastritis with the OLGA system by using intestinal metaplasia as an accurate alternative for atrophic gastritis. Gastrointest Endosc 2010; 71: 1150-1158
- 26 Zuiderveld K. Contrast limited adaptive histogram equalization. In: Heckbert PS, ed. Graphics Gems IV. San Diego, CA: Academic Press; 1994: 474-485
- 27 Lundberg S, Lee SI. A unified approach to interpreting model predictions. arXiv 2017;
- 28 Esposito G, Pimentel-Nunes P, Angeletti S. et al. Endoscopic grading of gastric intestinal metaplasia (EGGIM): a multicenter validation study. Endoscopy 2019; 51: 515-521
- 29 Almeida E, Martins ML, Marques D. et al. Artificial intelligence for endoscopic grading of gastric intestinal metaplasia: advancing risk stratification for gastric cancer. Endoscopy 2025;
- 30 Wu L, He X, Liu M. et al. Evaluation of the effects of an artificial intelligence system on endoscopy quality and preliminary testing of its performance in detecting early gastric cancer: a randomized controlled trial. Endoscopy 2021; 53: 1199-1207
- 31 Siripoppohn V, Pittayanon R, Tiankanon K. et al. Real-time semantic segmentation of gastric intestinal metaplasia using a deep learning approach. Clin Endosc 2022; 55: 390-400
- 32 Pornvoraphat P, Tiankanon K, Pittayanon R. et al. Real-time gastric intestinal metaplasia diagnosis tailored for bias and noisy-labeled data with multiple endoscopic imaging. Comput Biol Med 2023; 154: 106582
- 33 Gong EJ, Bang CS, Lee JJ. et al. Deep learning-based clinical decision support system for gastric neoplasms in real-time endoscopy: development and validation study. Endoscopy 2023; 55: 701-708
- 34 Dinis-Ribeiro M, Libânio D, Uchima H. et al. Management of epithelial precancerous conditions and early neoplasia of the stomach (MAPS III): European Society of Gastrointestinal Endoscopy (ESGE), European Helicobacter and Microbiota Study Group (EHMSG) and European Society of Pathology (ESP) Guideline update 2025. Endoscopy 2025; 57: 504-554
- 35 Morgan DR, Corral JE, Li D. et al. ACG Clinical Guideline: Diagnosis and management of gastric premalignant conditions. Am J Gastroenterol 2025; 120: 709-737
Publication History
Received: 23 May 2025
Accepted after revision: 12 October 2025
Accepted Manuscript online: 13 October 2025
Article published online: 26 November 2025
© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited (https://creativecommons.org/licenses/by/4.0/).
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany