DOI: 10.1055/a-2780-0974
Development and validation of a multimodal deep learning model for early esophageal squamous neoplasia detection and invasion depth prediction
Supported by: China University Industry-Research Innovation Fund - Huaton Medical Research Special Program 2023HT064
Supported by: Grants from Shanghai Municipal Health Commission Fund 202240357
Clinical Trial:
Registration number (trial ID): NCT06412419, Trial registry: Clinical Trials Registry India (http://www.ctri.nic.in/Clinicaltrials), Type of Study: Prospective and Retrospective Multicenter Study

Abstract
Background
Early detection of esophageal squamous cell carcinoma (ESCC) is critical for optimizing patient outcomes. Magnifying endoscopy and endoscopic ultrasonography (EUS) serve as established diagnostic modalities. The multimodal ultrasound and magnifying endoscopic algorithm for early ESCC diagnostics (MUMA-EDx) integrates deep learning-based magnifying endoscopy and EUS imaging to improve early-stage ESCC identification and invasion depth assessment.
Methods
Model development and internal validation used a retrospective dataset; external validation used a prospective cohort. MUMA-EDx comprises two TResNet_m-based classifiers, one for magnifying endoscopy images and one for EUS images, whose features are combined by feature-level fusion. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value.
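As a rough illustration of feature-level fusion, the sketch below concatenates one feature vector per modality and passes the result through a single linear layer with softmax. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`fuse_features`, `classify`), the toy feature dimensions, the weights, and the choice of concatenation as the fusion operator are all hypothetical. In the model described here, the features would come from the two TResNet_m branches.

```python
import math
from typing import List, Sequence


def fuse_features(me_features: Sequence[float], eus_features: Sequence[float]) -> List[float]:
    """Feature-level fusion by concatenation (assumed; other fusion operators exist)."""
    return list(me_features) + list(eus_features)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """Hypothetical linear head: one weight row per class over the fused vector."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy example: 2 features per modality, 2 classes (non-neoplastic, neoplastic).
me_vec = [0.2, 0.9]   # placeholder for the magnifying endoscopy branch output
eus_vec = [0.7, 0.1]  # placeholder for the EUS branch output
fused_vec = fuse_features(me_vec, eus_vec)
probs = classify(
    fused_vec,
    weights=[[0.5, -1.0, 0.3, 0.2], [-0.5, 1.0, -0.3, -0.2]],  # made-up weights
    biases=[0.0, 0.0],
)
```

The point of fusing at the feature level is that the classification head sees both modalities at once, rather than averaging two independent decisions.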
Results
MUMA-EDx was developed and validated using a retrospective dataset comprising 358 patients (18 420 images) and subsequently tested prospectively on an independent cohort of 122 patients (8711 images). The feature-level multimodal approach significantly outperformed single-modality models. For tumor discrimination, the model achieved an AUROC of 0.94 (95%CI 0.92–0.96) in retrospective validation and a patient-level AUROC of 1.00 (95%CI 1.00–1.00) in prospective testing. For the more complex task of multiclass invasion depth classification, it achieved a retrospective AUROC of 0.95 (95%CI 0.88–0.99) and a prospective AUROC of 0.80 (95%CI 0.67–0.87). In a comparative study on invasion depth classification, the performance of MUMA-EDx exceeded that of novice endoscopists and was comparable to that of expert endoscopists.
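Patient-level results imply that image-level outputs are aggregated per patient before scoring. The abstract does not state the aggregation rule, so the sketch below assumes a simple mean of image probabilities and computes the area under the ROC curve through its rank (Mann-Whitney) interpretation. The function names and the toy data are hypothetical and do not come from the study.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def patient_scores(image_preds: Sequence[Tuple[str, float]]) -> Dict[str, float]:
    """Average image-level probabilities per patient (assumed aggregation rule)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for patient_id, prob in image_preds:
        grouped[patient_id].append(prob)
    return {pid: sum(ps) / len(ps) for pid, ps in grouped.items()}


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score_pos > score_neg), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: (patient ID, image-level probability of neoplasia).
preds = [("p1", 0.9), ("p1", 0.7), ("p2", 0.2), ("p2", 0.4), ("p3", 0.6), ("p4", 0.1)]
truth = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}

by_patient = patient_scores(preds)
ids = sorted(by_patient)
patient_auroc = auroc([by_patient[i] for i in ids], [truth[i] for i in ids])
```

In this toy data every positive patient scores above every negative one, so the rank-based AUROC is 1.0; any overlap between the two groups would pull it below 1.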
Conclusion
MUMA-EDx achieved high performance in early ESCC detection and robust invasion depth classification, comparable to that of expert endoscopists, and has the potential to improve diagnostic precision and patient outcomes.
Publication History
Received: 10 July 2025
Accepted after revision: 29 December 2025
Accepted Manuscript online: 30 December 2025
Article published online: 03 March 2026
© 2026. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
