DOI: 10.1055/a-2722-3871
The Pediatric Surgeon's AI Toolbox: How Large Language Models Like ChatGPT Are Simplifying Practice and Expanding Global Access
Abstract
Introduction
Pediatric surgeons face a substantial administrative workload. Large language models (LLMs) may streamline documentation, family communication, rapid reference, and education, but they raise concerns about accuracy, bias, and privacy. This review summarizes practical, near-term uses under clinician oversight.
Materials and Methods
Narrative review of LLMs in pediatric surgical workflows and scholarly writing. Sources included MEDLINE/PubMed, Scopus, Embase, Google Scholar, and policy documents (WHO, FDA, EU). Searches spanned January 2015 to August 2025 and were limited to English-language publications. Peer-reviewed and multicenter studies were prioritized; selected high-signal preprints were included and labeled as preprints. Data screening and extraction were performed by the author; findings were synthesized qualitatively.
Results
Across studies, LLMs reduced drafting time for discharge letters and operative note registries while maintaining clinician-rated quality; they improved the readability of consent forms and postoperative instructions and supported patient education. For decision support, general-purpose models performed well on structured medical questions, and better still when grounded by retrieval. Common limitations included weak medical coding performance, difficulty with case nuance and temporal reasoning, variable translation quality outside high-resource languages, and citation fabrication when curated sources were not supplied. Privacy risks stemmed from logging, memorization of rare strings, and poorly scoped tool connections. Recommended controls included a clinician-in-the-loop “review and release” workflow, privacy-preserving deployments, version pinning, and ongoing monitoring aligned with early-evaluation guidance.
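The “review and release” and version-pinning controls can be pictured as a small state machine: a model draft stays pending until a clinician approves it, and every draft records which pinned model version produced it. The Python sketch below is a minimal illustration under stated assumptions, not the workflow evaluated in the reviewed studies; the names `DraftDocument`, `make_draft`, `Status`, and the version string `PINNED_MODEL` are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

# Hypothetical pinned model identifier (an assumption); a real deployment would use its vendor's version string.
PINNED_MODEL = "example-llm-2025-08-01"


class Status(Enum):
    DRAFT = "draft"        # generated by the model, not yet reviewed
    RELEASED = "released"  # approved by a clinician; may enter the chart or go to the family
    REJECTED = "rejected"  # returned by the clinician; never released


@dataclass
class DraftDocument:
    text: str
    model_version: str  # recorded so each output can be traced to the model that produced it
    status: Status = Status.DRAFT
    reviewer: Optional[str] = None
    audit_log: List[str] = field(default_factory=list)

    def review(self, reviewer: str, approved: bool, edited_text: Optional[str] = None) -> None:
        """Clinician step: optionally edit the draft, then approve or reject it."""
        if self.status is not Status.DRAFT:
            raise ValueError("only unreviewed drafts can be reviewed")
        if edited_text is not None:
            self.text = edited_text
            self.audit_log.append(f"edited by {reviewer}")
        self.reviewer = reviewer
        self.status = Status.RELEASED if approved else Status.REJECTED
        self.audit_log.append(f"{self.status.value} by {reviewer} (model {self.model_version})")

    def release_text(self) -> str:
        """Return the text only after a clinician has approved it."""
        if self.status is not Status.RELEASED:
            raise PermissionError("not released: clinician approval required")
        return self.text


def make_draft(text: str, model_version: str) -> DraftDocument:
    """Accept a model output only if it came from the pinned model version."""
    if model_version != PINNED_MODEL:
        raise ValueError(f"unexpected model version {model_version!r}; expected {PINNED_MODEL!r}")
    return DraftDocument(text=text, model_version=model_version)


if __name__ == "__main__":
    doc = make_draft("Draft discharge letter for review.", PINNED_MODEL)
    doc.review("Dr. A", approved=True, edited_text="Discharge letter, edited and approved.")
    print(doc.release_text())
    print(doc.audit_log)
```

Because `release_text()` raises unless a clinician has approved the draft, nothing reaches the chart or the family by default, and the version check turns an unannounced model update into a visible error rather than a silent change.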
Conclusion
When outputs are grounded in structured EHR data or curated retrieval and briefly reviewed by clinicians, LLMs can responsibly reduce administrative burden and support communication and education. Early adoption should target high-volume, low-risk, auditable tasks. Future priorities must include multicenter pediatric datasets, transparent benchmarks (accuracy, calibration, equity, time saved), and prospective studies linked to safety outcomes.
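The benchmark quantities named above (accuracy, calibration, equity, time saved) can each be reported with a simple, transparent metric. The sketch below assumes per-item correctness labels, model-stated confidences, a subgroup label per item (for example, the family's preferred language), and drafting times with and without the model. Using the Brier score for calibration (it reflects both calibration and discrimination) and the largest subgroup accuracy gap for equity are illustrative assumptions, not metrics prescribed by this review; all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of items the model answered correctly."""
    return sum(correct) / len(correct)


def brier_score(confidence: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean squared difference between stated confidence and the 0/1 outcome.

    Lower is better; it penalizes wrong answers and over- or under-confidence.
    """
    return mean((p - float(c)) ** 2 for p, c in zip(confidence, correct))


def subgroup_accuracy_gap(correct: Sequence[bool], groups: Sequence[str]) -> float:
    """Largest accuracy difference between any two subgroups (a crude equity check)."""
    by_group: Dict[str, List[bool]] = defaultdict(list)
    for c, g in zip(correct, groups):
        by_group[g].append(c)
    accs = [accuracy(v) for v in by_group.values()]
    return max(accs) - min(accs)


def minutes_saved(baseline: Sequence[float], assisted: Sequence[float]) -> float:
    """Mean time without the model minus mean time with model drafting plus clinician review."""
    return mean(baseline) - mean(assisted)


if __name__ == "__main__":
    correct = [True, True, False, True]
    print(accuracy(correct))  # 0.75
    print(brier_score([0.9, 0.8, 0.7, 0.6], correct))
    print(subgroup_accuracy_gap(correct, ["en", "en", "es", "es"]))
    print(minutes_saved([20, 30], [12, 14]))
```

Keeping each metric to a few lines makes it easy for readers of a benchmark report to see exactly what was computed, which is the point of a transparent benchmark.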
Keywords
pediatric surgery - large language models - clinical documentation - artificial intelligence - decision support
Publication history
Received: 24 September 2025
Accepted: 13 October 2025
Accepted Manuscript online: 14 October 2025
Article published online: 3 November 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
