Methods Inf Med
DOI: 10.1055/a-2797-4295
Original Article

Using a Large Language Model–generated Prompt to Extract Features from Synthetic MRI Brain Scan Reports: A Cross-sectional Study

Authors

  • John J. Hanna

    1   Department of Internal Medicine, ECU Brody School of Medicine, Greenville, North Carolina, United States
    2   Information Services, ECU Health, Greenville, North Carolina, United States
    3   Clinical Informatics Center, University of Texas Southwestern, Dallas, Texas, United States
  • Christopher S. Evans

    2   Information Services, ECU Health, Greenville, North Carolina, United States
    4   Department of Emergency Medicine, ECU Brody School of Medicine, Greenville, North Carolina, United States
  • Christopher R. Dennis

    2   Information Services, ECU Health, Greenville, North Carolina, United States
  • K Stuart Lee

    5   ECU Health Neurosurgery and Spine, ECU Health, Greenville, North Carolina, United States
  • Christoph U. Lehmann

    3   Clinical Informatics Center, University of Texas Southwestern, Dallas, Texas, United States
    6   Department of Pediatrics, University of Texas Southwestern, Dallas, Texas, United States
  • Richard J. Medford

    1   Department of Internal Medicine, ECU Brody School of Medicine, Greenville, North Carolina, United States
    2   Information Services, ECU Health, Greenville, North Carolina, United States
    3   Clinical Informatics Center, University of Texas Southwestern, Dallas, Texas, United States

Abstract

Background

Feature extraction from free-text medical reports is a frequently required clinical, operational, or research procedure. Large language models (LLMs) hold promise for automating feature extraction, which in turn can enable category-assignment tasks.

Objective

To compare the groundedness of features extracted by five LLMs from magnetic resonance imaging (MRI) brain scan reports using a clinician-engineered versus an LLM-generated prompt.

Methods

Five OpenAI LLMs were evaluated for their ability to extract nine binary features from synthetic MRI brain scan reports. Two prompts were used: one clinician-engineered and one LLM-generated. Recall, precision, accuracy, and F1 score were calculated to assess model performance.
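
For illustration, the following is a minimal sketch of how such an extraction-and-scoring pipeline can be assembled with the OpenAI Python SDK and scikit-learn. The prompt wording, feature names, report texts, and labels are hypothetical placeholders, not the study's materials, and the study's nine features are abbreviated to four for brevity.

# Minimal sketch of prompt-based binary feature extraction. The prompt,
# feature names, report texts, and labels below are hypothetical
# illustrations, not the study's actual materials.
import json

from openai import OpenAI
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical binary features (the study used nine; these are examples only).
FEATURES = ["acute_infarct", "hemorrhage", "mass_effect", "midline_shift"]

PROMPT = (
    "You are extracting findings from an MRI brain scan report. "
    "For each feature, answer 1 if present and 0 if absent. "
    "Return only a JSON object with these keys: " + ", ".join(FEATURES) + ".\n\n"
    "Report:\n{report}"
)

def extract_features(report_text: str, model: str = "gpt-4") -> dict:
    """Ask the model for the binary features as a JSON object."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output for a classification-style task
        messages=[{"role": "user", "content": PROMPT.format(report=report_text)}],
    )
    return json.loads(response.choices[0].message.content)

# Hypothetical evaluation against clinician-labeled ground truth.
reports = ["Synthetic report text 1 ...", "Synthetic report text 2 ..."]
truth = [[1, 0, 0, 0], [0, 1, 1, 1]]  # one row of labels per report

predicted = []
for text in reports:
    parsed = extract_features(text)
    predicted.append([int(parsed[f]) for f in FEATURES])

# Flatten per-report label vectors so each feature decision is one sample.
y_true = [label for row in truth for label in row]
y_pred = [label for row in predicted for label in row]

print("recall   ", recall_score(y_true, y_pred))
print("precision", precision_score(y_true, y_pred))
print("accuracy ", accuracy_score(y_true, y_pred))
print("F1       ", f1_score(y_true, y_pred))

Setting the temperature to 0 and requesting a bare JSON object keeps the output machine-parsable, which matters when the same prompt is reused unchanged across several models.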

Results

Across all extracted features, all studied models, and both tested prompts, the overall average recall was 0.956, average precision 0.9347, average accuracy 0.982, and average F1 score 0.9431. With GPT-3.5-turbo, the LLM-generated prompt performed numerically better than the clinician-engineered prompt. For the other four models examined, all GPT-4 variants, overall recall, precision, and accuracy were higher than with GPT-3.5-turbo regardless of the prompt source.
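
For reference, the reported metrics presumably follow their standard definitions in terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN):

\[
\text{Recall} = \frac{TP}{TP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP},
\]
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\]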

Conclusion

This study highlights the potential of LLMs both to engineer prompts and to extract features accurately from MRI brain scan reports, with newer models such as GPT-4 performing consistently well. The efficacy of feature extraction by an LLM depends on both the prompt and the model used.

Declaration of GenAI Use

During the writing process of this paper, the author(s) used OpenAI's ChatGPT-4o to create the Supplementary Appendix (available in the online version only). The author(s) reviewed and edited the text and take(s) full responsibility for the content of the paper.

Publication History

Received: 16 March 2025

Accepted: 23 January 2026

Article published online:
19 February 2026

© 2026. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany