DOI: 10.1055/a-2707-4584
Performance of a large language model in lymphoma stage assignment based on written PET/CT reports
Ergebnisqualität eines generativen Sprachmodells bei der Stadieneinteilung von Lymphomen anhand schriftlicher PET/CT-Befunde

Introduction
Large language models (LLMs) have emerged as powerful tools for addressing a wide range of tasks across diverse domains. Growing evidence suggests that they can play a significant role in patient self-education and in the choice of diagnostic work-up [1] [2]. Moreover, artificial intelligence (AI) might help classify abnormal clinical findings and observed imaging patterns [3] [4].
Accurate stage definition is crucial in lymphoma, as therapy regimens are chosen according to disease extent and risk factors. Over the past decades, positron emission tomography (PET) combined with computed tomography (CT) has become an essential part of pre-treatment assessment [5]. However, medical reports often contain complex descriptions of tumor characteristics such as size and location. Text-processing AI tools have the potential to assist with the categorization of these findings. We therefore investigated the performance of an advanced LLM in Ann Arbor stage assignment based on imaging documentation.
Publication History
Received: 27 May 2025
Accepted after revision: 22 September 2025
Article published online: 14 October 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
References
- 1 Rogasch JMM, Metzger G, Preisler M. et al. ChatGPT: Can You Prepare My Patients for [18F]FDG PET/CT and Explain My Reports? J Nucl Med 2023; 64: 1876-1879
- 2 Gertz RJ, Bunck AC, Lennartz S. et al. GPT-4 for Automated Determination of Radiologic Study and Protocol Based on Radiology Request Forms: A Feasibility Study. Radiology 2023; 307: e230877
- 3 Esteva A, Kuprel B, Novoa RA. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017; 542: 115-118
- 4 Spielvogel CP, Haberl D, Mascherbauer K. et al. Diagnosis and prognosis of abnormal cardiac scintigraphy uptake suggestive of cardiac amyloidosis using artificial intelligence: a retrospective, international, multicentre, cross-tracer development and validation study. Lancet Digit Health 2024; 6: e251-e260
- 5 Cheson BD, Fisher RI, Barrington SF. et al. Recommendations for Initial Evaluation, Staging, and Response Assessment of Hodgkin and Non-Hodgkin Lymphoma: The Lugano Classification. J Clin Oncol 2014; 32: 3059-3068
- 6 Schwartz LH, Panicek DM, Berk AR. et al. Improving Communication of Diagnostic Radiology Findings through Structured Reporting. Radiology 2011; 260: 174-181
- 7 Sibille L, Seifert R, Avramovic N. et al. 18F-FDG PET/CT Uptake Classification in Lymphoma and Lung Cancer by Using Deep Convolutional Neural Networks. Radiology 2020; 294: 445-452
- 8 Adams LC, Truhn D, Busch F. et al. Leveraging GPT-4 for Post Hoc Transformation of Free-text Radiology Reports into Structured Reporting: A Multilingual Feasibility Study. Radiology 2023; 307: e230725
- 9 Ando K, Sato M, Wakatsuki S. et al. A comparative study of English and Japanese ChatGPT responses to anaesthesia-related medical questions. BJA Open 2024; 10: 100296
- 10 Lehnen NC, Dorn F, Wiest IC. et al. Data Extraction from Free-Text Reports on Mechanical Thrombectomy in Acute Ischemic Stroke Using ChatGPT: A Retrospective Analysis. Radiology 2024; 311: e232741
