Rofo
DOI: 10.1055/a-2641-3059
Review

From Referral to Reporting: The Potential of Large Language Models in the Radiological Workflow

Article in several languages: English | German
Anna Fink
1   Department of Diagnostic and Interventional Radiology, University of Freiburg Faculty of Medicine, Freiburg, Germany (Ringgold ID: RIN88751)
,
Stephan Rau
1   Department of Diagnostic and Interventional Radiology, University of Freiburg Faculty of Medicine, Freiburg, Germany (Ringgold ID: RIN88751)
,
Kai Kästingschäfer
1   Department of Diagnostic and Interventional Radiology, University of Freiburg Faculty of Medicine, Freiburg, Germany (Ringgold ID: RIN88751)
,
Jakob Weiß
1   Department of Diagnostic and Interventional Radiology, University of Freiburg Faculty of Medicine, Freiburg, Germany (Ringgold ID: RIN88751)
,
Fabian Bamberg
1   Department of Diagnostic and Interventional Radiology, University of Freiburg Faculty of Medicine, Freiburg, Germany (Ringgold ID: RIN88751)
,
Maximilian Frederik Russe
1   Department of Diagnostic and Interventional Radiology, University of Freiburg Faculty of Medicine, Freiburg, Germany (Ringgold ID: RIN88751)

Supported by: Berta-Ottenstein-Programme for Clinician Scientists, Faculty of Medicine, University of Freiburg
 

Abstract

Background

Large language models (LLMs) hold great promise for optimizing and supporting radiology workflows amidst rising workloads. This review examines potential applications in daily radiology practice, as well as remaining challenges and potential solutions.

Method

Presentation of potential applications and challenges, illustrated with practical examples and concrete optimization suggestions.

Results

LLM-based assistance systems have potential applications in almost all language-based process steps of the radiological workflow. Significant progress has been made in areas such as report generation, particularly with retrieval-augmented generation (RAG) and multi-step reasoning approaches. However, challenges related to hallucinations, reproducibility, and data protection, as well as ethical concerns, need to be addressed before widespread implementation.

Conclusion

LLMs have immense potential in radiology, particularly for supporting language-based process steps, with technological advances such as RAG and cloud-based approaches potentially accelerating clinical implementation.

Key Points

  • LLMs can optimize reporting and other language-based processes in radiology with technologies such as RAG and multi-step reasoning approaches.

  • Challenges such as hallucinations, reproducibility, privacy, and ethical concerns must be addressed before widespread adoption.

  • RAG and cloud-based approaches could help overcome these challenges and advance the clinical implementation of LLMs.

Citation Format

  • Fink A, Rau S, Kästingschäfer K et al. From Referral to Reporting: The Potential of Large Language Models in the Radiological Workflow. Rofo 2025; DOI 10.1055/a-2641-3059


Abbreviations

EHDS: European Health Data Space
GPT: Generative pre-trained transformer
AI: Artificial intelligence
LLM: Large language model
NLP: Natural language processing
RAG: Retrieval-augmented generation
SOP: Standard operating procedure


Introduction

Advances in technology have traditionally shaped the field of radiology. For example, the transition from film-based X-ray archiving to digital archiving or the development of modern cross-sectional imaging techniques, such as CT and MRI, represent major transformations. Today, radiology is facing a new period of transition brought on once again by technological innovation as artificial intelligence (AI) becomes part of the clinical routine.

In view of increasing workloads [1] and the related risk of errors [2], there is a growing need for tools to improve diagnostic efficiency. In recent years, the development of large language models (LLMs) such as GPT-4 [3], Claude [4], and Gemini Pro [5] has garnered considerable attention, as their potential for optimization in everyday radiology practice is very promising [6] [7] [8] [9] [10]. However, challenges remain, such as hallucinations, where incorrect answers are generated to bridge gaps in knowledge, as well as limitations when it comes to complex cognitive tasks [11] [12]. The lack of transparency is also problematic in a medical context, where precise and correct answers are critical [13] [14]. In addition, data protection and ethical issues must be resolved before AI can gain widespread use in medicine [15] [16].

The aim of this paper is to provide a comprehensive overview of the areas where LLMs could be applied in radiology, to discuss possible solutions to reduce the limitations mentioned, and to describe the outlook for future implementation.


Main section

1. Principles of interaction

The development of large language models would not have been possible without advances in natural language processing (NLP) [17], which studies the linguistic interaction between humans and computers. Initial research approaches in this area date back to the 1950s, but the real breakthrough only came with the introduction of the transformer architecture [18]. This architecture forms the basis of many commercial models, such as GPT-4 [3], Claude [4], and Gemini Pro [5], that are now known worldwide.

The automated generation of sequential text is based on embeddings, i.e., the numerical representation of words and their context. Based on the parameters learned during training, the LLM attempts to predict the most likely next word or word sequence in the sentence context and thus to generate text (hence “generative AI”). LLM outputs are therefore primarily based on probabilities, which is a key aspect for understanding both the applications and the limitations of this technology.
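
To make this principle concrete, the following minimal sketch (with invented toy scores, not a real model) shows how scores assigned to candidate next words are converted into a probability distribution via the softmax function:

```python
import numpy as np

# Toy continuation of "The fracture involves the tibial ..."; the scores
# (logits) are invented, a real LLM computes them from the learned
# embeddings of the preceding context.
vocab = ["plateau", "shaft", "spine", "banana"]
logits = np.array([4.2, 2.1, 1.5, -3.0])

probs = np.exp(logits) / np.exp(logits).sum()   # softmax
for word, p in zip(vocab, probs):
    print(f"{word:>8}: {p:.3f}")
# "plateau" receives by far the highest probability; generation proceeds
# by appending the chosen word and repeating the prediction step.
```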

Despite the massive hype surrounding the development of these language models, their limitations soon became apparent. In addition to the often non-transparent tuning of model parameters by the providers, individual approaches to optimizing prompts have since been developed. Prompt engineering enables prompts to be adapted in a targeted manner, while multi-step reasoning approaches can further improve the quality of the interaction. Techniques such as few-shot or zero-shot prompting can optimize the response accuracy of LLMs by embedding task-specific information or examples directly in the prompt. Retrieval-augmented generation (RAG) additionally allows updatable, subject-specific information to be integrated automatically from external sources. This increases transparency because the sources used can be cited directly [19] [20].
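
As an illustration of few-shot prompting, the sketch below assembles a prompt that places two worked examples before the actual question; the examples and wording are purely hypothetical and would in practice be drawn from local SOPs:

```python
# Hypothetical few-shot examples for protocol selection; invented for
# illustration, not validated clinical recommendations.
FEW_SHOT_EXAMPLES = [
    ("Suspected pulmonary embolism, normal renal function",
     "CT pulmonary angiography with IV contrast"),
    ("Follow-up of known multiple sclerosis",
     "MRI brain with and without IV contrast"),
]

def build_prompt(referral_text: str) -> str:
    parts = ["Suggest an appropriate imaging protocol for the referral below."]
    for referral, protocol in FEW_SHOT_EXAMPLES:
        parts.append(f"Referral: {referral}\nProtocol: {protocol}")
    parts.append(f"Referral: {referral_text}\nProtocol:")
    return "\n\n".join(parts)

print(build_prompt("Acute flank pain, suspected ureteric stone"))
```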

Overall, these modifications have the potential to significantly expand the range of applications in the medical context. In almost every step of radiological patient care – from referral and scheduling to imaging and reporting ([Fig. 1]) – it is now possible to imagine applications using LLMs.

Fig. 1 Steps in the everyday routine care of radiology patients that could benefit from the potential of large language models.

2. Scope of applications in clinical practice

2.1. Determining indications and defining protocol

Since the best-known models are based on language processing via NLP, their greatest potential lies in optimizing language- or text-based work steps. In radiology, the adaptation of report texts comes to mind first. However, upstream process steps also offer opportunities for improving efficiency.

Radiological patient care begins with determining the indication and then defining the diagnostic imaging protocol, which requires cooperation between the referring physician and the responsible radiologist. This step forms the basis for an accurate diagnosis and helps to prevent unnecessary scans and radiation exposure.

Rosen et al. and Barash et al. were able to show that the recommendations for appropriate imaging and contrast medium administration derived by LLMs from referral texts align largely with established guidelines such as the European Imaging Referral Guidelines [21] and the Appropriateness Criteria of the American College of Radiology [22]. However, many of these applications were limited to specialized areas, and in some cases problems arose when recommendations were vaguely worded [21] [22].

A promising approach is meta-learning or in-context learning, where the LLM optimizes its output to solve new tasks based on question-specific examples [23]. A further development of this retrieval-augmented generation technique enables the model to access an external database compiled specifically for the respective discipline and containing, for example, scientific articles, textbook content, or department-specific standard operating procedures (SOPs) [20]. The knowledge extracted is integrated directly in the LLM input, in order to provide more accurate and better informed answers ([Fig. 2]). Rau et al. and Rosen et al. were able to show that this approach significantly improves the accuracy of answers and achieves a level comparable to subject matter experts in fictitious case studies. Furthermore, the use of such specialized LLMs results in significant time and cost savings [7] [24].

Fig. 2 Process steps with RAG: After manual user input, the query is embedded in a high-dimensional vector space, in order to subsequently perform a similarity search in a separate vector index containing specialist literature or guidelines, for example. The context information obtained in this way is handed over to the language model together with the original prompt and used to produce an answer based on verifiable sources.
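
The retrieval step shown in [Fig. 2] can be sketched in a few lines of Python. For a self-contained toy example, a simple word-count vector stands in for the dense embeddings of a production system, and the final LLM call is only indicated:

```python
import numpy as np
from collections import Counter

# Mini "vector index": in practice these would be guideline or SOP chunks
# embedded with a dedicated embedding model, not word-count vectors.
documents = [
    "Schatzker classification of tibial plateau fractures, types I-VI ...",
    "Vancouver classification of periprosthetic femur fractures ...",
    "Contrast medium dosing in impaired renal function ...",
]

def embed(text: str, vocab: list[str]) -> np.ndarray:
    # Toy stand-in for a dense embedding model.
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def retrieve(query: str, docs: list[str]) -> str:
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    q = embed(query, vocab)
    sims = []
    for d in docs:
        v = embed(d, vocab)
        denom = np.linalg.norm(q) * np.linalg.norm(v) or 1.0
        sims.append(q @ v / denom)          # cosine similarity
    return docs[int(np.argmax(sims))]

query = "Classify this tibial plateau fracture"
context = retrieve(query, documents)
prompt = f"Context:\n{context}\n\nQuestion: {query}"
# 'prompt' would now be sent to the LLM together with source references.
print(prompt)
```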

Approaches that have so far received little research attention include the use of LLMs to help evaluate laboratory parameters, automatically extract previous imaging findings, and extract relevant patient data from doctor's letters or consultation notes. Clinical information in imaging requests is often incomplete and can contain errors, which is problematic because the higher the quality of this information, the better the quality of the report [25]. Given both the clinical need and the potential offered, the use of LLMs in this area deserves closer study.


2.2. Scheduling appointments and preparing patients

It is not only radiologists who stand to benefit from integrating language models; other professional groups, such as medical assistants, could also benefit from LLMs in the future. In one possible scenario, LLMs could support appointment scheduling by automatically prioritizing urgent requests and highlighting the related appointments. These tasks could potentially also be integrated into AI-based, automated appointment-scheduling systems [26].

There are also language-based activities in the area of patient preparation that could potentially be automated. For example, a combination of language models and digital informed consent forms could be developed so that patients could ideally fill out these forms at home prior to their examination, in order to reduce the time spent in waiting rooms. In this scenario, the language model would act as a go-between by accessing department-specific SOPs, timelines, and location descriptions, as well as answering patients' frequently asked questions. In addition, this technology could help the healthcare professional providing the informed consent consultation to save time by offering relevant information from the informed consent forms – such as pre-existing diseases of the kidney or thyroid gland, or possible contrast medium allergies – in a structured manner.
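
A minimal sketch of how such risk-relevant items could be extracted in structured form is shown below; the field names and prompt are assumptions for illustration, not an established schema, and the model call itself is omitted:

```python
import json

def build_extraction_prompt(consent_text: str) -> str:
    # Field names are illustrative only, not an established schema.
    return (
        "Extract the following items from the consent form text and answer "
        "ONLY with JSON: renal_disease, thyroid_disease, contrast_allergy "
        "(each true or false).\n\nText:\n" + consent_text
    )

def parse_llm_answer(answer: str) -> dict:
    # Defensive parsing: LLM output is not guaranteed to be valid JSON.
    try:
        return json.loads(answer)
    except json.JSONDecodeError:
        return {"error": "unparseable model output", "raw": answer}

# What a well-formed model answer would look like:
print(parse_llm_answer(
    '{"renal_disease": true, "thyroid_disease": false, '
    '"contrast_allergy": false}'
))
```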

From a technical perspective, these approaches could already be implemented today, and local adaptations to internal hospital standards could be achieved with the help of RAG. However, the quality and structure of inputs have a significant impact on LLM outputs [27]. Unstructured input from patients with little or no medical expertise could therefore lead to misinformation. Applied research will thus have to demonstrate the extent to which such systems can be implemented successfully.


2.3. Reporting

The image acquisition step is followed by another language-based area in the radiological workflow: reporting. This area has been the focus of LLM research in recent years, as it promises to directly reduce the workload for radiologists in everyday clinical practice.

One of the particular strengths of LLMs is their ability to structure large amounts of text. In the early days of language models, this led to the development of an important field of research: generating structured findings from unstructured free text. LLMs can sort findings thematically, structure continuous text, and visualize the follow-up of, e.g., oncological diseases [28] [29]. In a blinded analysis, Bhayana et al. demonstrated that referring physicians prefer LLM-generated structured findings to the original findings and use them to make treatment decisions more rapidly [30]. In addition, LLMs can be used to correct existing report texts and thus save time in reporting [31] [32]. The first companies in the US are already offering systems that automatically generate report impressions, such as RadAI with Omni Impressions [33] or Nuance Communications with PowerScribe Smart Impression [34].
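
The following sketch illustrates one way such a structuring instruction could be phrased; the headings are one possible convention, not a fixed standard:

```python
# Hypothetical instruction for converting free text into a structured
# report; the headings follow one common convention and can be adapted.
STRUCTURE_INSTRUCTION = (
    "Restructure the following radiology report under the headings "
    "'Clinical information', 'Technique', 'Findings' (grouped by organ "
    "system), and 'Impression'. Do not add, remove, or reinterpret any "
    "medical content; preserve all measurements and dates verbatim."
)

def structured_report_prompt(free_text_report: str) -> str:
    return f"{STRUCTURE_INSTRUCTION}\n\nReport:\n{free_text_report}"

print(structured_report_prompt("CT abdomen: liver unremarkable. ..."))
```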

LLMs could potentially also be used in the final step of the process chain, when findings are communicated to patients. Studies by Amin et al. and Meddeb et al. have shown that radiological terminology can be translated into simpler concepts that are easy for patients to understand [10], as well as into foreign languages [35], in order to overcome communication barriers.

For a long time, it was thought that these applications held the greatest potential for LLMs, while limits were reached in generating new texts. Although popular large language models were able to pass multiple-choice knowledge tests, such as the American Board of Radiology's certifying exam, they sometimes produced low-quality results in terms of robustness and reproducibility. In addition, the models produced incorrect solutions with high confidence and displayed deficits, particularly when performing complex reasoning tasks [11] [36]. Deficits were also found when answering questions requiring medical knowledge or when generating differential diagnoses from report texts, which further underscores the importance of including expert medical knowledge in the data used to train language models [37] [38].

A major problem in this regard is that most of the powerful models come from commercial providers, so specialized medical training is unlikely to be included due to a lack of interest on the part of the providers. In addition, manual, task-specific training of the models is extremely time- and data-intensive and is therefore difficult to implement.

As a result, a variety of approaches have emerged in recent years that incorporate task-specific knowledge directly in the input prompt instead of retraining the entire model [23]. Nevertheless, when integrating large amounts of data into the input prompt, one quickly encounters input restrictions (known as token limits), in addition to the problem that relevant content is at risk of getting lost in the mass of information [39]. One promising solution is RAG, in which the LLM accesses, with every prompt, an external, manually created database of specialist articles, textbooks, or SOPs. This approach has not only led to a significant performance improvement on radiological questions [40], but has also shown potential for providing a diagnosis based on unstructured report texts. For example, concrete diagnoses could be generated in trauma imaging [8], gastrointestinal imaging [9], or fracture classification in accordance with the guidelines of the Swiss AO Foundation [41]. [Fig. 3] and [Fig. 4] show two practical examples with the corresponding output from the field of trauma imaging. The detailed input prompt for both models is provided in the supplementary material (Suppls. 1 and 2).
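
The two-stage prompting used in [Fig. 3] and [Fig. 4] can be thought of as two chained model calls: the first condenses the report to its classification-relevant findings, and the second classifies them using retrieved guideline context. A schematic sketch, in which ask_llm and retrieve are placeholders for any chat-completion API and a retrieval function such as the one sketched above:

```python
# Schematic two-stage pipeline; 'ask_llm' stands in for any chat-completion
# API and 'retrieve' for a RAG lookup such as the one sketched earlier.
def two_stage_diagnosis(report_text, ask_llm, retrieve):
    # Stage 1: condense the report to its classification-relevant findings.
    findings = ask_llm(
        "List only the fracture-relevant imaging findings in this report:\n"
        + report_text
    )
    # Stage 2: classify the findings, grounded in retrieved guideline context.
    context = retrieve(findings)
    return ask_llm(
        f"Context from the literature:\n{context}\n\n"
        f"Findings:\n{findings}\n\n"
        "State the most appropriate fracture classification and cite the "
        "supporting context passage."
    )

# Minimal dry run with mock functions instead of real model/database calls:
mock_llm = lambda prompt: f"[model answer based on {len(prompt)} prompt chars]"
mock_retrieve = lambda q: "Schatzker type IV: fracture of the medial plateau ..."
print(two_stage_diagnosis("Split fracture of the medial tibial plateau ...",
                          mock_llm, mock_retrieve))
```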

Fig. 3 Comparison of a generic model (GPT-4 Turbo, incorrect answers are highlighted in red) versus an enhanced model that uses a two-stage prompt, as well as retrieval-augmented generation (RAG) (GPT-4 Turbo with RAG, correct answer), to diagnose and classify a proximal tibial fracture, Schatzker type IV. The RAG solution provided the LLM with context-specific information extracted automatically from the “RadioGraphics Top 10 Reading List Trauma Radiology” [51]. The detailed input prompt for both models is provided in the supplementary material (Suppls. 1 and 2).
Fig. 4 Comparison of a generic model (GPT-4 Turbo, incorrect answers are highlighted in red) versus an enhanced model that uses a two-stage prompt, as well as retrieval-augmented generation (RAG) (GPT-4 Turbo with RAG, correct answer), to diagnose and classify a periprosthetic femur fracture, Vancouver type AGT. The RAG solution provided the LLM with context-specific information extracted automatically from the “RadioGraphics Top 10 Reading List Trauma Radiology” [42]. The detailed input prompt for both models is provided in the supplementary material (Suppls. 1 and 2).

Such tools could lead to significant time savings in routine radiology and reduce the amount of time-consuming research. To further improve transparency and confidence in LLM statements, hyperlinks to the sources used, including page references for the information extracted, can be included in each answer [8].

[Fig. 5] provides an overview of the possible applications discussed.

Fig. 5 Potential applications of LLMs in the radiological process chain.


3. Challenges and implications

Despite the enormous potential of LLMs, their limitations nevertheless need to be taken into account. The best-known challenges include hallucinations, where misinformation is generated to bridge gaps in knowledge, as well as problems with more complex reasoning tasks that involve multiple iterative steps. LLMs are based on probabilistic predictions and do not use classical machine learning with a ground-truth reference value. This leads to limitations in specialized areas where dedicated information is underrepresented in the training dataset.

Another problem is that the knowledge is not always up to date, because language models only use information available up to the time of their training (for GPT-4 Turbo, December 2023 [43]). This is particularly problematic in rapidly developing areas such as radiology. For example, diagnostic guidelines might have been revised in the meantime, meaning the LLM would no longer have access to the latest version and its response would therefore be based on outdated information.

Since subject-specific training is currently not very feasible for the reasons already mentioned, solutions instead focus primarily on creating the best possible input prompts, e.g., by using multi-step reasoning approaches, or on supplementing the input data using RAG [27]. The LLM can access either real-time web databases such as PubMed or a traditional RAG database with carefully curated, scientifically reviewed information. Agent-based approaches [44], where several RAG-augmented LLMs interact like an interdisciplinary team of experts and thereby produce a joint result, also offer promising future prospects. This raises the important question of responsibility. In the future, it may be necessary for RAG databases to be created and continuously updated by professional societies, scientific journals, or within departments, taking local SOPs into account.
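
Such an agent-based setup can be pictured as several "experts" answering independently, followed by a synthesis step; the roles and the ask_llm placeholder below are assumptions for illustration:

```python
# Schematic multi-agent consensus; 'ask_llm' is a placeholder, and in
# practice each agent would query its own curated RAG database.
EXPERT_ROLES = ["musculoskeletal radiologist", "trauma surgeon"]

def panel_answer(question, ask_llm):
    opinions = [
        ask_llm(f"Answer as a {role}, citing your sources: {question}")
        for role in EXPERT_ROLES
    ]
    joined = "\n\n".join(f"Opinion {i + 1}: {o}" for i, o in enumerate(opinions))
    return ask_llm(
        "Synthesize the following expert opinions into one consistent answer "
        f"and explicitly flag any disagreement:\n\n{joined}"
    )

mock_llm = lambda prompt: f"[model answer to: {prompt[:40]}...]"
print(panel_answer("How should this periprosthetic fracture be classified?",
                   mock_llm))
```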

One further limitation is the linguistic and, in some cases, content-related variability of the outputs, whereby the same instruction can lead to different results on repeated runs [45]. This is usually due to a high degree of creativity in the language model, i.e., the variability when selecting the next word (the "temperature" setting). In most common LLMs, this parameter can be adjusted manually to ensure greater consistency. Problems related to the lack of transparency of LLM outputs can be addressed by RAG-based, verifiable references with hyperlinks in each answer.
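
The effect of the temperature parameter can be made concrete with a small calculation: the logits are divided by the temperature before the softmax, which sharpens (low temperature) or flattens (high temperature) the next-word distribution; values close to zero make the most likely word nearly deterministic:

```python
import numpy as np

logits = np.array([4.2, 2.1, 1.5, -3.0])        # same toy scores as above

def next_word_probs(logits, temperature):
    scaled = logits / temperature
    scaled -= scaled.max()                       # numerical stability
    e = np.exp(scaled)
    return e / e.sum()

for t in (0.2, 1.0, 2.0):
    print(f"T={t}:", np.round(next_word_probs(logits, t), 3))
# A low temperature concentrates almost all probability on one word
# (reproducible output); a high temperature spreads it out (more variety).
```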

Data protection concerns represent another obstacle to implementing LLMs in clinical routine. Every request to the high-performance, commercially operated LLMs runs through third-party servers; wide-scale use of LLMs with highly sensitive patient data would thus be a serious violation of data protection laws. One solution would be to use specially developed local models, which requires both a high level of expertise and considerable server capacity.

A resource-saving and sustainable alternative is offered by cloud-based solutions from providers such as AWS [46], Google [47], or Microsoft [48]. The data are processed in a protected cloud environment that is subject to data security requirements similar to those of an internal hospital IT system. In this context, the European Society of Radiology recently advocated for a compliant implementation of the European AI Act, including through the creation of a European Health Data Space [49]. Since the use of these systems still requires a high level of technical expertise, commercial out-of-the-box solutions or integrated platform approaches could become established in the future.

Finally, ethical concerns should not be ignored, especially regarding the inherent bias of LLMs that can arise from distorted training data. There is a risk that users will be influenced and possibly misled by the models' often convincingly presented answers. This is particularly relevant when LLMs serve as a preliminary source of information for patients without a medical background, for example in the context of digital patient education. Before broad implementation, it is therefore essential to ensure both the diversity of the training data and a comprehensible reasoning chain of the language model so that the outputs can be verified.

As the use of large language models becomes more widespread, it is becoming increasingly common for patients to enter image data or report texts into LLMs in order to translate this information into lay language or to obtain a supposed second opinion. Since LLMs respond with a high degree of linguistic confidence even in complex situations, this can lead to uncertainty and questions from patients. Radiological specialists should therefore be specifically trained in dealing with such situations and in putting LLM-generated statements into context.

In addition to the individual responsibility of physicians, medical associations in particular can play a key role in integrating LLM applications into the healthcare system. In its current statement, the German Medical Association explicitly calls on professional organizations to support the use of AI in clinical practice through clear, evidence-based recommendations for action [50]. Moreover, given the speed of innovation in the field, these new systems will have to be evaluated continually to ensure their safety, effectiveness, and quality in everyday clinical practice over the long term.


4. Conclusions for practice

In summary, LLMs offer tremendous potential, which is already being widely discussed in the medical community [12] [42]. In radiology, LLMs can support language-based process steps in particular. Ongoing technological advances in these models, as well as approaches such as RAG, agent-based models, and cloud-based solutions, could enable clinical implementation. In this context, it is critical to define solid rules regarding data security, ethical issues, and responsibilities. In addition, comprehensive training needs to be provided to radiologists and medical staff regarding the functionality, capabilities, and limitations of LLMs, in order to ensure responsible use and to build the confidence that is essential for successful implementation.

In view of the ever-increasing number of examinations and expanding workloads, it is important to actively shape these developments in order to ensure that LLMs are used responsibly in support of everyday radiology practice.




Conflict of Interest

The authors declare that they have no conflict of interest.


Correspondence

Anna Fink
Department of Diagnostic and Interventional Radiology, University of Freiburg Faculty of Medicine
Hugstetter Str. 55
79106 Freiburg
Germany   

Publication History

Received: 25 March 2025

Accepted after revision: 16 June 2025

Article published online:
16 July 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany

