Language model-based labeling of German thoracic radiology reports

Alessandro Wollek; Philip Haitzer; Thomas Sedlmeyr; Sardi Hyska; Johannes Rueckel; Bastian O. Sabel; Michael Ingrisch; Tobias Lasser

doi:10.1055/a-2287-5054


Rofo 2025; 197(01): 55-64
DOI: 10.1055/a-2287-5054

Chest


Alessandro Wollek¹ ², Philip Haitzer¹ ², Thomas Sedlmeyr¹ ², Sardi Hyska³, Johannes Rueckel³ ⁴, Bastian O. Sabel³, Michael Ingrisch³, Tobias Lasser¹ ²

¹Munich Institute of Biomedical Engineering, Technical University of Munich, Garching near Munich, Germany (Ringgold ID: RIN9184)

²School of Computation, Information and Technology, Technical University of Munich, Garching near Munich, Germany (Ringgold ID: RIN9184)

³Department of Radiology, Ludwig-Maximilians-University Hospital Munich, Munich, Germany

⁴Institute of Neuroradiology, Ludwig-Maximilians-University Hospital Munich, Munich, Germany (Ringgold ID: RIN27192)

Funded by: Bundesministerium für Gesundheit (German Federal Ministry of Health), 2520DAT920


Abstract

Purpose

The aim of this study was to explore the potential of weak supervision in a deep learning-based label prediction model. The goal was to use this model to extract labels from German free-text thoracic radiology reports on chest X-ray images and to use those labels to train chest X-ray classification models.

Materials and Methods

The proposed label extraction model for German thoracic radiology reports uses a German BERT encoder as a backbone and classifies a report according to the CheXpert labels. To investigate how efficiently manually annotated data can be used, the model was trained using manual annotations, weak rule-based labels, and both. Rule-based labels were extracted from 66071 retrospectively collected radiology reports from 2017–2021 (DS 0), and 1091 reports from 2020–2021 (DS 1) were manually labeled according to the CheXpert classes. Label extraction performance was evaluated with respect to mention extraction, negation detection, and uncertainty detection by measuring F1 scores. The influence of the label extraction method on chest X-ray classification was evaluated on a pneumothorax data set (DS 2) containing 6434 chest radiographs with associated reports and expert diagnoses of pneumothorax. For this purpose, DenseNet-121 models trained on manual annotations, rule-based label predictions, deep learning-based label predictions, and publicly available data were compared.
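The three evaluation tasks can be illustrated with a minimal sketch. It assumes, following the CheXpert labeling convention, that each finding in a report takes one of four states (blank, positive, negative, uncertain), and that each task is scored as a binary F1: mention extraction counts any non-blank state as positive, negation detection counts only the negative state, and uncertainty detection counts only the uncertain state. The label names and the functions `f1_score` and `task_f1` are hypothetical and are not taken from the authors' code; the exact scoring used in the study may differ.

```python
from typing import Dict, Sequence

# Hypothetical encoding of the four CheXpert-style states for one finding.
BLANK, POSITIVE, NEGATIVE, UNCERTAIN = "blank", "positive", "negative", "uncertain"


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is zero."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def task_f1(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Score one finding across reports on the three tasks (assumed definitions)."""
    tasks = {
        "mention": lambda label: label != BLANK,          # finding mentioned at all
        "negation": lambda label: label == NEGATIVE,      # finding explicitly negated
        "uncertainty": lambda label: label == UNCERTAIN,  # finding mentioned with doubt
    }
    return {
        name: f1_score([is_pos(g) for g in gold], [is_pos(p) for p in pred])
        for name, is_pos in tasks.items()
    }


# Toy example: manual labels vs. labeler output for one finding in five reports.
gold = [POSITIVE, NEGATIVE, BLANK, UNCERTAIN, NEGATIVE]
pred = [POSITIVE, NEGATIVE, NEGATIVE, POSITIVE, BLANK]
scores = task_f1(gold, pred)
```

In this toy case the labeler never predicts the uncertain state, so its uncertainty F1 is 0.0 even though it finds most mentions. This separation of tasks is what lets a labeler score well on mention extraction while scoring much lower on uncertainty detection.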

Results

On average, the proposed deep learning-based labeler (DL) performed considerably better than the rule-based labeler (RB) on all three tasks on DS 1, with F1 scores of 0.938 vs. 0.844 for mention extraction, 0.891 vs. 0.821 for negation detection, and 0.624 vs. 0.518 for uncertainty detection. Pre-training on DS 0 and fine-tuning on DS 1 performed better than training on either DS 0 or DS 1 alone. Pneumothorax classification on chest X-rays (DS 2) performed best when the classifier was trained with DL labels, with an area under the receiver operating characteristic curve (AUC) of 0.939, compared to 0.858 with RB labels. Training with manual labels performed slightly worse than training with DL labels, with an AUC of 0.934. In contrast, training with a public data set resulted in an AUC of 0.720.
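For readers less familiar with the AUC figures above, the following minimal sketch computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen pneumothorax case receives a higher classifier score than a randomly chosen non-pneumothorax case, with ties counted as one half. The function name `roc_auc` and the toy scores are hypothetical, and this is not the evaluation code used in the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = pneumothorax present, 0 = absent; scores are model outputs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.7]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases and is meant for illustration only.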

Conclusion

Our results show that leveraging a rule-based report labeler for weak supervision leads to improved labeling performance. The pneumothorax classification results demonstrate that our proposed deep learning-based labeler can serve as a substitute for manual labeling, requiring only 1000 manually annotated reports for training.

Key Points

The proposed deep learning-based label extraction model for German thoracic radiology reports performs better than the rule-based model.
Training with limited supervision outperformed training with a small manually labeled data set.
Using predicted labels for pneumothorax classification from chest radiographs performed on par with using manual annotations.

Citation Format

Wollek A, Haitzer P, Sedlmeyr T et al. Language model-based labeling of German thoracic radiology reports. Fortschr Röntgenstr 2024; DOI 10.1055/a-2287-5054


Keywords

annotation - deep learning - chest X-ray - chest radiograph - CheXpert - label extraction

Publication History

Received: 19 July 2023

Accepted after revision: 09 March 2024

Article published online: 25 April 2024

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany

