Rofo 2024; 196(S 01): S29-S30
DOI: 10.1055/s-0044-1781553
Oral Presentation (Science)

The Future is Collaborative: A Systematic Analysis of Federated Learning and Framework Parameters in the AI-Based Interpretation of Chest Radiographs

S Tayebi Arasteh
1   Uniklinik RWTH Aachen, Diagnostische und Interventionelle Radio, Aachen
C Kuhl
2   Uniklinik RWTH Aachen, Diagnostische und Interventionelle Radiologie, Aachen
D Truhn
2   Uniklinik RWTH Aachen, Diagnostische und Interventionelle Radiologie, Aachen
S Nebelung
2   Uniklinik RWTH Aachen, Diagnostische und Interventionelle Radiologie, Aachen

    Purpose Artificial intelligence (AI) models often generalize poorly across diverse datasets. Federated learning (FL) enables multi-site collaborative training without data sharing and thus offers an alternative to local training. We aimed to study the factors that affect the diagnostic performance of collaborative training (i.e., FL) versus local training in the AI-based interpretation of chest radiographs.
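
As an illustration of how collaborative training can proceed without sharing images, the following minimal sketch averages model parameters from several sites, weighted by local training-set size. This is the common federated averaging scheme and is an assumption here, since the abstract does not name the aggregation rule that was used. The function name `federated_average` and the toy parameter dictionaries are hypothetical.

```python
from typing import Dict, List, Sequence

# A model is represented as a mapping from parameter name to a flat list of values.
Weights = Dict[str, List[float]]


def federated_average(site_weights: Sequence[Weights],
                      site_sizes: Sequence[int]) -> Weights:
    """Combine per-site parameters into one global model.

    Each site's parameters are weighted by its number of training samples
    (assumed aggregation rule, not confirmed by the abstract).
    Only parameters leave a site; the images themselves are never exchanged.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need at least one site and one size per site")
    total = sum(site_sizes)
    averaged: Weights = {}
    for name in site_weights[0]:
        length = len(site_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy example with two hypothetical sites of different size.
site_a = {"layer": [1.0, 2.0]}   # trained on 1 sample
site_b = {"layer": [3.0, 6.0]}   # trained on 3 samples
global_weights = federated_average([site_a, site_b], [1, 3])
```

In a full training run, each site would receive the averaged parameters, continue training locally, and send updated parameters back for the next round.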

    Materials and Methods We included more than 610,000 chest radiographs from five global open-access datasets (VinDr-CXR [n=18,000], ChestX-ray14 [n=112,120], CheXpert [n=157,878], MIMIC-CXR [n=213,921], and PadChest [n=110,525]). Diagnostic performance was assessed using the area under the receiver operating characteristic curve (AUC) on held-out test sets (n=3,000, n=25,596, n=39,824, n=43,768, and n=22,045, respectively). Statistical analysis was performed using bootstrapping. The following parameters were considered: training strategy (local vs. collaborative), network type (convolutional vs. transformer-based), generalization (internal vs. external), imaging findings (such as cardiomegaly), and dataset size.
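
The bootstrap comparison of AUC values between two training strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the paired resampling of test cases, the number of resamples, the percentile interval, and the function names `auc` and `bootstrap_auc_difference` are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case is scored above a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels: Sequence[int],
                             scores_a: Sequence[float],
                             scores_b: Sequence[float],
                             n_resamples: int = 1000,
                             seed: int = 0) -> Tuple[float, float, float]:
    """Return the observed AUC difference (A minus B) and an approximate
    95% percentile bootstrap interval, resampling test cases with replacement.

    The same resampled cases are used for both models (paired design, assumed).
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(labels, scores_a) - auc(labels, scores_b)
    diffs: List[float] = []
    while len(diffs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain only one class
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(y, a) - auc(y, b))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return observed, lower, upper
```

Here `scores_a` and `scores_b` would hold the predicted probabilities of, for example, the collaboratively and the locally trained model for one imaging finding on the same held-out test set. An interval that excludes zero suggests a difference in AUC between the two strategies.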

    Results Independent of the AI network selected, larger datasets showed no change or even a slight decline in AUC when trained collaboratively vs. locally: CheXpert: Δ=0.000 [ns], MIMIC-CXR: Δ=-0.002 [ns] (p=0.088). Smaller datasets showed significant AUC gains: VinDr-CXR: Δ=0.048, ChestX-ray14: Δ=0.020, PadChest: Δ=0.014 (each p<0.001), indicating that internal performance gains depend on dataset size. In external domains, all datasets displayed significant performance gains when trained collaboratively, regardless of the network type (p<0.004). These observations were made for all imaging findings.

    Conclusions FL has the potential to advance privacy-preserving collaborations, make better use of public datasets, and enhance domain generalization. Wider adoption of collaborative training strategies can stabilize the clinical performance of diagnostic AI models and improve patient outcomes.


    Publication History

    Article published online:
    12 April 2024

    © 2024. Thieme. All rights reserved.

    Georg Thieme Verlag
    Rüdigerstraße 14, 70469 Stuttgart, Germany