Semin Musculoskelet Radiol 2022; 26(03): 361-384
DOI: 10.1055/s-0042-1750662
Oral Presentation

Automated Detection of Vertebral Fractures in Routine CT of the Chest and Abdomen: External Validation of a Deep Learning Algorithm

J. Nicolaes¹, M. Skjødt², C. Libanati¹, C. Smith³, K. Olsen³, C. Cooper⁴, B. Abrahamsen²

¹ Brussels, Belgium
² Holbæk, Denmark
³ Odense, Denmark
⁴ Southampton, United Kingdom

Purpose or Learning Objective: The identification and reporting of vertebral fractures (VFs) on routine computed tomography (CT) remain in need of improvement. We aimed to evaluate the performance of a deep learning algorithm for detecting VFs on CT.

Methods or Background: We conducted an external validation study of a three-dimensional convolutional neural network (CNN) model that detects VFs automatically. We used 2,000 routine chest/abdomen CTs of men and women aged 50 years or older from an observational cohort study.

CTs were re-evaluated to identify prevalent VFs in a two-step process, blinded to clinical information. First, the scans were triaged by one investigator (CL) at the subject level as having a certain VF, a potential VF, or no VF. Second, an external imaging vendor (BioClinica) read all scans with a certain or potential VF, together with a 5% subset of those with no VF, to derive reference-standard readings for the individual vertebrae using the Genant semiquantitative classification.
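
The reference-standard workflow can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's procedure: the helper names are hypothetical, the height-loss cut-offs are the commonly cited approximate Genant ranges (grade 1 about 20–25%, grade 2 about 25–40%, grade 3 above 40%) rather than anything specified in the abstract, and Genant grading is in practice a visual, semiquantitative read rather than a computed threshold. Counting a subject as positive when any vertebra is grade 2–3 is likewise an assumption about how subject-level status was derived.

```python
import random
from enum import Enum


class Triage(Enum):
    """Subject-level categories from the first (triage) reading step."""
    CERTAIN_VF = "certain"
    POTENTIAL_VF = "potential"
    NO_VF = "none"


def select_for_reference_read(triage, no_vf_fraction=0.05, seed=0):
    """Hypothetical helper: pick scans for vertebra-level reference reading.

    Every scan triaged as certain or potential VF is selected, plus a
    random sample (default 5%) of the scans triaged as having no VF.
    """
    rng = random.Random(seed)
    flagged = sorted(s for s, t in triage.items() if t is not Triage.NO_VF)
    negatives = sorted(s for s, t in triage.items() if t is Triage.NO_VF)
    k = round(no_vf_fraction * len(negatives))
    return flagged + sorted(rng.sample(negatives, k))


def genant_grade(max_height_loss_pct):
    """Approximate Genant grade from the largest % vertebral height loss.

    Assumed cut-offs: <20% normal (0), 20-25% mild (1),
    25-40% moderate (2), >40% severe (3).
    """
    if max_height_loss_pct < 20:
        return 0
    if max_height_loss_pct < 25:
        return 1
    if max_height_loss_pct <= 40:
        return 2
    return 3


def subject_has_vf(vertebra_grades):
    """Assumed rule: subject is positive if any vertebra is grade 2-3."""
    return any(g >= 2 for g in vertebra_grades)
```

For example, with 2 flagged scans and 40 triaged as fracture-free, `select_for_reference_read` returns the 2 flagged scans plus 2 randomly chosen negatives.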

The performance of the CNN against the expert reference readings in identifying moderate or severe VFs (Genant grades 2–3) was evaluated by percentage agreement, κ, sensitivity, specificity, positive and negative predictive value (PPV and NPV), and area under the curve (AUC). Bootstrapping with 1,000 resamples was used to construct the 95% confidence intervals (CIs).
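
As a rough illustration of how these statistics relate to one another, the Python sketch below computes them from paired binary labels (reference standard vs. CNN, 1 = grade 2–3 VF) and derives a percentile bootstrap CI. The function names are hypothetical, κ is assumed to be Cohen's κ, and the abstract does not say whether vertebra-level resampling was done per vertebra or per subject; this sketch resamples units independently, whereas a cluster (per-subject) bootstrap would respect the correlation between vertebrae of the same patient. For hard 0/1 predictions the AUC reduces to (sensitivity + specificity)/2; the reported AUCs (0.90 and 0.85) coincide with this quantity at both levels, although the abstract does not state how the AUC was computed.

```python
import random


def binary_metrics(y_true, y_pred):
    """Agreement statistics for paired binary labels (1 = moderate/severe VF)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    n = len(pairs)
    nan = float("nan")

    def ratio(a, b):
        return a / b if b else nan

    agreement = (tp + tn) / n
    # Expected chance agreement from the two raters' marginal totals (Cohen's kappa).
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    return {
        "agreement": agreement,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "kappa": ratio(agreement - p_e, 1 - p_e),
        # With hard 0/1 predictions the ROC curve has one operating point,
        # so its AUC equals (sensitivity + specificity) / 2.
        "auc_binary": (sens + spec) / 2,
    }


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling units independently with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = binary_metrics([y_true[i] for i in idx],
                               [y_pred[i] for i in idx])[metric]
        if value == value:  # drop resamples where the metric is undefined (NaN)
            stats.append(value)
    stats.sort()
    lo = stats[int(alpha / 2 * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi
```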

Results or Findings: Fifty-seven scans were unavailable for evaluation and were excluded. Of the remaining 1,943 scans, 15.3% had at least one VF (grade 2–3), and 663 of 25,102 vertebrae (2.6%) were fractured (grades 2–3).

At the subject level, the CNN showed 89% agreement (95% CI, 88–90%), 91% sensitivity (95% CI, 87–94%), 89% specificity (95% CI, 87–90%), 59% PPV (95% CI, 55–64%), 98% NPV (95% CI, 97–99%), an AUC of 0.90 (95% CI, 0.88–0.91), and a κ of 0.66 (95% CI, 0.62–0.70).

At the vertebra level, the CNN showed 98% agreement (95% CI, 97–98%), 72% sensitivity (95% CI, 68–75%), 98% specificity (95% CI, 98–99%), 54% PPV (95% CI, 51–57%), 99% NPV (95% CI, 99–99%), an AUC of 0.85 (95% CI, 0.83–0.87), and a κ of 0.61 (95% CI, 0.57–0.63).
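
The moderate PPVs alongside high NPVs follow from the low prevalence of fractures. A minimal Python sketch, assuming the standard Bayes relationship between sensitivity, specificity, prevalence, and predictive values (the function name is hypothetical), recovers the subject-level figures from the reported point estimates:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by Bayes' theorem at a given prevalence."""
    tp = sensitivity * prevalence              # true positives (fraction of all subjects)
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Subject-level point estimates reported above (rounded values).
ppv, npv = predictive_values(sensitivity=0.91, specificity=0.89, prevalence=0.153)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # prints: PPV = 0.60, NPV = 0.98
```

The rounded inputs give about 0.60 and 0.98, close to the reported 59% and 98%. The calculation also shows why PPV is only moderate despite high specificity: the VF-free subjects flagged by the model (about 9.3% of all subjects) are nearly as numerous as the true positives (about 13.9%). At the vertebra level (2.6% prevalence), the rounded 98% specificity yields a PPV of roughly 49% rather than 54%, which illustrates how strongly PPV depends on small differences in specificity when prevalence is low.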

Conclusion: The deep learning algorithm demonstrated excellent performance in identifying VFs on chest/abdomen CTs in patients aged 50 years or older. Applying such an algorithm may help bridge the known gap in VF reporting on CT.



Publication History

Article published online:
02 June 2022

© 2022. Thieme. All rights reserved.

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA