DOI: 10.1055/s-0042-1750641
Performance Comparison of a Deep Learning Algorithm and Human Readers on Detection of Anterior Cruciate Ligament Tear
Purpose or Learning Objective: To compare the performance of a deep learning algorithm for detecting anterior cruciate ligament (ACL) lesions with that of human readers with varying clinical profiles.
Methods or Background: A three-dimensional convolutional neural network model was trained to detect ACL lesions on a data set of 13,864 knee magnetic resonance imaging (MRI) examinations, with ACL tear labels extracted from the available structured reports.
An additional evaluation data set of 200 knee MRI examinations was used. The gold standard was established by consensus of three experienced musculoskeletal (MSK) radiologists. Six human readers (two MSK radiologists, two general radiologists, and two orthopaedic surgeons) each reviewed 100 examinations of the evaluation data set; the two readers sharing a clinical profile were assigned nonoverlapping, stratified subsets.
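The abstract does not say how the 200 evaluation examinations were divided between the two readers of each profile. The sketch below assumes stratification by ACL tear status with a random shuffle within each stratum; the helper `split_for_reader_pair` and the synthetic exam list are hypothetical.

```python
import random

def split_for_reader_pair(exams, seed=0):
    """Hypothetical helper: split (exam_id, has_tear) tuples into two
    nonoverlapping halves, stratified by ACL tear status (an assumption)."""
    rng = random.Random(seed)
    tears = [e for e in exams if e[1]]
    intact = [e for e in exams if not e[1]]
    rng.shuffle(tears)
    rng.shuffle(intact)
    # Alternate within each stratum so both halves get (nearly) equal tear counts
    half_a = tears[0::2] + intact[0::2]
    half_b = tears[1::2] + intact[1::2]
    return half_a, half_b

# Synthetic stand-in for the evaluation set: 200 exams, 80 tears (40% prevalence)
exams = [(i, i < 80) for i in range(200)]
profiles = ["MSK radiologist", "general radiologist", "orthopaedic surgeon"]
for seed, profile in enumerate(profiles):
    reader_1, reader_2 = split_for_reader_pair(exams, seed=seed)
    # Print subset sizes and the number of tears each reader sees
    print(profile, len(reader_1), len(reader_2),
          sum(t for _, t in reader_1), sum(t for _, t in reader_2))
```

Under this design the two readers of a profile together read every evaluation examination exactly once, so per-profile sensitivity and specificity cover the full set.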
The performance of the human readers and the deep learning algorithm was evaluated using sensitivity and specificity; receiver operating characteristic curve analysis was also performed for the deep learning algorithm. The statistical significance of performance differences was assessed with chi-square tests.
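For reference, both metrics and a Pearson chi-square test on a 2 × 2 table fit in a few lines of standard-library Python. This is a generic sketch: the abstract does not state whether a continuity correction or a paired test was used, so it is not expected to reproduce the reported p values, and the reader counts in the usage line are hypothetical.

```python
import math

def sensitivity(tp, fn):
    """Fraction of true tears that are flagged: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of intact ACLs correctly called intact: TN / (TN + FP)."""
    return tn / (tn + fp)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for the table
    [[a, b], [c, d]]; returns (statistic, p value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df the statistic is Z**2, so P(X > stat) = erfc(sqrt(stat / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical readers: A detects 30 of 40 tears, B detects 36 of 40
stat, p = chi_square_2x2(30, 10, 36, 4)
print(sensitivity(30, 10), sensitivity(36, 4), round(stat, 3), round(p, 3))
```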
Results or Findings: ACL tear prevalence was 15% in the training data set and 40% in the evaluation data set. On the evaluation data set, the area under the curve for the deep learning algorithm was 0.987 (95% confidence interval [CI], 0.974–0.99).
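The area under the ROC curve equals the probability that a randomly chosen examination with a tear receives a higher algorithm score than a randomly chosen examination without one (the Mann–Whitney formulation). A minimal sketch follows. The toy scores are illustrative only, and the abstract does not say how the AUC or its confidence interval were computed.

```python
def auc_mann_whitney(scores, labels):
    """ROC AUC as P(score of a positive > score of a negative),
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative exam")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy tear probabilities for four exams (not study data)
print(auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic, which is negligible for 80 positive and 120 negative examinations.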
Sensitivity for ACL tear detection was 0.938 (95% CI, 0.88–0.987) for the deep learning algorithm, compared with 0.825 (95% CI, 0.736–0.902; p = 0.083 versus algorithm) for MSK radiologists, 0.662 (95% CI, 0.556–0.764; p = 0.004 versus algorithm) for general radiologists, 0.862 (95% CI, 0.785–0.934; p = 0.268 versus algorithm) for orthopaedic surgeons, and 0.783 (95% CI, 0.73–0.835; p = 0.035 versus algorithm) for all human readers combined.
Specificity for ACL tear detection was 0.975 (95% CI, 0.942–1.00) for the deep learning algorithm, compared with 0.942 (95% CI, 0.898–0.982; p = 0.124 versus algorithm) for MSK radiologists, 0.992 (95% CI, 0.973–1.00; p = 0.008 versus algorithm) for general radiologists, 0.983 (95% CI, 0.957–1.00; p = 0.665 versus algorithm) for orthopaedic surgeons, and 0.972 (95% CI, 0.954–0.988; p = 0.077 versus algorithm) for all human readers combined.
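The reported point estimates are consistent with whole-number counts. Assuming 80 tears and 120 intact ACLs in the evaluation set (40% of 200) and that each reader profile covered all 200 examinations once, the counts below are back-calculated from the reported values rather than taken from the paper; the script recovers the reported sensitivities and specificities to within rounding.

```python
# Assumption: 200 exams x 40% prevalence = 80 tears and 120 intact ACLs,
# and each reader profile (two readers x 100 nonoverlapping exams) saw all 200.
TEARS, INTACT = 80, 120

# Back-calculated (hypothetical) counts: profile -> (true positives, true negatives)
counts = {
    "MSK radiologists": (66, 113),
    "General radiologists": (53, 119),
    "Orthopaedic surgeons": (69, 118),
    "Deep learning algorithm": (75, 117),
}

def sens_spec(tp, tn, n_pos, n_neg):
    """Sensitivity and specificity from correct calls and stratum sizes."""
    return tp / n_pos, tn / n_neg

for name, (tp, tn) in counts.items():
    se, sp = sens_spec(tp, tn, TEARS, INTACT)
    print(f"{name}: sensitivity {se:.3f}, specificity {sp:.3f}")

# Pool the three human profiles: 3 x 80 tear reads and 3 x 120 intact reads
human = [v for k, v in counts.items() if k != "Deep learning algorithm"]
pooled_tp = sum(tp for tp, _ in human)
pooled_tn = sum(tn for _, tn in human)
se, sp = sens_spec(pooled_tp, pooled_tn, 3 * TEARS, 3 * INTACT)
print(f"All human readers: sensitivity {se:.3f}, specificity {sp:.3f}")
```

Pooling gives 240 tear reads and 360 intact reads, which is why the combined human estimates fall between those of the individual profiles.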
Conclusion: Compared with a pool of human readers with varying clinical profiles, our model showed higher sensitivity and similar specificity for detecting ACL lesions.
Publication History
Article published online:
02 June 2022
© 2022. Thieme. All rights reserved.
Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA