Semin Musculoskelet Radiol 2019; 23(S 02): S1-S18
DOI: 10.1055/s-0039-1692577
Abstracts
Georg Thieme Verlag KG Stuttgart · New York

Artificial Intelligence to Grade Hip Osteoarthritis Features on Radiographs

C. E. von Schacky¹, F. Liu¹, E. Ozhinsky¹, P. M. Jungmann², L. Nardo³, S. C. Foreman¹, M. Nevitt¹, V. Pedoia¹, J. H. Sohn¹, T. M. Link¹

¹San Francisco, California, USA; ²Zurich, Switzerland; ³Sacramento, California, USA
Publication Date: 04 June 2019 (online)

Purpose: The aim of this study was to implement a multitask deep neural network for grading the presence and severity of radiographic hip osteoarthritis (OA) features.

Methods and Materials: We used 4,385 bilateral anteroposterior pelvic radiographs (2,468 participants, 8,770 hip joints) from the Osteoarthritis Initiative at baseline (n = 2,435) and 4 years (n = 1,950) (age: 61.4 ± 9.4 years; sex: 44.0% males, 56.0% females; body mass index: 28.2 ± 4.5 kg/m²). Femoral osteophytes (FOS), acetabular osteophytes (AOS), joint space narrowing (JSN), subchondral sclerosis (SUBSCL), and subchondral cysts (SUBCYST) were assessed. SUBSCL and SUBCYST were graded as present or absent; FOS, AOS, and JSN were graded using Osteoarthritis Research Society International atlas criteria on a 0 to 3 scale: none (0), mild (1), moderate (2), or severe (3). The data were split 80%, 10%, and 10% for training, validation, and testing, respectively. After hip joint detection with a RetinaNet, the images were cropped to bounding boxes around the femoral head. The multitask neural network was based on an ImageNet-pretrained DenseNet-161 that served as a shared convolutional feature extractor; separate fully connected output layers were trained for each radiographic feature. Model performance was evaluated on the test set with class prediction accuracy and the area under the receiver operating characteristic (ROC) curve (AUC).
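The abstract includes no code, so the following is a minimal PyTorch sketch of the design it describes: a shared ImageNet-pretrained DenseNet-161 feature extractor with separate fully connected heads per radiographic feature. All names (MultiTaskHipOA, the head keys, multitask_loss) and the summed cross-entropy objective are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class MultiTaskHipOA(nn.Module):
    """Shared DenseNet-161 backbone with one classification head per
    radiographic OA feature (illustrative sketch, not the authors' code)."""
    def __init__(self):
        super().__init__()
        # Older torchvision versions use models.densenet161(pretrained=True)
        backbone = models.densenet161(weights="IMAGENET1K_V1")
        self.features = backbone.features            # shared convolutional extractor
        n_feat = backbone.classifier.in_features     # 2208 for DenseNet-161
        self.heads = nn.ModuleDict({
            "FOS": nn.Linear(n_feat, 4),             # OARSI grades 0-3
            "AOS": nn.Linear(n_feat, 4),
            "JSN": nn.Linear(n_feat, 4),
            "SUBSCL": nn.Linear(n_feat, 2),          # present/absent
            "SUBCYST": nn.Linear(n_feat, 2),
        })

    def forward(self, x):
        f = F.relu(self.features(x), inplace=True)
        f = F.adaptive_avg_pool2d(f, 1).flatten(1)   # global pooled descriptor
        return {name: head(f) for name, head in self.heads.items()}

def multitask_loss(outputs, targets):
    """Hypothetical joint objective: sum of per-feature cross-entropies,
    so the shared extractor learns from all five tasks at once."""
    return sum(F.cross_entropy(outputs[k], targets[k]) for k in outputs)

# Smoke test on dummy cropped hip patches
model = MultiTaskHipOA()
logits = model(torch.randn(2, 3, 224, 224))
print({k: v.shape for k, v in logits.items()})
```

Training the heads jointly against a shared extractor, as sketched here, is one plausible reading of "the last fully connected layers were separately trained for each radiographic feature"; the abstract does not specify whether the backbone was fine-tuned or frozen.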

Results: Using the deep neural network, the accuracy and AUC for assessing the presence or absence of each feature were 89% and 0.94 for FOS, 80% and 0.86 for AOS, 86% and 0.91 for JSN, 96% and 0.93 for SUBCYST, and 93% and 0.90 for SUBSCL. Fig. 1a shows the ROC curves for all features. The overall accuracy for grading FOS, AOS, and JSN on the four-grade severity scale was 86%, 74%, and 81%, respectively. As shown in Fig. 1b, clear grading errors (misclassifications between non-neighboring grades) were rare (0.3% for FOS, 2% for AOS, and 3% for JSN). A review of these cases by a board-certified musculoskeletal radiologist showed that most involved doubtful gradings or reduced image quality, as demonstrated in the examples in Fig. 2a and 2b. Gradient-weighted class activation maps (Fig. 2c) demonstrate that the model focused on the region of the OA abnormality for its assessment.
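To make the two metric families above concrete, here is a small NumPy/scikit-learn sketch with made-up values, not study data: AUC and accuracy for the presence/absence features, and the "clear grading error" rate (predicted grade off by more than one level) for the 0 to 3 severity scales.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

def clear_error_rate(y_true, y_pred):
    """Fraction of cases misclassified between non-neighboring grades,
    i.e., predicted grade differs from the reference by more than one level."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred) > 1))

# Illustrative values only (not study data)
grades_true = [0, 1, 2, 3, 1, 0, 2]
grades_pred = [0, 1, 1, 3, 3, 0, 2]            # one prediction off by two grades
print(clear_error_rate(grades_true, grades_pred))        # ~0.14

# Presence/absence features: AUC from predicted probabilities,
# accuracy from thresholded predictions
labels = [0, 1, 1, 0, 1]
probs = [0.1, 0.8, 0.6, 0.3, 0.9]
print(roc_auc_score(labels, probs))                       # 1.0
print(accuracy_score(labels, [p > 0.5 for p in probs]))   # 1.0
```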

Conclusion: A multitask deep learning approach allowed automated severity grading of radiographic features of hip OA with high diagnostic accuracy. Clear grading errors were rare, ranging from 0.3% to 3%. The model's performance in detecting and grading these features was comparable with previously reported radiologist grading reliabilities. This model might aid radiologists in reading hip radiographs in clinical practice and improve grading reliability.