J Neurol Surg B Skull Base 2017; 78(S 01): S1-S156
DOI: 10.1055/s-0037-1600562
Oral Presentations
Georg Thieme Verlag KG Stuttgart · New York

Rater Reliability of the Hardy Classification for Pituitary Adenomas in the MRI Era

Michael A. Mooney 1, Douglas A. Hardesty 1, John P. Sheehy 1, Roger Bird 1, Kristina Chapple 1, William L. White 1, Andrew S. Little 1
1   Barrow Neurological Institute, Phoenix, Arizona, United States

Publication Date: 02 March 2017 (online)

Background: The Hardy classification is used to classify the morphology of pituitary tumors for clinical and research purposes. Although it is considered a current standard, the scale was developed using lateral skull radiographs and encephalograms, and the rater reliability of the Hardy classification for pituitary adenomas has not been evaluated in the magnetic resonance imaging (MRI) era. The authors assessed the interrater and intra-rater reliability of the scale using preoperative MRI scans and also evaluated a dichotomized version of the grading system.

Methods: Six independent raters (3 neurosurgery residents, 2 pituitary surgeons, and 1 neuroradiologist) participated in the study. Each rater was given a written and verbal description of the project and a description of the original Hardy classification scale for reference. Fifty preoperative, gadolinium-enhanced, pituitary MRI scans of biopsy-proven pituitary adenomas were scored using the sellar invasion and suprasellar extension components of the Hardy scale. Images without suprasellar extension were classified as “Type 0” for this study. Spearman correlation coefficients, phi coefficients, and percent agreement were calculated. Reliabilities using the full scale, the intermediate grades only, and extreme grades only were determined.
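The percent-agreement measure and the dichotomization step described above can be illustrated with a minimal sketch. It assumes that agreement means all raters assigned the same score to a case, and that grades are encoded as integers 0 to 4 (Grade 0 through Grade IV, or Type 0 through Type D). The function names and the example scores are hypothetical; the abstract does not publish per-case ratings.

```python
def percent_agreement(ratings):
    """Fraction of cases in which every rater assigned the same score.

    `ratings` holds one list per scan, with one score per rater.
    """
    unanimous = sum(1 for case in ratings if len(set(case)) == 1)
    return unanimous / len(ratings)


def dichotomize(ratings, threshold):
    """Collapse an ordinal scale to 0/1: 1 if score >= threshold, else 0."""
    return [[1 if score >= threshold else 0 for score in case]
            for case in ratings]


# Made-up scores for 4 scans and 6 raters (illustration only, not study data).
example = [
    [4, 4, 4, 4, 4, 4],
    [1, 2, 2, 1, 2, 3],
    [0, 0, 1, 0, 0, 0],
    [3, 4, 4, 4, 3, 4],
]

full_scale = percent_agreement(example)
# threshold=4 separates Grades 0-III from Grade IV under the assumed encoding.
dichotomized = percent_agreement(dichotomize(example, threshold=4))

print(f"full scale: {full_scale:.0%}, dichotomized: {dichotomized:.0%}")
```

In this made-up example, disagreements confined to the intermediate grades disappear once the scale is collapsed, while disagreement at the Grade III/IV boundary remains.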

Results: Overall, the interrater reliability of the sellar invasion subscale was strong (0.69; 95% CI, 0.51 to 0.81). However, the reliability of the intermediate scores was very weak (0.15; 95% CI, -0.28 to 0.53). All 6 raters agreed in only 8 of 50 cases (16%) using the full scale. Dichotomizing the scale into clinically useful groups (Grades 0–III versus Grade IV) maintained strong interrater reliability (0.62; 95% CI, 0.41 to 0.76) and increased the percent agreement to 64% (32/50 cases).

Overall, the reliability of the suprasellar extension subscale was strong (0.78; 95% CI, 0.65 to 0.87), but the reliability of the intermediate scores was weak (0.35; 95% CI, -0.06 to 0.66). Raters agreed in only 6 of 50 cases (12%) using the full scale. When the scale was dichotomized into groups useful for preoperative planning (Types 0–C versus Type D), the percent agreement among raters increased to 84% (42/50 cases). Interrater reliabilities were not significantly affected by training level, and intra-rater reliability was strong when the full scales were tested.
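For dichotomized ratings, the phi coefficient named in the Methods can be computed for a pair of raters from their 2x2 agreement table. The sketch below uses the standard formula; the function name and the example ratings are hypothetical, and the abstract does not state how pairwise coefficients were pooled across the six raters, so this shows a single rater pair only.

```python
from math import sqrt


def phi_coefficient(rater_a, rater_b):
    """Phi coefficient between two raters' binary (0/1) ratings."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(rater_a, rater_b):
        if a and b:
            n11 += 1      # both raters chose the high category
        elif a:
            n10 += 1      # only rater A chose the high category
        elif b:
            n01 += 1      # only rater B chose the high category
        else:
            n00 += 1      # both raters chose the low category
    # Product of the four marginal totals of the 2x2 table.
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when a rater uses only one category")
    return (n11 * n00 - n10 * n01) / denom


# Made-up dichotomized ratings for 8 scans (illustration only, not study data).
rater_a = [1, 1, 0, 0, 1, 0, 0, 0]
rater_b = [1, 0, 0, 0, 1, 0, 0, 1]

phi = phi_coefficient(rater_a, rater_b)
print(round(phi, 3))
```

Here the raters agree on 6 of 8 scans, yet phi is only moderate because it corrects for how often each rater uses each category; percent agreement and phi therefore answer different questions.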

Conclusion: This study raises important questions about the reliability and reproducibility of the Hardy classification of pituitary adenomas. Reducing the measure from a 5-point scale to a dichotomous scale simplifies the rating process and, in this study, led to improved scale performance. For the purposes of future research studies, authors are encouraged to report results using the dichotomized Hardy scale (sellar invasion Grades 0–III versus IV, suprasellar extension Types 0–C versus D).