DOI: 10.1055/s-0036-1579859
Interrater Reliability of Intermediate Knosp Grades for Pituitary Adenoma Grading Is Poor
Introduction: Knosp grading is widely used to estimate pituitary adenoma cavernous sinus involvement based on preoperative MRI scans. The grades range from 0 (no cavernous sinus involvement) to 4 (complete encasement of the cavernous carotid artery). Despite it being considered a standard scale for pituitary surgery research studies, the reliability of the Knosp grading system has not been evaluated. Understanding the reliability of this scale has implications for interpreting past and future research.
Objectives: This study has three objectives: 1) to determine the interrater reliability of the Knosp grading scale, 2) to determine if there is a difference in reliability when scored by raters of differing levels of training, and 3) to determine if improvements can be made in how the scale is reported to make the scale more reliable while retaining its clinical utility.
Methods: Six independent raters (3 neurosurgery residents, 2 pituitary surgeons, and 1 neuro-radiologist) participated in the study. Each rater was given a written and verbal description of the project and the original manuscript describing the Knosp grading scale for reference. Fifty unique contrast-enhanced pituitary MRI scans with biopsy-proven pituitary adenoma were scored. Spearman correlation coefficients and kappa coefficients were calculated. Reliabilities using the full scale, the intermediate grades only, extreme grades only, and a commonly employed dichotomization (grades 0, 1, 2 vs 3, 4) were determined.
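To illustrate how dichotomizing a scale can change an agreement statistic, the following is a minimal sketch and not the study's analysis code. It assumes an unweighted Cohen's kappa for two raters (the abstract does not say which kappa variant was used), and the ratings and the names `cohen_kappa` and `dichotomize` are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the identical grade.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal proportions per grade.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(
        (counts_a[g] / n) * (counts_b[g] / n)
        for g in set(ratings_a) | set(ratings_b)
    )
    if expected == 1.0:
        # Both raters used a single identical grade; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def dichotomize(grades):
    """Collapse Knosp grades into 0 (grades 0, 1, 2) or 1 (grades 3, 4)."""
    return [1 if g >= 3 else 0 for g in grades]


# Hypothetical Knosp grades from two raters for eight scans (not study data).
rater_a = [0, 1, 2, 3, 4, 1, 2, 3]
rater_b = [0, 2, 1, 3, 4, 2, 1, 4]

kappa_full = cohen_kappa(rater_a, rater_b)
kappa_dichotomized = cohen_kappa(dichotomize(rater_a), dichotomize(rater_b))
print(f"full scale kappa:   {kappa_full:.3f}")
print(f"dichotomized kappa: {kappa_dichotomized:.3f}")
```

In this made-up example the two raters disagree only within grades 1 and 2 and within grades 3 and 4, so the full-scale kappa is low (about 0.22) while the dichotomized kappa is 1.0. Real data will not separate this cleanly.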
Results: Overall, the interrater reliability of the Knosp scale is “strong” (0.728, 95% CI 0.564 – 0.836). However, the reliability of the intermediate grades (i.e., Knosp grades 1, 2, and 3) was “very weak” (0.176, 95% CI -0.266 – 0.556). When the scale was dichotomized into tumors unlikely to have intraoperative cavernous sinus involvement (grades 0, 1, and 2) and those likely to have cavernous sinus involvement (grades 3 and 4), the reliability was “strong” (0.600, 95% CI 0.387 – 0.752). There was no significant difference in reliability when scored by raters of differing experience (resident 0.716, 95% CI 0.547 – 0.829 vs staff 0.726, 95% CI 0.562 – 0.835).
Conclusions: While this study suggests that the Knosp grading scale has acceptable interrater reliability overall, it raises important questions about the “very weak” reliability of the scale’s intermediate grades (i.e., grades 1, 2, and 3). By dichotomizing the scale into clinically useful groups of tumors likely to have cavernous sinus involvement and those that are unlikely to have cavernous sinus involvement, we are able to address the poor reliability of the intermediate grades and isolate the most important grades used for surgical decision-making (grades 3 and 4). Furthermore, this study did not detect a difference in interrater reliability based on rater training level, suggesting that both residents and fully qualified staff can reliably score cases. Authors of future pituitary surgery studies are advised to report Knosp grades as dichotomized results rather than as the full scale.