Summary
Background: In medical imaging used for planning of radiation therapy, observers delineate contours
of a treatment volume in a series of images of uniform slice thickness.
Objective: To summarize agreement in contouring between an arbitrary number of observers by
a single number, we generalized the kappa index proposed by Zijdenbos et al. (1994).
Methods: Observers characterized voxels by allocating them to one of two categories, inside
or outside the contoured region. Fleiss’ kappa was used to measure association between
n indistinguishable observers. Given the number V_i of voxels contoured by exactly i observers
(i = 1, …, n), the resulting overall kappa is representable as a ratio of weighted sums of the V_i.
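To illustrate the Methods step, the following is a minimal sketch, not the authors' implementation. It computes the standard two-category Fleiss' kappa directly from voxel counts. The function names and the explicit background count V_0 (voxels contoured by no observer) are assumptions introduced here for illustration. The second function evaluates the limit of Fleiss' kappa as V_0 grows without bound, which works out to sum of i(i-1)V_i divided by (n-1) times the sum of iV_i. This is one expression that is a ratio of weighted sums of the V_i; whether it coincides exactly with the paper's overall kappa is an assumption.

```python
from typing import Sequence


def fleiss_kappa_from_counts(v: Sequence[float]) -> float:
    """Two-category Fleiss' kappa from voxel counts.

    v[i] is the number of voxels contoured by exactly i of n observers,
    for i = 0..n. v[0] (background voxels) is an assumption of this sketch.
    """
    n = len(v) - 1
    if n < 2:
        raise ValueError("at least two observers are required")
    total = sum(v)
    # Overall proportion of "inside" ratings among all n * total ratings.
    p_in = sum(i * v_i for i, v_i in enumerate(v)) / (n * total)
    # Expected agreement by chance over the two categories.
    p_e = p_in ** 2 + (1.0 - p_in) ** 2
    # Mean observed agreement: agreeing observer pairs per voxel.
    p_bar = sum(
        v_i * (i * (i - 1) + (n - i) * (n - i - 1)) for i, v_i in enumerate(v)
    ) / (total * n * (n - 1))
    return (p_bar - p_e) / (1.0 - p_e)


def large_background_kappa(v_pos: Sequence[float]) -> float:
    """Limit of the Fleiss' kappa above as the background count grows.

    v_pos[i - 1] is V_i for i = 1..n (hypothetical helper for illustration).
    """
    n = len(v_pos)
    if n < 2:
        raise ValueError("at least two observers are required")
    num = sum(i * (i - 1) * v_i for i, v_i in enumerate(v_pos, start=1))
    den = (n - 1) * sum(i * v_i for i, v_i in enumerate(v_pos, start=1))
    return num / den
```

For n = 2 the limiting expression equals 2V_2 / (V_1 + 2V_2), the usual two-observer overlap index, and it equals 1 when every contoured voxel is contoured by all n observers.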
Results: Overall kappa was applied to analyze inter-center variations in a multicenter trial
on radiotherapy planning in patients with locally advanced lung cancer. A contouring
dummy run was performed within the quality assurance program. Contouring was done
twice, once before and once after a training program. Observer agreement improved
from 0.59 (95% confidence interval (CI) 0.51–0.67) to 0.69 (95% CI 0.59–0.78).
Conclusion: In contrast to average pairwise indices, overall kappa measures observer agreement
for more than two observers using the full information about overlapping volumes,
while not distinguishing between observers. It is particularly suitable for measuring
observer agreement when identification of observers is not possible or desirable and
when there is no gold standard.
Keywords
Contour delineation - Fleiss’ kappa - medical imaging - observer agreement - PET-CT