CC BY-NC-ND 4.0 · Methods Inf Med 2019; 58(S 02): e27-e42
DOI: 10.1055/s-0039-1693732
Original Article
Georg Thieme Verlag KG Stuttgart · New York

Overrating Classifier Performance in ROC Analysis in the Absence of a Test Set: Evidence from Simulation and Italian CARATkids Validation

Giovanna Cilluffo§ (1, 2), Salvatore Fasola§ (1, 2), Giuliana Ferrante (3), Laura Montalbano (1), Ilaria Baiardini (4), Luciana Indinnimeo (5), Giovanni Viegi (1, 6), Joao A. Fonseca (7), Stefania La Grutta (1)

Author Affiliations
1   Institute for Biomedical Research and Innovation, National Research Council of Italy, Palermo, Italy
2   Department of Economical, Business and Statistical Science, University of Palermo, Palermo, Italy
3   Department of Health Promotion Sciences, Maternal and Infant Care, Internal Medicine and Medical Specialities, University of Palermo, Italy
4   Department of Biomedical Sciences, Humanitas University, Milan, Italy
5   Department of Pediatrics and NPI, University of Roma Sapienza, Rome, Italy
6   Institute of Clinical Physiology, Pulmonary Environmental Epidemiology Unit, National Research Council of Italy, Pisa, Italy
7   Department of Immunoallergy, CUF Porto Hospital and Institute, Porto, Portugal
Funding None.

Publication History

04 October 2018

21 May 2019

Publication Date: 19 November 2019 (online)

Abstract

Background The use of receiver operating characteristic curves, or “ROC analysis,” has become quite common in biomedical research to support decisions. However, sensitivity, specificity, and misclassification rates are still often estimated using the training sample, overlooking the risk of overrating the test performance.

Methods A simulation study was performed to highlight the inferential implications of splitting (or not splitting) the dataset into training and test sets. The classifier was assumed to be normally distributed given the disease status, and Youden's criterion was used to select the optimal cutoff. An ROC analysis with sample splitting was then applied to assess the discriminant validity of the Italian version of the Control of Allergic Rhinitis and Asthma Test (CARATkids) questionnaire for children with asthma and rhinitis, for which recent studies may have reported overly optimistic performance estimates.
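The kind of simulation described here can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the sample size, the mean shift `delta`, the unit variances, the half/half stratified split, and the convention that higher scores indicate disease are all assumptions made for the example, as are the function names.

```python
import numpy as np


def youden_cutoff(scores, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1.

    Assumption: a subject is classified as diseased when score >= cutoff.
    """
    cuts = np.unique(scores)
    pred = scores[None, :] >= cuts[:, None]          # shape (n_cuts, n)
    sens = pred[:, labels == 1].mean(axis=1)
    spec = 1.0 - pred[:, labels == 0].mean(axis=1)
    return cuts[np.argmax(sens + spec - 1.0)]


def misclassification(scores, labels, cutoff):
    """Proportion of subjects classified incorrectly at the given cutoff."""
    return float(np.mean((scores >= cutoff) != (labels == 1)))


def simulate(n_per_class=50, delta=1.0, n_rep=500, seed=0):
    """Average misclassification rate without and with a training/test split.

    Scores are N(0, 1) for non-diseased and N(delta, 1) for diseased
    subjects (illustrative values, not taken from the article).
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_class)
    half = n_per_class // 2
    apparent, held_out = [], []
    for _ in range(n_rep):
        scores = rng.normal(loc=delta * labels, scale=1.0)
        # No split: the cutoff is chosen and evaluated on the same sample.
        cut_all = youden_cutoff(scores, labels)
        apparent.append(misclassification(scores, labels, cut_all))
        # Stratified single split: choose the cutoff on one half,
        # evaluate it on the other half.
        idx0 = rng.permutation(n_per_class)
        idx1 = n_per_class + rng.permutation(n_per_class)
        train = np.concatenate([idx0[:half], idx1[:half]])
        test = np.concatenate([idx0[half:], idx1[half:]])
        cut_train = youden_cutoff(scores[train], labels[train])
        held_out.append(misclassification(scores[test], labels[test], cut_train))
    return float(np.mean(apparent)), float(np.mean(held_out))


if __name__ == "__main__":
    no_split, split = simulate()
    print(f"mean misclassification, no split:     {no_split:.3f}")
    print(f"mean misclassification, single split: {split:.3f}")
```

With balanced classes, maximizing Youden's J on a sample is the same as minimizing the misclassification rate on that sample, which is why the rate computed without a split tends to be optimistic.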

Results The simulation study showed that both single split and cross-validation (CV) provided unbiased estimators of sensitivity, specificity, and misclassification rate, therefore allowing computation of confidence intervals. For the Italian CARATkids questionnaire, the misclassification rate estimated by fivefold CV was 0.22, with 95% confidence interval 0.14 to 0.30, indicating an acceptable discriminant validity.
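A cross-validated misclassification rate with a confidence interval could be computed along the following lines. This is a sketch under stated assumptions: the abstract does not say how the interval was obtained, so the normal-approximation (Wald) interval used here is an illustrative choice, the data are simulated rather than the CARATkids data, and the function names and the "higher score means disease" convention are hypothetical.

```python
import numpy as np


def youden_cutoff(scores, labels):
    """Cutoff maximizing Youden's J; diseased if score >= cutoff (assumption)."""
    cuts = np.unique(scores)
    pred = scores[None, :] >= cuts[:, None]
    sens = pred[:, labels == 1].mean(axis=1)
    spec = 1.0 - pred[:, labels == 0].mean(axis=1)
    return cuts[np.argmax(sens + spec - 1.0)]


def cv_misclassification(scores, labels, k=5, seed=0):
    """K-fold CV misclassification rate with a Wald 95% interval.

    Each subject is classified once, using a cutoff chosen on the
    other k - 1 folds; errors are then pooled over all subjects.
    """
    rng = np.random.default_rng(seed)
    n = len(scores)
    folds = np.array_split(rng.permutation(n), k)
    errors = np.zeros(n, dtype=bool)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        cut = youden_cutoff(scores[train_mask], labels[train_mask])
        errors[test_idx] = (scores[test_idx] >= cut) != (labels[test_idx] == 1)
    rate = float(errors.mean())
    # Wald interval: an assumption made for this sketch only.
    half_width = 1.96 * np.sqrt(rate * (1.0 - rate) / n)
    return rate, (max(0.0, rate - half_width), min(1.0, rate + half_width))


if __name__ == "__main__":
    # Simulated scores, purely for illustration.
    rng = np.random.default_rng(1)
    labels = np.repeat([0, 1], 100)
    scores = rng.normal(loc=1.5 * labels, scale=1.0)
    rate, (low, high) = cv_misclassification(scores, labels)
    print(f"CV misclassification rate: {rate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Because every prediction comes from a cutoff selected without the subject being classified, the pooled error proportion can be treated approximately as a binomial proportion, which is what makes a standard interval usable here.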

Conclusions Splitting the data into training and test sets avoids overrating the test performance in ROC analysis. Assessed with this approach, the Italian CARATkids is a valid tool for evaluating disease control in children with asthma and rhinitis.

Note

All data and materials are available upon request.


§ These two authors contributed equally.

