Methods Inf Med 2016; 55(06): 557-563 DOI: 10.3414/ME16-01-0055
Original Articles
Schattauer GmbH
Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set[*]
Werner Adler (1), Olaf Gefeller (1), Asma Gul (2), Folkert K. Horn (3), Zardad Khan (4), Berthold Lausen (5)

1 Institute of Medical Informatics, Biometry, and Epidemiology, Friedrich-Alexander University Erlangen-Nuremberg, Erlangen, Germany
2 Department of Statistics, Shaheed Benazir Bhutto Women University, Peshawar, Pakistan
3 Department of Ophthalmology, Friedrich-Alexander University Erlangen-Nuremberg, Erlangen, Germany
4 Department of Statistics, Abdul Wali Khan University, Mardan, Pakistan
5 Department of Mathematical Sciences, University of Essex, Colchester, UK
Funding: The work on this article was supported by the German Research Foundation (DFG), grant SCHM 2966/1-2. We acknowledge support from grant number ES/L011859/1 from The Business and Local Government Data Research Centre, funded by the Economic and Social Research Council to provide researchers and analysts with secure data services.
Background: Random forests are successful classifier ensemble methods typically consisting of 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance, or even with increased performance of the sub-ensemble. Applying them to the early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background poses specific challenges.
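One widely used pruning approach of this kind, greedy forward selection of trees by sub-ensemble accuracy, can be sketched as follows. This is a minimal illustration under stated assumptions (a 0/1 vote matrix of trees on a validation set; the function names and toy data are ours), not the authors' implementation:

```python
# Sketch of accuracy-based greedy ensemble pruning: each tree's 0/1 votes
# on a validation set form a row of `tree_votes`; trees are added one at a
# time, always picking the tree whose inclusion maximizes the majority-vote
# accuracy of the growing sub-ensemble.

def majority_vote_accuracy(votes, labels):
    """Accuracy of the majority vote over the selected trees (rows of `votes`)."""
    n_trees = len(votes)
    correct = 0
    for i, y in enumerate(labels):
        ones = sum(v[i] for v in votes)
        pred = 1 if ones * 2 > n_trees else 0  # strict majority votes class 1
        correct += (pred == y)
    return correct / len(labels)

def greedy_prune(tree_votes, labels, target_size):
    """Greedily grow a sub-ensemble of `target_size` trees."""
    remaining = list(range(len(tree_votes)))
    selected = []
    while len(selected) < target_size:
        best_idx, best_acc = None, -1.0
        for idx in remaining:
            acc = majority_vote_accuracy(
                [tree_votes[j] for j in selected + [idx]], labels)
            if acc > best_acc:
                best_idx, best_acc = idx, acc
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

The key design property is that each candidate tree is judged by the accuracy of the whole sub-ensemble it would join, not by its individual accuracy, so the selection implicitly rewards diversity among the chosen trees.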
Objectives: We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation.
Methods: The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC) and the Brier score, on the total data set, in the majority class, and in the minority class, of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, on the uncertainty weighted accuracy, and on the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma.
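The two evaluation criteria used here can be computed without any dependencies; the following sketch uses the rank-free Mann-Whitney formulation of the AUC and the mean squared error formulation of the Brier score (the toy scores in the test are illustrative, not data from the study):

```python
# AUC = P(score of a random case > score of a random control),
# counting ties as 1/2; Brier score = mean squared difference between
# predicted probability and the 0/1 outcome (lower is better, 0 is perfect).

def auc(probs, labels):
    """Area under the ROC curve via the Mann-Whitney statistic."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(probs, labels):
    """Brier score of predicted class-1 probabilities."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)
```

Evaluating both criteria separately in the majority and the minority class, as done in the paper, amounts to restricting `probs` and `labels` to the controls or to the glaucoma cases before calling `brier` (the AUC itself needs both classes).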
Results: In glaucoma classification, all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees, compared to the classification results obtained with the full ensemble of 1000 trees. The simulation study shows that the prevalence of glaucoma is a critical factor: lower prevalence decreases the performance of our pruning strategies.
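One way to see why low prevalence hurts pruning is a back-of-the-envelope calculation (the sensitivity value and the lower prevalences below are assumptions for illustration, not the paper's simulation design): the validation data that guide tree selection contain only about prevalence times n minority cases, so the binomial standard error of any accuracy estimate in the minority class grows as prevalence drops.

```python
import math

# Standard error of an estimated sensitivity given the expected number of
# minority (diseased) cases in a sample of size n_total with the given
# prevalence: se = sqrt(sens * (1 - sens) / n_minority).

def sensitivity_se(sensitivity, n_total, prevalence):
    n_minority = n_total * prevalence
    return math.sqrt(sensitivity * (1 - sensitivity) / n_minority)

# 309 observations, as in the study's data set (254 controls + 55 cases),
# at the study prevalence and at two hypothetical lower prevalences:
for prev in (55 / 309, 0.05, 0.01):
    print(round(prev, 3), round(sensitivity_se(0.8, 309, prev), 3))
```

At the study's prevalence the minority-class estimates rest on 55 cases; at a prevalence of 1% the same sample size would yield only about 3 cases, making the accuracy signal that drives pruning far noisier.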
Conclusions: The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
Keywords: Ensemble pruning, glaucoma, random forest, unbalanced data