Methods Inf Med 2012; 51(02): 162-167
DOI: 10.3414/ME11-02-0020
Focus Theme – Original Articles
Schattauer GmbH

Multi-class HingeBoost[*]

Method and Application to the Classification of Cancer Types Using Gene Expression Data
Z. Wang
1   Department of Research, Connecticut Children’s Medical Center, Department of Pediatrics, University of Connecticut School of Medicine, Hartford, Connecticut, USA

Publication History

received: 24 June 2011

accepted: 27 March 2011

Publication Date:
19 January 2018 (online)

Summary

Background: Multi-class molecular cancer classification has great potential clinical implications. Such applications require statistical methods to accurately classify cancer types with a small subset of genes from thousands of genes in the data.

Objectives: This paper presents a new functional gradient descent boosting algorithm that directly extends the HingeBoost algorithm from the binary case to the multi-class case without reducing the original problem to multiple binary problems.

Methods: By minimizing a multi-class hinge loss with a boosting technique, the proposed HingeBoost has good theoretical properties: it implements the Bayes decision rule and provides a unifying framework with either equal or unequal misclassification costs. Furthermore, we propose Twin HingeBoost, which has better feature selection behavior than HingeBoost because it reduces the number of ineffective covariates. Simulated data, benchmark data and two cancer gene expression data sets are used to evaluate the performance of the proposed approach.
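The paper's exact loss function and base learners are given in the full text (and implemented in the author's bst R package); as a rough illustration only, the sketch below runs functional gradient descent boosting on one common multi-class hinge loss (pairwise margins) with componentwise linear base learners. The loss form, the base learner, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def multiclass_hinge_loss(F, y):
    """Pairwise-margin multi-class hinge loss (an illustrative choice,
    not necessarily the paper's exact loss):
    mean over samples of sum_{j != y} max(0, 1 - (f_y - f_j))."""
    n, _ = F.shape
    idx = np.arange(n)
    margins = 1.0 - (F[idx, y][:, None] - F)   # shape (n, K)
    margins[idx, y] = 0.0                      # exclude the true class
    return np.mean(np.sum(np.maximum(0.0, margins), axis=1))

def hingeboost_fit(X, y, K, n_steps=200, nu=0.1):
    """Functional gradient descent on the loss above with componentwise
    linear base learners: each step, for each class score f_j, pick the
    single covariate whose least-squares fit to the negative gradient is
    best, and take a small step nu along it."""
    n, p = X.shape
    idx = np.arange(n)
    F = np.zeros((n, K))                       # class score functions at the data
    denom = np.sum(X ** 2, axis=0) + 1e-12     # per-covariate ||x_k||^2
    for _ in range(n_steps):
        margins = 1.0 - (F[idx, y][:, None] - F)
        active = margins > 0
        active[idx, y] = False                 # only wrong classes count
        # negative functional gradient of the loss w.r.t. each f_j
        neg_grad = -active.astype(float)       # push down violating wrong classes
        neg_grad[idx, y] = active.sum(axis=1)  # pull up the true class
        for j in range(K):
            r = neg_grad[:, j]
            beta = (X.T @ r) / denom           # LS coefficient per covariate
            k = np.argmax(beta ** 2 * denom)   # best-fitting covariate
            F[:, j] += nu * beta[k] * X[:, k]
    return F

# Toy check on simulated data: feature j is elevated for class j.
rng = np.random.default_rng(0)
K, n, p = 3, 150, 10
y = rng.integers(0, K, size=n)
X = rng.normal(size=(n, p))
X[np.arange(n), y] += 2.0
F = hingeboost_fit(X, y, K)
print("loss:", multiclass_hinge_loss(F, y))
print("train acc:", np.mean(np.argmax(F, axis=1) == y))
```

Because only one covariate per class is updated at each step, the boosting path itself performs feature selection, which is the behavior Twin HingeBoost is designed to sharpen.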

Results: In simulations and on the benchmark data, the multi-class HingeBoost generated accurate predictions compared with the alternative methods, especially with high-dimensional covariates. The multi-class HingeBoost also produced more accurate or comparable predictions in two cancer classification problems using gene expression data.

Conclusions: This work has shown that HingeBoost provides a powerful tool for multi-class classification problems. In many applications, the classification accuracy and feature selection behavior can be further improved by using Twin HingeBoost.

* Supplementary material published on our website www.methods-online.com.


 