Methods Inf Med 2012; 51(02): 162-167
DOI: 10.3414/ME11-02-0020
Focus Theme – Original Articles
Schattauer GmbH

Multi-class HingeBoost[*]

Method and Application to the Classification of Cancer Types Using Gene Expression Data
Z. Wang
1   Department of Research, Connecticut Children’s Medical Center, Department of Pediatrics, University of Connecticut School of Medicine, Hartford, Connecticut, USA

Publication History

received: 24 June 2011

accepted: 27 March 2011

Publication Date:
19 January 2018 (online)

Summary

Background: Multi-class molecular cancer classification has great potential clinical implications. Such applications require statistical methods to accurately classify cancer types with a small subset of genes from thousands of genes in the data.

Objectives: This paper presents a new functional gradient descent boosting algorithm that directly extends the HingeBoost algorithm from the binary case to the multi-class case without reducing the original problem to multiple binary problems.

Methods: By minimizing a multi-class hinge loss with a boosting technique, the proposed HingeBoost has good theoretical properties: it implements the Bayes decision rule and provides a unifying framework for both equal and unequal misclassification costs. Furthermore, we propose Twin HingeBoost, which has better feature selection behavior than HingeBoost by reducing the number of ineffective covariates. Simulated data, benchmark data and two cancer gene expression data sets are used to evaluate the performance of the proposed approach.
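
To illustrate the kind of loss the Methods paragraph refers to, the following is a minimal sketch of one common multi-class hinge loss, the Lee-Lin-Wahba form with class scores constrained to sum to zero. The abstract does not state which formulation the paper uses, so the choice of this form, the function names, and the simple per-class `costs` weighting are all assumptions made for illustration. The negative subgradient shown is the quantity a functional gradient descent boosting step would typically fit base learners to.

```python
def multiclass_hinge_loss(f, y, costs=None):
    """Hinge loss for one sample (assumed Lee-Lin-Wahba form).

    f     : list of k class scores, assumed to sum to zero
    y     : index of the true class
    costs : optional list of k weights; costs[j] is a hypothetical cost of
            assigning class j when the truth is y (defaults to equal costs)
    """
    k = len(f)
    offset = 1.0 / (k - 1)
    if costs is None:
        costs = [1.0] * k
    # Only the wrong classes contribute; each is penalized once its
    # score rises above -1/(k-1).
    return sum(costs[j] * max(0.0, f[j] + offset) for j in range(k) if j != y)


def negative_subgradient(f, y, costs=None):
    """Negative subgradient of the loss above with respect to each score."""
    k = len(f)
    offset = 1.0 / (k - 1)
    if costs is None:
        costs = [1.0] * k
    return [
        0.0 if (j == y or f[j] + offset <= 0.0) else -costs[j]
        for j in range(k)
    ]


def predict(f):
    """Assign the class with the largest score."""
    return max(range(len(f)), key=lambda j: f[j])
```

With three classes, scores `[1.0, -0.5, -0.5]` and true class 0 incur zero loss under this form, while the uninformative scores `[0.0, 0.0, 0.0]` incur a loss of 1.0.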

Results: In simulations and on the benchmark data, the multi-class HingeBoost generated accurate predictions compared with the alternative methods, especially with high-dimensional covariates. The multi-class HingeBoost also produced more accurate or comparable predictions in two cancer classification problems using gene expression data.

Conclusions: This work has shown that HingeBoost provides a powerful tool for multi-class classification problems. In many applications, the classification accuracy and feature selection behavior can be further improved by using Twin HingeBoost.

* Supplementary material published on our website www.methods-online.com.