Penalized Binary Regression for Gene Expression Profiling

Michael G. Schimek

doi:10.1055/s-0038-1633894

Methods Inf Med 2004; 43(05): 439-444

Original Article

Schattauer GmbH

¹Medical University of Graz, Institute for Medical Informatics, Statistics and Documentation, Graz, Austria

Publication History

Publication Date:
05 February 2018 (online)

Summary

Objectives: A typical bioinformatics task in microarray analysis is the classification of biological samples into two alternative categories. A procedure is needed which, based on the expression levels measured, allows us to compute the probability that a new sample belongs to a certain class.

Methods: Binary regression is considered as the statistical approach to classification. High dimensionality combined with small sample sizes makes this a challenging task. Standard logit or probit regression fails because of ill-conditioning and poor predictive performance. The concepts of frequentist and Bayesian penalization for binary regression are introduced, and a Bayesian interpretation of the penalized log-likelihood is given. Finally, the role of cross-validation for regularization and feature selection is discussed.
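
The Bayesian reading of the penalized log-likelihood can be illustrated with a short worked equation. This is only a sketch under an assumption the summary does not state: that the penalty is quadratic (ridge-type). The symbols $\beta$, $\lambda$, $x_i$, $y_i$ and $F$ are notation introduced here for illustration, with $F$ the logistic or standard normal distribution function for logit or probit regression, respectively.

```latex
% Penalized log-likelihood (quadratic penalty assumed for illustration)
\ell_\lambda(\beta)
  = \sum_{i=1}^{n} \Bigl[ y_i \log \pi_i(\beta)
      + (1 - y_i) \log\bigl(1 - \pi_i(\beta)\bigr) \Bigr]
    - \frac{\lambda}{2}\, \beta^\top \beta ,
\qquad \pi_i(\beta) = F\bigl(x_i^\top \beta\bigr)

% With the prior \beta \sim N(0, \lambda^{-1} I), the log-posterior is
\log p(\beta \mid y)
  = \sum_{i=1}^{n} \Bigl[ y_i \log \pi_i(\beta)
      + (1 - y_i) \log\bigl(1 - \pi_i(\beta)\bigr) \Bigr]
    - \frac{\lambda}{2}\, \beta^\top \beta + \text{const}
```

Under this assumption the maximizer of $\ell_\lambda$ coincides with the posterior mode, so the penalty weight $\lambda$ plays the role of a prior precision.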

Results: Penalization makes classical binary regression a suitable tool for microarray analysis. We illustrate penalized logit and Bayesian probit regression on a well-known data set and compare the results with each other and with published results from decision trees.

Conclusions: The frequentist and the Bayesian penalization concepts work equally well on the example data; however, some method-specific differences can be discerned. Moreover, the Bayesian approach yields a quantification (posterior probabilities) of the bias due to the constraining assumptions.
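
The frequentist side of the approach summarized above (a penalized logit fit with the penalty weight chosen by cross-validation) can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the quadratic (ridge) penalty, the Newton-Raphson solver, the use of held-out deviance as the cross-validation criterion, the synthetic data standing in for expression levels, and all function names (`fit_ridge_logit`, `cv_deviance`, `predict_proba`) are assumptions made here.

```python
import numpy as np

def sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp().
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))

def fit_ridge_logit(X, y, lam, n_iter=50, tol=1e-8):
    """Maximize the logit log-likelihood minus (lam / 2) * ||slopes||^2
    by Newton-Raphson. The intercept is left unpenalized."""
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])        # prepend intercept column
    P = lam * np.eye(p + 1)
    P[0, 0] = 0.0                              # do not shrink the intercept
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        mu = sigmoid(Z @ beta)                 # fitted probabilities
        w = np.clip(mu * (1.0 - mu), 1e-10, None)
        grad = Z.T @ (y - mu) - P @ beta       # penalized score
        info = Z.T @ (Z * w[:, None]) + P      # penalized information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predict_proba(X, beta):
    # Probability that each row of X belongs to class 1.
    return sigmoid(beta[0] + X @ beta[1:])

def cv_deviance(X, y, lam, k=5, seed=0):
    """Mean held-out binomial deviance over k folds (assumed criterion)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    dev = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta = fit_ridge_logit(X[train], y[train], lam)
        pr = np.clip(predict_proba(X[test], beta), 1e-12, 1 - 1e-12)
        dev += -2.0 * np.sum(y[test] * np.log(pr)
                             + (1 - y[test]) * np.log(1 - pr))
    return dev / len(y)

# Synthetic stand-in for microarray data: 40 samples, 200 "genes",
# of which the first five are shifted in class 1.
rng = np.random.default_rng(1)
n, p = 40, 200
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 1.5

# Choose the penalty weight on a small, arbitrary grid by cross-validation.
lambdas = [0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_deviance(X, y, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)

# Refit on all samples and compute the class probability of a new sample.
beta_hat = fit_ridge_logit(X, y, best_lam)
prob_new = predict_proba(rng.normal(size=(1, p)), beta_hat)
```

With far more features than samples, the unpenalized fit would be ill-conditioned; the penalty term keeps the linear system in each Newton step solvable, and a larger `lam` shrinks the slope coefficients more strongly toward zero.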

Keywords

Bayes - bioinformatics - classification - cross-validation - logit regression - penalization - prediction - probit regression
