Summary
Objectives:
A typical bioinformatics task in microarray analysis is the classification of biological
samples into one of two alternative categories. A procedure is needed that, based on the
measured expression levels, yields the probability that a new sample belongs to a
given class.
Methods:
For the purpose of classification, the statistical approach of binary regression is
considered. The combination of high dimensionality and small sample size makes this a
challenging task. Standard logit or probit regression fails because the estimation
problem is ill-conditioned and predictive performance is poor. The concepts of
frequentist and of Bayesian penalization for binary regression are introduced. A
Bayesian interpretation of the penalized log-likelihood is given. Finally, the role of
cross-validation for regularization and feature selection is discussed.
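As a minimal sketch of the Bayesian interpretation mentioned above, assume a quadratic (ridge) penalty; this summary does not state the form of the penalty, so that choice and all symbols below are illustrative assumptions. Let y_i in {0, 1} be the class label of sample i, x_i its vector of expression levels, beta the coefficient vector, lambda > 0 the penalty parameter, and h the response function (the logistic function for logit regression, the standard normal distribution function for probit regression). The penalized log-likelihood and the log-posterior under a Gaussian prior then take the same form:

```latex
% Penalized log-likelihood (ridge penalty assumed for illustration)
\ell_\lambda(\beta)
  = \sum_{i=1}^{n} \Bigl[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \Bigr]
    - \frac{\lambda}{2} \, \beta^\top \beta ,
\qquad \pi_i = h(x_i^\top \beta)

% Log-posterior under the prior \beta \sim N(0, \lambda^{-1} I)
\log p(\beta \mid y)
  = \sum_{i=1}^{n} \Bigl[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \Bigr]
    - \frac{\lambda}{2} \, \beta^\top \beta + \text{const}
```

Under these assumptions the two expressions differ only by a constant, so the penalized maximum likelihood estimate coincides with the posterior mode, and a larger lambda corresponds to a prior concentrated more tightly around zero.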
Results:
Penalization makes classical binary regression a suitable tool for microarray analysis.
We illustrate penalized logit and Bayesian probit regression on a well-known data
set and compare the results with each other and with published results obtained from
decision trees.
Conclusions:
The frequentist and the Bayesian penalization concepts work equally well on the example
data, although some method-specific differences can be discerned. Moreover, the Bayesian
approach yields a quantification (posterior probabilities) of the bias due to the
constraining assumptions.
Keywords
Bayes - bioinformatics - classification - cross-validation - logit regression - penalization
- prediction - probit regression