Summary
Objectives: In clinical medicine, the accuracy achieved by classification rules is often not
sufficient to justify their use in daily practice. In order to improve classifiers
it has become popular to combine single classification rules into a classification
ensemble. Two popular boosting methods will be compared with classical statistical
approaches.
Methods: Using data from a clinical study on the diagnosis of breast tumors and by simulation
we will compare AdaBoost with gradient boosting ensembles of regression trees. We
will also consider a tree approach and logistic regression as traditional competitors.
In logistic regression, nonlinear effects can be selected using the fractional polynomial
approach. Performance of the classifiers will be assessed by estimated misclassification
rates and the Brier score.
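
As an illustration of the two performance measures named above, the following minimal sketch computes a misclassification rate and a Brier score for binary outcomes. The function names, the 0.5 classification threshold, and the toy data are assumptions made for this example only; they are not taken from the study.

```python
def misclassification_rate(y_true, p_hat, threshold=0.5):
    """Share of cases whose thresholded prediction differs from the true label.

    The 0.5 threshold is an assumption for illustration.
    """
    errors = sum(int(p >= threshold) != y for y, p in zip(y_true, p_hat))
    return errors / len(y_true)


def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


# Hypothetical toy data: true classes and predicted probabilities of class 1.
y_true = [1, 0, 1, 0]
p_hat = [0.9, 0.2, 0.4, 0.6]

mcr = misclassification_rate(y_true, p_hat)
bs = brier_score(y_true, p_hat)
print(f"misclassification rate: {mcr:.3f}, Brier score: {bs:.4f}")
```

Unlike the misclassification rate, the Brier score uses the predicted probabilities directly, so it also penalizes predictions that are on the correct side of the threshold but poorly calibrated.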
Results: We will show that boosting of simple base classifiers gives classification rules
with improved predictive ability. However, the performance of boosting classifiers
was not generally superior to the performance of logistic regression. In contrast
to the computer-intensive methods, logistic regression yields classifiers that are much
easier to interpret and to use.
Conclusions: In medical applications, the logistic regression model remains a method of choice
or, at least, a serious competitor of more sophisticated techniques. Refinement of
boosting methods by using an optimized number of boosting steps may lead to further improvement.
Keywords
Classification - simulation study - boosting - generalized additive models - diagnosis
of breast tumors