Methods Inf Med 2016; 55(05): 422-430
DOI: 10.3414/ME16-01-0033
Original Articles
Schattauer GmbH

Approaches to Regularized Regression – A Comparison between Gradient Boosting and the Lasso[*]

Tobias Hepp
1  Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Matthias Schmid
2  Institut für medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
Olaf Gefeller
1  Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Elisabeth Waldmann
1  Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Andreas Mayr
1  Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
2  Institut für medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
Funding: The work on this article was supported by the German Research Foundation (DFG), grant SCHM 2966/1–2, and by the Interdisciplinary Center for Clinical Research (IZKF) of the Friedrich-Alexander-University Erlangen-Nürnberg (Project J49).

Publication History

Received 11 March 2016

Accepted in revised form: 21 June 2016

Publication Date:
08 January 2018 (online)

Summary

Background: Penalization and regularization techniques for statistical modeling have attracted increasing attention in biomedical research due to their advantages in the presence of high-dimensional data. A special focus lies on algorithms that incorporate automatic variable selection, like the least absolute shrinkage and selection operator (lasso) or statistical boosting techniques.

Objectives: Focusing on the linear regression framework, this article compares the two most common techniques for this task, the lasso and gradient boosting, from both a methodological and a practical perspective.

Methods: We describe these methods, highlighting under which circumstances their results will coincide in low-dimensional settings. In addition, we carry out extensive simulation studies comparing their performance in settings with more predictors than observations, investigating multiple combinations of noise-to-signal ratio and number of true non-zero coefficients. Finally, we examine the impact of different tuning methods on the results.

Results: Both methods carry out penalization and variable selection for possibly high-dimensional data, often resulting in very similar models. An advantage of the lasso is its faster run-time; a strength of the boosting concept is its modular nature, making it easy to extend to other regression settings.

Conclusions: Although following different strategies with respect to optimization and regularization, both methods impose similar constraints on the estimation problem, leading to comparable performance regarding prediction accuracy and variable selection in practice.
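To make the two compared approaches concrete, the following is a minimal NumPy sketch of componentwise L2 gradient boosting and of a cyclic coordinate-descent lasso on simulated linear-regression data. This is not the authors' implementation; the function names (`boost`, `lasso_cd`) and the tuning values (`nu`, `mstop`, `lam`) are illustrative assumptions chosen for the toy example.

```python
import numpy as np

# Toy data: 10 predictors, of which only the first 3 are truly non-zero
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 1.0] + [0.0] * (p - 3))
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Standardize so both methods see the same scaled problem
X = (X - X.mean(0)) / X.std(0)
y = y - y.mean()

def boost(X, y, nu=0.1, mstop=500):
    """Componentwise L2 gradient boosting with simple linear base learners:
    in each step, fit every single predictor to the current residuals and
    update only the best-fitting one by a small step-length nu."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()
    for _ in range(mstop):
        coefs = X.T @ resid / (X ** 2).sum(0)          # univariate LS fits
        rss = ((resid[:, None] - X * coefs) ** 2).sum(0)
        j = rss.argmin()                                # best base learner
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
    return beta

def lasso_cd(X, y, lam, iters=200):
    """Lasso via cyclic coordinate descent: each coefficient is updated by
    soft-thresholding its univariate least-squares solution at lam."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]      # partial residual
            z = X[:, j] @ r_j / n
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / ((X[:, j] ** 2).sum() / n)
    return beta

b_boost = boost(X, y)
b_lasso = lasso_cd(X, y, lam=0.1)
```

On data like this, both procedures typically recover the three informative predictors with shrunken coefficients while keeping the noise coefficients at or near zero, illustrating the similarity of the fitted models that the simulation studies examine more systematically.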

* Supplementary material published on our web-site http://dx.doi.org/10.3414/me16-01-0033