Methods Inf Med 2016; 55(05): 422-430
DOI: 10.3414/ME16-01-0033
Original Articles
Georg Thieme Verlag KG Stuttgart · New York

Approaches to Regularized Regression – A Comparison between Gradient Boosting and the Lasso[*]

Tobias Hepp
1   Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
,
Matthias Schmid
2   Institut für medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
,
Olaf Gefeller
1   Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
,
Elisabeth Waldmann
1   Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
,
Andreas Mayr
1   Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
2   Institut für medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
Funding: The work on this article was supported by the German Research Foundation (DFG), grant SCHM 2966/1-2, and the Interdisciplinary Center for Clinical Research (IZKF) of the Friedrich-Alexander-University Erlangen-Nürnberg (Project J49).
Further Information

Publication History

Received 11 March 2016

Accepted in revised form: 21 June 2016

Publication Date:
08 January 2018 (online)

Summary

Background: Penalization and regularization techniques for statistical modeling have attracted increasing attention in biomedical research due to their advantages in the presence of high-dimensional data. A special focus lies on algorithms that incorporate automatic variable selection, such as the least absolute shrinkage and selection operator (lasso) or statistical boosting techniques.

Objectives: Focusing on the linear regression framework, this article compares the two most common techniques for this task, the lasso and gradient boosting, from both a methodological and a practical perspective.

Methods: We describe these methods, highlighting under which circumstances their results coincide in low-dimensional settings. In addition, we carry out extensive simulation studies comparing their performance in settings with more predictors than observations, investigating multiple combinations of noise-to-signal ratio and number of true non-zero coefficients. Finally, we examine the impact of different tuning methods on the results.

Results: Both methods carry out penalization and variable selection for possibly high-dimensional data, often resulting in very similar models. An advantage of the lasso is its faster run-time; a strength of the boosting concept is its modular nature, which makes it easy to extend to other regression settings.

Conclusions: Although the two approaches follow different strategies with respect to optimization and regularization, both impose similar constraints on the estimation problem, leading to comparable performance in terms of prediction accuracy and variable selection in practice.
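To make the comparison concrete: the lasso minimizes the residual sum of squares under an L1 constraint on the coefficients (equivalently, it adds an L1 penalty to the least-squares criterion), whereas component-wise gradient boosting regularizes implicitly by using a small step length and stopping the iterations early. The following minimal R sketch, which is not the article's simulation code, illustrates how both methods could be fitted and tuned on simulated data with more predictors than observations; it assumes the CRAN packages glmnet and mboost, and all object names and parameter values are purely illustrative.

```r
## Minimal sketch (assumed setup, not the article's simulation code):
## lasso vs. component-wise gradient boosting on simulated data with p > n.
library(glmnet)   # lasso via coordinate descent
library(mboost)   # component-wise (gradient) boosting

set.seed(1)
n <- 100; p <- 500                                   # more predictors than observations
beta <- c(rep(2, 5), rep(0, p - 5))                  # five true non-zero coefficients
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("x", seq_len(p))
y <- as.numeric(X %*% beta + rnorm(n))               # Gaussian response

## Lasso: penalty parameter lambda tuned by 10-fold cross-validation
cv_lasso   <- cv.glmnet(X, y, alpha = 1)
beta_lasso <- as.matrix(coef(cv_lasso, s = "lambda.min"))  # intercept plus p coefficients
sel_lasso  <- setdiff(rownames(beta_lasso)[beta_lasso[, 1] != 0], "(Intercept)")

## Boosting: number of iterations (mstop) tuned by resampling, step length nu fixed
dat       <- data.frame(y = y, X)
fit_boost <- glmboost(y ~ ., data = dat,
                      control = boost_control(mstop = 500, nu = 0.1))
cvr       <- cvrisk(fit_boost)                       # 25 bootstrap samples by default
fit_boost <- fit_boost[mstop(cvr)]                   # stop at the cross-validated optimum
beta_boost <- coef(fit_boost, which = "")            # all coefficients, zeros included
sel_boost  <- setdiff(names(beta_boost)[abs(beta_boost) > 0], "(Intercept)")

## Compare number and overlap of selected predictors
length(sel_lasso); length(sel_boost); length(intersect(sel_lasso, sel_boost))
```

Since both procedures shrink coefficients towards zero and select variables along their regularization paths (lambda for the lasso, the stopping iteration mstop for boosting), the selected predictor sets in such a toy example typically overlap to a large extent, in line with the observation above that the two approaches often result in very similar models.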

* Supplementary material is published on our website: http://dx.doi.org/10.3414/me16-01-0033


 