Methods Inf Med 2016; 55(05): 422-430
DOI: 10.3414/ME16-01-0033
Original Articles
Georg Thieme Verlag KG Stuttgart · New York

Approaches to Regularized Regression – A Comparison between Gradient Boosting and the Lasso[*]

Tobias Hepp
1   Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
,
Matthias Schmid
2   Institut für medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
,
Olaf Gefeller
1   Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
,
Elisabeth Waldmann
1   Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
,
Andreas Mayr
1   Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
2   Institut für medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
Funding: The work on this article was supported by the German Research Foundation (DFG), grant SCHM 2966/1-2, and the Interdisciplinary Center for Clinical Research (IZKF) of the Friedrich-Alexander-University Erlangen-Nürnberg (Project J49).
Further Information

Publication History

Received 11 March 2016

Accepted in revised form: 21 June 2016

Publication Date:
08 January 2018 (online)

Summary

Background: Penalization and regularization techniques for statistical modeling have attracted increasing attention in biomedical research due to their advantages in the presence of high-dimensional data. A special focus lies on algorithms that incorporate automatic variable selection, such as the least absolute shrinkage and selection operator (lasso) or statistical boosting techniques.

Objectives: Focusing on the linear regression framework, this article compares the two most common techniques for this task, the lasso and gradient boosting, from both a methodological and a practical perspective.

Methods: We describe these methods, highlighting under which circumstances their results coincide in low-dimensional settings. In addition, we carry out extensive simulation studies comparing their performance in settings with more predictors than observations, investigating multiple combinations of noise-to-signal ratio and number of true non-zero coefficients. Finally, we examine the impact of different tuning methods on the results.

Results: Both methods carry out penalization and variable selection for possibly high-dimensional data, often resulting in very similar models. An advantage of the lasso is its faster run-time; a strength of the boosting concept is its modular nature, which makes it easy to extend to other regression settings.

Conclusions: Although the two approaches follow different strategies with respect to optimization and regularization, both impose similar constraints on the estimation problem, leading to comparable performance in terms of prediction accuracy and variable selection in practice.
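To make the comparison concrete: the lasso minimizes the residual sum of squares under an L1 constraint on the coefficients (equivalently, it adds an L1 penalty to the least-squares criterion), whereas component-wise gradient boosting regularizes implicitly by using a small step length and stopping the iterations early. The following minimal R sketch, which is not the article's simulation code, illustrates how both methods could be fitted and tuned on simulated data with more predictors than observations; it assumes the CRAN packages glmnet and mboost, and all object names and parameter values are purely illustrative.

```r
## Minimal sketch (assumed setup, not the article's simulation code):
## lasso vs. component-wise gradient boosting on simulated data with p > n.
library(glmnet)   # lasso via coordinate descent
library(mboost)   # component-wise (gradient) boosting

set.seed(1)
n <- 100; p <- 500                                   # more predictors than observations
beta <- c(rep(2, 5), rep(0, p - 5))                  # five true non-zero coefficients
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("x", seq_len(p))
y <- as.numeric(X %*% beta + rnorm(n))               # Gaussian response

## Lasso: penalty parameter lambda tuned by 10-fold cross-validation
cv_lasso   <- cv.glmnet(X, y, alpha = 1)
beta_lasso <- as.matrix(coef(cv_lasso, s = "lambda.min"))  # intercept plus p coefficients
sel_lasso  <- setdiff(rownames(beta_lasso)[beta_lasso[, 1] != 0], "(Intercept)")

## Boosting: number of iterations (mstop) tuned by resampling, step length nu fixed
dat       <- data.frame(y = y, X)
fit_boost <- glmboost(y ~ ., data = dat,
                      control = boost_control(mstop = 500, nu = 0.1))
cvr       <- cvrisk(fit_boost)                       # 25 bootstrap samples by default
fit_boost <- fit_boost[mstop(cvr)]                   # stop at the cross-validated optimum
beta_boost <- coef(fit_boost, which = "")            # all coefficients, zeros included
sel_boost  <- setdiff(names(beta_boost)[abs(beta_boost) > 0], "(Intercept)")

## Compare number and overlap of selected predictors
length(sel_lasso); length(sel_boost); length(intersect(sel_lasso, sel_boost))
```

Since both procedures shrink coefficients towards zero and select variables along their regularization paths (lambda for the lasso, the stopping iteration mstop for boosting), the selected predictor sets in such a toy example typically overlap to a large extent, in line with the observation above that the two approaches often result in very similar models.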

* Supplementary material is published on our website: http://dx.doi.org/10.3414/me16-01-0033


 