Evaluation of Feature Selection Methods for Preserving Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine

Joshua Lemmon; Lin Lawrence Guo; Jose Posada; Stephen R. Pfohl; Jason Fries; Scott Lanyon Fleming; Catherine Aftandilian; Nigam Shah; Lillian Sung

doi:10.1055/s-0043-1762904

Methods Inf Med 2023; 62(01/02): 060-070

Original Article

Authors

Joshua Lemmon¹, Lin Lawrence Guo¹, Jose Posada²,³, Stephen R. Pfohl², Jason Fries², Scott Lanyon Fleming², Catherine Aftandilian⁴, Nigam Shah², Lillian Sung¹,⁵

¹Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Ontario, Canada
²Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
³Department of Systems Engineering, Universidad del Norte, Barranquilla, Atlantico, Colombia
⁴Division of Pediatric Hematology/Oncology, Stanford University, Palo Alto, California, United States
⁵Division of Haematology/Oncology, The Hospital for Sick Children, Toronto, Ontario, Canada

Funding L.S. is supported by the Canada Research Chair in Pediatric Oncology Supportive Care.

Abstract

Background Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance.
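One way to make the ID/OOD comparison concrete is to compute the same discrimination metric on the in-distribution year group (2008–2010) and on the out-of-distribution one (2017–2019), then take the difference. The abstract does not name the metric, so AUROC here is an assumption, as are the function names `auroc` and `id_ood_gap`. This is a minimal standard-library sketch, not the authors' evaluation code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC requires both positive and negative labels")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def id_ood_gap(id_labels, id_scores, ood_labels, ood_scores) -> float:
    """ID AUROC minus OOD AUROC; positive values indicate degradation under shift."""
    return auroc(id_labels, id_scores) - auroc(ood_labels, ood_scores)


# Toy scores (made up, not study results) for one model on two year groups.
gap = id_ood_gap([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8],
                 [0, 0, 1, 1], [0.6, 0.4, 0.35, 0.8])
```

Here the toy model ranks patients well in the ID group (AUROC 0.75) but only at chance in the OOD group (AUROC 0.5), giving a gap of 0.25.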

Methods Our dataset consisted of intensive care unit patients from MIMIC-IV, categorized by year group (2008–2010, 2011–2013, 2014–2016, and 2017–2019). We trained baseline models using L2-regularized logistic regression on 2008–2010 data to predict in-hospital mortality, long length of stay (LOS), sepsis, and invasive ventilation, and evaluated them in all year groups. We evaluated three feature selection methods: L1-regularized logistic regression (L1), Remove and Retrain (ROAR), and causal feature selection. We assessed whether each feature selection method could maintain ID performance (2008–2010) and improve OOD performance (2017–2019). We also assessed whether parsimonious models retrained on OOD data performed as well as oracle models trained on all features in the OOD year group.
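As a rough illustration of the L1 pathway, the sketch below fits an L1-penalized logistic regression by proximal gradient descent (ISTA), keeps the columns whose coefficients are non-zero, and refits an L2-penalized model on those columns only. It is pure Python for self-containment. The solver, penalty strengths, learning rate, and the helper names `fit_logreg`, `select_features`, and `refit_l2` are assumptions, not the authors' implementation, and the synthetic data are not MIMIC-IV.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X: Sequence[Sequence[float]], y: Sequence[int], l1: float = 0.0,
               l2: float = 0.0, lr: float = 0.1,
               n_iter: int = 1000) -> Tuple[List[float], float]:
    """Minimize mean log-loss + (l2/2)*||w||^2 + l1*||w||_1 by proximal gradient.

    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual: predicted probability minus observed label.
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= lr * grad_b / n
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term).
            step = w[j] - lr * (grad_w[j] / n + l2 * w[j])
            # Proximal step for the L1 penalty: soft-threshold toward zero.
            w[j] = math.copysign(max(abs(step) - lr * l1, 0.0), step)
    return w, b


def select_features(X, y, l1: float = 0.1, **kw) -> List[int]:
    """Indices of columns whose L1-penalized coefficient is non-zero."""
    w, _ = fit_logreg(X, y, l1=l1, **kw)
    return [j for j, wj in enumerate(w) if wj != 0.0]


def refit_l2(X, y, selected: List[int], l2: float = 0.01, **kw):
    """Refit an L2-penalized model on the selected columns only."""
    X_sub = [[row[j] for j in selected] for row in X]
    return fit_logreg(X_sub, y, l2=l2, **kw)


# Synthetic demo (not study data): only columns 0 and 1 drive the outcome.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2 * r[0] - 2 * r[1]) else 0 for r in X]
selected = select_features(X, y, l1=0.1)
w_refit, b_refit = refit_l2(X, y, selected)
```

In the study's design, `X` and `y` would come from the 2008–2010 year group; the same `selected` indices would then be reused when retraining on 2017–2019 data and compared against an oracle model fit on all features of that year group.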

Results The baseline model showed significantly worse OOD than ID performance on the long LOS and sepsis tasks. L1 and ROAR retained 3.7 to 12.6% of all features, whereas causal feature selection generally retained fewer features. Models produced by L1 and ROAR exhibited ID and OOD performance similar to that of the baseline models. Retraining these models on 2017–2019 data, using the features selected from training on 2008–2010 data, generally reached parity with oracle models trained directly on 2017–2019 data using all available features. Causal feature selection led to heterogeneous results: the superset maintained ID performance, and improved OOD calibration only on the long LOS task.
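ROAR is named but not specified in this abstract. The sketch below shows one plausible remove-and-retrain selection loop: rank features by absolute importance, drop the least important fraction, retrain, and stop once validation AUROC falls more than a tolerance below the full-feature model. It adapts the ROAR idea (which, as a benchmark for interpretability methods, removes the most important features and retrains) to feature selection. The drop fraction, tolerance, stopping rule, the `train_and_score` callback, and the toy importances are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: trains a model on the given feature subset and
# returns (importance score per feature, validation AUROC).
TrainAndScore = Callable[[List[str]], Tuple[Dict[str, float], float]]


def roar_style_selection(features: Sequence[str], train_and_score: TrainAndScore,
                         drop_fraction: float = 0.1,
                         tolerance: float = 0.01) -> List[str]:
    """Iteratively remove the least important features and retrain.

    Stops before the first removal step that lowers validation AUROC by more
    than `tolerance` relative to the full-feature model, and returns the
    last accepted subset.
    """
    current = list(features)
    importances, baseline_auroc = train_and_score(current)
    while len(current) > 1:
        n_drop = max(1, int(len(current) * drop_fraction))
        # Rank by absolute importance; the least important come first.
        ranked = sorted(current, key=lambda f: abs(importances[f]))
        dropped = set(ranked[:n_drop])
        candidate = [f for f in current if f not in dropped]
        # Retrain from scratch on the reduced feature set.
        new_importances, auroc = train_and_score(candidate)
        if baseline_auroc - auroc > tolerance:
            break  # this removal would cost too much performance
        current, importances = candidate, new_importances
    return current


# Toy demo (made-up importances, not study data): only "a" and "b" matter.
TRUE_WEIGHTS = {"a": 3.0, "b": 2.0, "c": 0.01, "d": 0.02, "e": 0.0}


def toy_train_and_score(subset: List[str]) -> Tuple[Dict[str, float], float]:
    """Stand-in for real training: AUROC grows with the retained signal."""
    signal = sum(TRUE_WEIGHTS[f] for f in subset) / sum(TRUE_WEIGHTS.values())
    return {f: TRUE_WEIGHTS[f] for f in subset}, 0.5 + 0.4 * signal


kept = roar_style_selection(list("abcde"), toy_train_and_score,
                            drop_fraction=0.2, tolerance=0.01)
```

In this toy run, "e", "c", and "d" are removed one at a time and the loop stops before dropping "b", so two of five features (40%) survive. Rebuilding the model at every step, rather than zeroing coefficients, is what distinguishes a remove-and-retrain loop from simply thresholding a single fit.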

Conclusions While model retraining can mitigate the impact of temporal dataset shift on parsimonious models produced by L1 and ROAR, new methods are required to proactively improve temporal robustness.

Keywords

dataset shift - machine learning - clinical outcomes - feature selection

Consent for Publication

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from PhysioNet, but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of PhysioNet.

Ethical Approval Statement

The institutional review boards of Beth Israel Deaconess Medical Center, Boston, Massachusetts, and the Massachusetts Institute of Technology, Cambridge, Massachusetts, United States, waived the requirement for ethics approval and, consequently, for participant informed consent because the patient records were deidentified. All methods were performed in accordance with relevant guidelines and regulations.

Authors' Contributions

L.L.G. and L.S. designed the project with input from all authors. J.P. suggested the use of causal inference models. J.L. performed all experiments. J.L., L.L.G., and L.S. analyzed and interpreted results, with some input from all other authors. J.L. wrote the manuscript with major contributions from L.L.G. and L.S. All authors revised and commented on the manuscript. All authors read and approved the final manuscript.

Supplementary Material

Supplementary Material (PDF)

Publication History

Received: 02 September 2022

Accepted: 04 January 2023

Article published online: 22 February 2023

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

