Appl Clin Inform 2021; 12(04): 808-815
DOI: 10.1055/s-0041-1735184
Review Article

Systematic Review of Approaches to Preserve Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine

Lin Lawrence Guo¹, Stephen R. Pfohl², Jason Fries², Jose Posada², Scott Lanyon Fleming², Catherine Aftandilian⁴, Nigam Shah², Lillian Sung¹,³

1. Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada
2. Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
3. Division of Haematology/Oncology, The Hospital for Sick Children, Toronto, Canada
4. Division of Pediatric Hematology/Oncology, Stanford University, Palo Alto, California, United States
Funding None.

Abstract

Objective Temporal dataset shift can degrade the performance of machine learning models over time, a barrier to the use of such models to support decision-making in clinical practice. Our aim was to describe technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset shift.

Methods Studies were included if they were fully published articles that used machine learning and implemented a procedure to mitigate the effects of temporal dataset shift in a clinical setting. We described how dataset shift was measured, the procedures used to preserve model performance, and their effects.

Results Of 4,457 potentially relevant publications identified, 15 were included. The impact of temporal dataset shift was primarily quantified using changes, usually deterioration, in calibration or discrimination. Calibration deterioration was more common (n = 11) than discrimination deterioration (n = 3). Mitigation strategies were categorized as model level or feature level. Model-level approaches (n = 15) were more common than feature-level approaches (n = 2), with the most common approaches being model refitting (n = 12), probability calibration (n = 7), model updating (n = 6), and model selection (n = 6). In general, all mitigation strategies were successful at preserving calibration but not uniformly successful in preserving discrimination.
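Of the model-level strategies named above, probability calibration is perhaps the simplest to illustrate. The sketch below, using synthetic data and illustrative variable names (it is not drawn from any of the included studies), recalibrates an "old era" model with a one-dimensional logistic model (Platt scaling) fit on a sample from a "new era" whose outcome prevalence has shifted. Recalibration of this kind adjusts predicted probabilities without changing the rank ordering of predictions, which is consistent with the finding that such strategies preserve calibration but not necessarily discrimination.

```python
# Hypothetical sketch of probability recalibration (Platt scaling) under
# temporal dataset shift. All data are synthetic; names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Old era" training data.
X_old = rng.normal(size=(1000, 3))
y_old = (X_old[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# "New era" data with a higher outcome prevalence (a simple temporal shift).
X_new = rng.normal(size=(500, 3))
y_new = (X_new[:, 0] + 0.8 + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Original model, fit only on old-era data.
model = LogisticRegression().fit(X_old, y_old)

# Recalibration: fit a logistic model on the old model's scores using a
# recent sample. This shifts probabilities but preserves their ranking.
scores_new = model.decision_function(X_new).reshape(-1, 1)
recalibrator = LogisticRegression().fit(scores_new, y_new)

p_raw = model.predict_proba(X_new)[:, 1]
p_recal = recalibrator.predict_proba(scores_new)[:, 1]

# Calibration-in-the-large: gap between mean predicted risk and observed
# prevalence, before and after recalibration.
gap_raw = abs(p_raw.mean() - y_new.mean())
gap_recal = abs(p_recal.mean() - y_new.mean())
```

Because the recalibrator is a monotone transform of the original scores, discrimination metrics such as the area under the ROC curve are unchanged; only the probability scale is corrected toward the new era's prevalence.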

Conclusion Research on preserving the performance of machine learning models in the presence of temporal dataset shift in clinical medicine is limited. Future research could quantify the impact of dataset shift on clinical decision-making, benchmark mitigation strategies on a wider range of datasets and tasks, and identify optimal strategies for specific settings.

Note

L.S. is the Canada Research Chair in Pediatric Oncology Supportive Care.


Author Contributions

L.L.G. and L.S. contributed to data acquisition and data analysis. All authors contributed to the study concept and design and to data interpretation; drafted the manuscript or revised it critically for important intellectual content; approved the final version to be published; and agreed to be accountable for all aspects of the work.


Protection of Human and Animal Subjects

As this study is a systematic review of primary studies, human and/or animal subjects were not included in the project.


Supplementary Material



Publication History

Received: 28 April 2021

Accepted: 12 July 2021

Publication Date:
01 September 2021 (online)

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany