Methods Inf Med 2022; 61(01/02): 019-028
DOI: 10.1055/s-0042-1742672
Original Article

A Comparison of Methods to Detect Changes in Prediction Models

Erin M. Schnellinger
1   Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States
,
Wei Yang
1   Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States
,
Michael O. Harhay
1   Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States
,
Stephen E. Kimmel
2   Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, United States
Funding S.K. was supported by the US National Institutes of Health (NIH), grant number R01HL14129. E.S. was supported by the NIH National Heart, Lung, and Blood Institute (NHLBI), grant number F31HL19433. M.H. was supported by the NIH (grant number R00HL141678) and the Patient-Centered Outcomes Research Institute (grant number ME-2020C1-19220). W.Y. was supported by the NIH NHLBI, grant number R01-HL141294.

Abstract

Background Prediction models inform decisions in many areas of medicine. Most models are fitted once and then applied to new (future) patients, despite the fact that model coefficients can vary over time due to changes in patients' clinical characteristics and disease risk. However, the optimal method to detect changes in model parameters has not been rigorously assessed.

Methods We simulated data, informed by post–lung transplant mortality data, and tested two approaches for detecting model change: (1) the "Direct Approach," which compares the coefficients of a model refit on recent data to those of the baseline model; and (2) "Calibration Regression," which fits a logistic regression of the log-odds of the observed outcomes on the linear predictor from the baseline model (i.e., the log-odds of the predicted probabilities obtained from the baseline model) and tests whether the intercept and slope differ from 0 and 1, respectively. Four scenarios were simulated using logistic regression for binary outcomes: (1) all model parameters fixed; (2) outcome prevalence varied between 0.1 and 0.2; (3) the coefficient of one of the ten predictors varied between 0.2 and 0.4; and (4) outcome prevalence and the coefficient of one predictor varied simultaneously.
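The Calibration Regression step above can be sketched in a few lines. The snippet below is a minimal, illustrative implementation, not the authors' actual simulation code: the baseline coefficients (intercept −2.0, slope 0.3), the drifted intercept (−1.5), and the sample size are assumed values chosen only to show the mechanics. It fits logit P(y = 1) = a + b·lp by Newton–Raphson, where lp is the linear predictor from the baseline model; an estimated intercept away from 0 (or slope away from 1) signals that the model's calibration has drifted.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_calibration(lp, y, iters=25):
    """Newton-Raphson fit of logit P(y=1) = a + b*lp (2 parameters).

    Under perfect calibration a = 0 and b = 1; departures from these
    values indicate drift in the baseline prediction model.
    """
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            p = sigmoid(a + b * x)
            r, w = yi - p, p * (1.0 - p)   # residual and IRLS weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01        # 2x2 Fisher information
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

random.seed(1)
n = 5000
# Baseline model (assumed for illustration): logit p = -2.0 + 0.3*x.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
lp = [-2.0 + 0.3 * xi for xi in x]         # linear predictor from baseline model
# New data: the true intercept has drifted to -1.5 (higher outcome prevalence),
# mimicking scenario (2) in the Methods.
y = [1 if random.random() < sigmoid(-1.5 + 0.3 * xi) else 0 for xi in x]

a_hat, b_hat = fit_calibration(lp, y)
# a_hat should be near 0.5 (the intercept shift) and b_hat near 1.0.
```

In practice one would compare the fitted intercept and slope to 0 and 1 with Wald tests (standard errors come from the inverse of the same 2×2 information matrix), declaring a change when either test rejects.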

Results Calibration regression tended to detect changes sooner than the Direct Approach, with better performance (e.g., larger proportion of true claims). When the sample size was large, both methods performed well. When two parameters changed simultaneously, neither method performed well.

Conclusion Neither change detection method examined here proved optimal under all circumstances. However, our results suggest that if one is interested in detecting a change in overall incidence of an outcome (e.g., intercept), the Calibration Regression method may be superior to the Direct Approach. Conversely, if one is interested in detecting a change in other model covariates (e.g., slope), the Direct Approach may be superior.

Data and Code

This simulation study was informed by data from the United Network for Organ Sharing (UNOS). The authors do not have the authority to share UNOS data; researchers interested in accessing this data must submit a request to UNOS directly. The code used to conduct the simulations is available from the authors on request.


Authors' Contributions

All authors contributed to the study's conception and design. E.M.S. performed the statistical analysis, drafted the initial manuscript, and revised it based on the critical review and scientific input of W.Y., M.O.H., and S.E.K. All authors contributed to the design of the simulations, which were programmed by E.M.S. under the guidance of W.Y., M.O.H., and S.E.K.


Publication History

Received: 20 October 2021

Accepted: 31 December 2021

Article published online:
12 February 2022

© 2022. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
