Monitoring Approaches for a Pediatric Chronic Kidney Disease Machine Learning Model
Funding None.
Objective The purpose of this study is to evaluate the ability of three metrics to monitor for a reduction in performance of a chronic kidney disease (CKD) model deployed at a pediatric hospital.
Methods The CKD risk model estimates a patient's risk of developing CKD 3 to 12 months following an inpatient admission. The model was developed on a retrospective dataset of 4,879 admissions from 2014 to 2018, then run silently on 1,270 admissions from April to October 2019. Three metrics were used to monitor its performance during the silent phase: (1) standardized mean differences (SMDs); (2) performance of a “membership model”; and (3) response distribution analysis. Observed patient outcomes for the 1,270 admissions were used to calculate prospective model performance and the ability of the three metrics to detect performance changes.
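As an illustration of the first metric, the sketch below computes an SMD for a single continuous input variable. It assumes the common pooled-standard-deviation form of the SMD; the abstract does not state the exact formula used, and the function name and sample values are hypothetical.

```python
import math
import statistics


def standardized_mean_difference(reference, deployment):
    """SMD for one continuous variable: difference in means over the pooled SD.

    Assumes the pooled SD is sqrt((var_ref + var_dep) / 2); this is an
    illustrative choice, not necessarily the study's exact definition.
    """
    mean_ref = statistics.fmean(reference)
    mean_dep = statistics.fmean(deployment)
    pooled_sd = math.sqrt(
        (statistics.variance(reference) + statistics.variance(deployment)) / 2
    )
    if pooled_sd == 0:
        # No spread in either sample: the SMD is 0 if the means agree,
        # and unbounded otherwise.
        return 0.0 if mean_ref == mean_dep else math.copysign(math.inf, mean_dep - mean_ref)
    return (mean_dep - mean_ref) / pooled_sd


# Hypothetical values of one input variable in each setting.
retrospective_values = [0.4, 0.5, 0.6, 0.5, 0.7, 0.6]
deployment_values = [0.7, 0.8, 0.9, 0.8, 1.0, 0.9]
smd = standardized_mean_difference(retrospective_values, deployment_values)
```

Repeating this per input variable gives one SMD per feature, which can then be compared between the retrospective and deployment data.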
Results The deployed model had an area under the receiver operating characteristic curve (AUROC) of 0.63 in the prospective evaluation, a significant decrease from the AUROC of 0.76 on retrospective data (p = 0.033). Among the three metrics, SMDs differed significantly between retrospective and deployment data for 66/75 (88%) of the model's input variables (p < 0.05). The membership model was able to discriminate between the two settings (AUROC = 0.71, p < 0.0001), and the response distributions differed significantly between the two settings (p < 0.0001).
Conclusion This study suggests that the three metrics examined could provide an early indication of performance deterioration in deployed models.
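The other two metrics can be sketched in a similar way. The example below is a minimal illustration under stated assumptions, not the study's implementation: the membership model is assumed to be a logistic regression trained to tell retrospective rows (label 0) from deployment rows (label 1), and the response distributions are compared with a two-sample Kolmogorov-Smirnov statistic. The abstract specifies neither choice, and all data and function names here are hypothetical.

```python
import bisect
import math
import random


def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(rows, labels, epochs=300, lr=0.5):
    """Batch gradient descent for logistic regression (assumed model type)."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, rows):
    """Predicted probability that each row comes from the deployment setting."""
    return [
        1.0 / (1.0 + math.exp(-(bias + sum(w * xi for w, xi in zip(weights, x)))))
        for x in rows
    ]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
        for x in a + b
    )


# Synthetic inputs: the first feature is shifted in the deployment setting.
rng = random.Random(0)
retro = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
deploy = [[rng.gauss(1, 1), rng.gauss(0, 1)] for _ in range(200)]

# Membership model: fit on half of each setting, score the held-out half.
train_x, train_y = retro[:100] + deploy[:100], [0] * 100 + [1] * 100
test_x, test_y = retro[100:] + deploy[100:], [0] * 100 + [1] * 100
weights, bias = fit_logistic(train_x, train_y)
membership_auroc = auroc(predict(weights, bias, test_x), test_y)

# Response distributions: hypothetical risk scores emitted by the deployed model.
retro_scores = [0.10, 0.15, 0.20, 0.25, 0.30]
deploy_scores = [0.30, 0.35, 0.40, 0.45, 0.50]
response_gap = ks_statistic(retro_scores, deploy_scores)
```

A membership AUROC near 0.5 would indicate that the two settings are hard to tell apart, while values well above 0.5 point to a shift in the inputs. Neither check needs outcome labels, which is why both can be run during a silent deployment before outcomes are observed.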
Protection of Human and Animal Subjects
This project was reviewed and approved by the Stanford University Institutional Review Board.
Received: 13 September 2021
Accepted: 01 March 2022
Published online: 04 May 2022
© 2022. Thieme. All rights reserved.
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany