DOI: 10.1055/a-2618-4470
Refining a Machine Learning Model for Predicting Infant Sepsis: A Multidisciplinary Team Supported by Human-Centered Design Methods
Funding: This study was funded by the Foundation for the National Institutes of Health (grant no. 1R01LM013526-01A1).

Abstract
Background
Human-centered design (HCD) methods in machine learning generally focus on workflow, user interfaces, and data visualizations, but they also have the potential to inform the model development and testing process itself.
Objectives
This study aimed to demonstrate the potential of HCD methods to support the design and testing of machine learning models developed for clinical decision-making.
Methods
In preparing for formative user testing of clinician-facing representations of a machine learning model for detecting sepsis in neonatal intensive care unit (NICU) patients, we discovered that interactive low-fidelity mockups using real patient data revealed potential model anomalies. To investigate these potential anomalies further, we qualitatively analyzed interviews with 31 NICU clinicians concerning their experience with neonatal sepsis. The review was conducted by a multidisciplinary team with expertise in neonatology, informatics, data science, and human-computer interaction (HCI). Anomalies identified via the mockups and interview analysis were then examined through inspection of patient charts and of the model's features and code.
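As a hedged illustration of how mockups driven by real patient data can surface model anomalies (this is a minimal sketch, not the study's actual tooling; the model artifact, file names, and column names are all assumptions), one might score every observation window and export the timelines for a mockup to replay:

```python
# Hypothetical sketch: scoring real patient observation windows so a
# low-fidelity mockup can replay each patient's risk trajectory.
# The model artifact, file names, and column names are assumptions.
import joblib
import pandas as pd

model = joblib.load("nicu_sepsis_model.joblib")    # assumed trained model
windows = pd.read_csv("nicu_feature_windows.csv")  # assumed EHR extract

feature_cols = [c for c in windows.columns
                if c not in ("patient_id", "window_start")]

# Attach a risk score to every observation window; reviewers can then
# step through a patient's timeline in the mockup, compare the model's
# scores against the chart, and flag surprising behavior.
windows["sepsis_risk"] = model.predict_proba(windows[feature_cols])[:, 1]
windows.to_csv("mockup_timelines.csv", index=False)
```

Scoring every window, rather than a single snapshot, is what lets temporal behavior of the model become visible during a review of this kind.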
Results
The HCD-facilitated review revealed anomalies in three categories: (1) feature inclusion and exclusion, (2) feature importance, and (3) model stability over time. Data entry errors in the electronic health record and their impact on model output were also noted. The review resulted in 41 changes to the model.
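To give a concrete, hedged sense of what inspecting the second and third anomaly categories can look like in code, the sketch below runs two generic checks against a synthetic stand-in model; the features, thresholds, and model are illustrative assumptions, not the study's pipeline.

```python
# Hypothetical sketch: two checks a multidisciplinary review might run,
# shown with synthetic data. Features, thresholds, and the model are
# illustrative assumptions, not the study's pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Stand-in for EHR-derived features and sepsis labels.
feature_names = ["heart_rate", "temperature", "wbc_count", "mean_bp"]
X = rng.normal(size=(500, len(feature_names)))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500)) > 0

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# (2) Feature importance: does the ranking match clinical expectation?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, mean_imp in sorted(zip(feature_names, result.importances_mean),
                             key=lambda pair: -pair[1]):
    print(f"{name:>12}: {mean_imp:.3f}")

# (3) Stability over time: score one patient's successive hourly windows
# and flag hour-to-hour jumps too large to explain from the chart.
patient_windows = rng.normal(size=(24, len(feature_names)))
scores = model.predict_proba(patient_windows)[:, 1]
suspect = np.where(np.abs(np.diff(scores)) > 0.3)[0]
print("suspect score jumps after windows:", suspect)
```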
Conclusion
The discovery of these 41 opportunities to improve our prediction model was a serendipitous by-product of the HCD process. Our results suggest that HCD can be applied not only to model display design and measures of explainability, but also to the development and evaluation of the model itself. This case report also demonstrates the need for a multidisciplinary team of clinicians, data scientists, and HCI experts in identifying and addressing issues involving machine learning model performance.
Keywords
artificial intelligence - clinical decision support - interfaces and usability - machine learning - neonatology
Protection of Human and Animal Subjects
The study was determined to be exempt from human subjects review by the Children's Hospital of Philadelphia Institutional Review Board (IRB 21-018777).
Publication History
Received: 30 December 2024
Accepted: 21 May 2025
Article published online: 10 October 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany