Natural Language Processing for Cohort Discovery in a Discharge Prediction Model for the Neonatal ICU
National Library of Medicine Training Grant 5T15LM007450-13.
received: 12 September 2015
accepted: 02 January 2016
published online: 16 December 2017
Discharge from the Neonatal Intensive Care Unit (NICU) can be delayed for non-medical reasons, including the procurement of home medical equipment, parental education, and the need for children’s services. We previously created a model to identify patients who will be medically ready for discharge in the subsequent 2–10 days. In this study, we use natural language processing (NLP) to improve upon that model and to discern why it performed poorly on certain patients.
We retrospectively examined the text of the Assessment and Plan section from daily progress notes of 4,693 patients (103,206 patient-days) from the NICU of a large, academic children’s hospital. We constructed a matrix of terms from the NICU notes (single words and bigrams) and used it to train a supervised machine learning algorithm. The algorithm identified the words that best differentiated patients for whom our original discharge prediction model performed poorly from patients for whom it performed well.
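The matrix construction and term ranking described above can be illustrated with a small sketch. It is not the study's implementation: the tokenizer, the function names, the example notes, and the labels are all invented for illustration, and a single-split Gini impurity decrease (the splitting criterion used in CART-style trees) stands in for whatever supervised learner and importance measure were actually used.

```python
def tokenize(note):
    """Lowercase a note and split on whitespace (deliberately simple)."""
    return note.lower().split()

def ngrams(tokens):
    """Return single words plus bigrams (adjacent word pairs) for one note."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

def build_matrix(notes):
    """Binary bag-of-words matrix: one row per note, one column per term."""
    docs = [set(ngrams(tokenize(n))) for n in notes]
    vocab = sorted(set().union(*docs))
    matrix = [[1 if term in doc else 0 for term in vocab] for doc in docs]
    return vocab, matrix

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def gini_gain(column, labels):
    """Impurity decrease from splitting the notes on presence of one term."""
    present = [y for x, y in zip(column, labels) if x]
    absent = [y for x, y in zip(column, labels) if not x]
    weighted = (len(present) * gini(present)
                + len(absent) * gini(absent)) / len(labels)
    return gini(labels) - weighted

def rank_terms(notes, labels, top=5):
    """Rank terms by how well their presence separates the two groups."""
    vocab, matrix = build_matrix(notes)
    scores = []
    for j, term in enumerate(vocab):
        column = [row[j] for row in matrix]
        scores.append((gini_gain(column, labels), term))
    # Highest gain first; ties broken alphabetically for reproducibility.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(term, round(score, 3)) for score, term in scores[:top]]

# Invented example notes (not study data).
# Label 1 = the original model performed poorly on this patient.
notes = [
    "g-tube feeds tolerating well",
    "pulmonary hypertension on sildenafil",
    "g-tube teaching with parents",
    "room air full oral feeds",
    "full oral feeds gaining weight",
    "room air gaining weight",
]
labels = [1, 1, 1, 0, 0, 0]
ranked = rank_terms(notes, labels)
```

In practice, a library vectorizer and an ensemble learner would likely replace these hand-written helpers; the sketch only shows how unigram and bigram columns can be scored against a poorly-versus-well-performing label.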
NLP using a bag-of-words (BOW) analysis revealed several patient cohorts for which our original model performed poorly. These included patients with surgical diagnoses, pulmonary hypertension, retinopathy of prematurity, and psychosocial issues.
The BOW approach aided in cohort discovery and will allow further refinement of our original discharge prediction model. Adequately identifying patients discharged home on g-tube feeds alone could improve the AUC of our original model by 0.02. Additionally, this approach identified social issues as a major cause of delayed discharge.
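For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the function name, scores, and labels are invented for illustration, and the resulting change in AUC is unrelated to the 0.02 reported above.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented scores (not study data). Label 1 = discharged within the window.
labels = [1, 1, 1, 0, 0, 0]
before = [0.9, 0.4, 0.7, 0.5, 0.3, 0.2]  # one positive case is under-scored
after = [0.9, 0.6, 0.7, 0.5, 0.3, 0.2]   # same case after a hypothetical fix
delta = auc(after, labels) - auc(before, labels)
```

Correcting the score of a single misranked patient raises the AUC here, which is the mechanism by which better handling of one cohort could improve the overall figure.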
A BOW analysis provides a method to improve and refine our NICU discharge prediction model and could avoid over 900 (0.9%) hospital days.
AUC – Area Under the Curve, CART – Classification And Regression Trees, DTD – Days to Discharge, GI – Gastrointestinal, LOS – Length of Stay, NICU – Neonatal Intensive Care Unit, NS – Neurosurgery, RF – Random Forest.