Abstract
Objective Preterm birth remains the leading cause of perinatal mortality in the
United States and worldwide, with well-documented racial and socioeconomic disparities.
We aimed to develop and validate a machine learning algorithm that predicts all-cause
preterm birth from clinical, demographic, and laboratory data.
Study Design We performed a cohort study of pregnant individuals delivering at a single institution,
using prospectively collected data on clinical conditions, patient demographics,
laboratory results, and health care utilization. The primary outcome was all-cause preterm
birth before 37 weeks of gestation. The dataset was randomly divided into a derivation cohort (70%)
and a separate validation cohort (30%). Predictor variables were selected from
33 candidates previously identified in the literature (directed machine learning).
In the derivation cohort, we fit both a statistical model (logistic regression) and a machine
learning model (XGBoost), selected the best fit by C-statistic, and then validated the model
in the validation cohort. We measured discrimination with the C-statistic
and assessed overall performance and calibration to determine whether
the model could benefit clinical decision-making.
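As an illustration of the discrimination metric, here is a minimal Python sketch. It is not the authors' code, and the study's software and randomization procedure are not described here. It performs a random 70/30 derivation/validation split and computes the C-statistic as the proportion of (preterm, term) pairs in which the preterm delivery received the higher predicted risk, with ties counted as one half. For a binary outcome, this quantity equals the area under the ROC curve. The helper names `derivation_validation_split` and `c_statistic` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def derivation_validation_split(n: int, frac: float = 0.70,
                                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: randomly assign row indices 0..n-1 to a
    derivation set (fraction `frac`) and a validation set (the remainder)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # reproducible random permutation
    cut = round(n * frac)
    return idx[:cut], idx[cut:]


def c_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: concordance (C) statistic for a binary outcome.

    Counts, over every (event, non-event) pair, how often the event has the
    higher predicted risk; tied scores count as 0.5. O(n_pos * n_neg), so it
    is meant for illustration, not for large cohorts.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]  # e.g. preterm
    neg = [s for y, s in zip(y_true, scores) if y == 0]  # e.g. term
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


if __name__ == "__main__":
    derivation, validation = derivation_validation_split(12440)
    print(len(derivation), len(validation))
    print(c_statistic([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.4]))
```

Applied to the 12,440 deliveries reported below, the split yields 8,708 derivation and 3,732 validation rows. In the toy example, 3.5 of the 4 pairs are concordant, giving a C-statistic of 0.875. For cohorts of this size, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.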
Results The cohort included 12,440 deliveries among 12,071 individuals. Preterm
birth occurred in 2,037 deliveries (16.4%). The derivation cohort comprised 8,708 deliveries (70%),
and the validation cohort comprised 3,732 (30%). XGBoost was chosen because of its
robustness and its ability to handle missing data and collinearity
among predictor variables. By feature importance, the top five predictors
of preterm birth were multiple gestation, number of emergency department visits in the
year before the index pregnancy, unknown initial body mass index, gravidity, and prior
preterm delivery. Test performance characteristics were similar between the derivation
and validation cohorts (area under the curve [AUC] 0.70 vs. 0.63, respectively).
Conclusion Clinical, demographic, and laboratory data can be used to predict all-cause
preterm birth with moderate accuracy.
Key Points
- Machine learning can be used to create models that predict preterm birth.
- In our model, all-cause preterm birth could be predicted with moderate accuracy.
- Clinical, demographic, and laboratory information can help predict all-cause
  preterm birth.
Keywords
preterm birth - machine learning - social determinants of health - XGBoost - predictive algorithm