Introduction: Machine learning (ML)-based predictive models are increasingly prevalent in neurosurgery,
though most require resource-intensive discrete variable collection. Natural language
processing (NLP) allows one to extract meaningful information from large quantities
of unstructured free text in a relatively simple manner. Here, we create an NLP-based
model that utilizes preoperative notes and radiology reports to predict nonhome discharge.
We then present a web-based, point-of-care implementation that can be used to make
real-time predictions.
Methods: We retrospectively reviewed 1,291 adults treated for intracranial meningioma at two
academic centers from 1995 to 2015. Age and text from preoperative notes and radiology
reports were collected. Text was represented via term frequency-inverse document frequency
(TF-IDF). Thirty-two ML algorithms were trained to predict nonhome discharge from
the database. Top performing algorithms were combined to form an ensemble model, which
was then validated on data excluded from initial training.
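As an illustration of the TF-IDF representation described above, the following is a minimal sketch in plain Python. The source does not state which TF-IDF variant, tokenizer, or software was used, so the smoothed IDF formula, the L2 normalization, the function names (`tokenize`, `tfidf`), and the example note snippets are all assumptions made for illustration and are not the study's actual pipeline or data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on any non-letter character."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return cleaned.split()


def tfidf(documents):
    """Return one {term: weight} dict per document, L2-normalized."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Assumed variant: raw term count times smoothed IDF,
        # idf = ln((1 + N) / (1 + df)) + 1.
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Hypothetical note snippets, invented for illustration only.
notes = [
    "Large frontal meningioma with lower extremity weakness",
    "Small convexity meningioma, patient neurologically intact",
    "Large meningioma compressing the lateral ventricle",
]
vectors = tfidf(notes)
```

In this toy corpus, a term that appears in every note ("meningioma") receives a lower weight than a term confined to a single note ("extremity"), which is the property that lets downstream classifiers focus on distinguishing vocabulary.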
Area under the curve (AUC) was calculated for internal and holdout validation. Permutation
importance was used to determine the relative impact of each input on predictions.
Word clouds were generated to visualize which words best predict nonhome discharge.
Nonnegative matrix factorization (NMF) was used to model topics, or related collections
of words that occur within the notes of patients with nonhome discharge. Finally,
a public website was built to house the completed model.
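The AUC and permutation importance calculations described above can be sketched as follows. This is an illustrative example under stated assumptions, not the study's implementation: `toy_predict` is a hypothetical stand-in for the ensemble model, the six patient rows are invented, and the number of shuffles (`n_repeats`) is an arbitrary choice. Each of the three inputs (age, preoperative note, radiology report) is shuffled across patients in turn, and the mean drop in AUC is taken as that input's importance.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in AUC when one input is shuffled across patients."""
    rng = random.Random(seed)
    baseline = auc(labels, [predict(r) for r in rows])
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row, replacing only the input under test.
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            drops.append(baseline - auc(labels, [predict(r) for r in permuted]))
        importances[name] = sum(drops) / n_repeats
    return baseline, importances


def toy_predict(row):
    """Hypothetical scoring rule: uses age and the note, ignores the report."""
    score = 0.01 * row["age"]
    if "large" in row["note"]:
        score += 1.0
    return score


# Invented example patients; label 1 = nonhome discharge.
rows = [
    {"age": 72, "note": "large tumor, leg weakness", "report": "mass effect"},
    {"age": 45, "note": "small tumor", "report": "no edema"},
    {"age": 66, "note": "large tumor near ventricle", "report": "ventricle effaced"},
    {"age": 51, "note": "incidental finding", "report": "stable"},
    {"age": 58, "note": "headache only", "report": "small lesion"},
    {"age": 80, "note": "large tumor", "report": "edema"},
]
labels = [1, 0, 1, 0, 0, 1]

baseline, importances = permutation_importance(toy_predict, rows, labels)
```

Because `toy_predict` never reads the report, shuffling that input leaves every prediction unchanged and its importance is exactly zero, whereas shuffling the note degrades the ranking. The same logic underlies the ordering of inputs reported in the Results.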
Results: Among 1,291 patients, 987 (76.5%) were discharged to home and 304 (23.5%) were discharged
elsewhere. Mean age was 56.9 years, though patients with nonhome disposition were
significantly older (63.6 vs. 54.8 years, p ≤ 0.001). A model comprising a regularized logistic regression, support vector machine,
and gradient boosted machine best predicted nonhome discharge, with an AUC = 0.78
(95% CI = 0.74–0.81) on internal validation and 0.76 on holdout validation (Fig. 1).
Preoperative notes most influenced predictions, followed by imaging reports, then
age. Words including “large,” “ventricle,” and “extremity” were predictive of nonhome
discharge (Fig. 2). Topics identified using NMF are illustrated in Table 1. The public
web address housing the model is http://nlp-home.insds.org. Patient age and text from the preoperative note and radiology report are entered,
and the model returns odds of nonhome discharge.
Conclusion: ML is an expanding but still underutilized analytic resource in neurosurgery.
Here, we use NLP to construct a multi-institutional model that predicts nonhome discharge.
To our knowledge, this represents the first NLP-based predictive model and point-of-care
ML application in the field.
Fig. 1 Area under the curve for internal (A) and holdout (B) validation of the natural language processing model.
AUC = 0.78 (95% CI = 0.74–0.81).
Fig. 2 Word clouds.
Word clouds demonstrating the importance of words to the model from the (A) preoperative note and (B) radiology report. Red denotes association with nonhome discharge; blue denotes association
with home discharge. Size is proportional to how influential the word is in either
direction.
Table 1 Topic modeling