Keywords
acute postoperative pain - chronic opioid use - electronic health records - wearable
device data - machine learning
Background and Significance
Postoperative pain is among the most common types of acute pain.[1] Effective postoperative pain management is a basic requirement of any surgical procedure.
Ineffective pain management may lead to higher mortality,[2][3] a longer recovery time, and higher hospital costs.[3] Postoperative pain management is vital to patient care, affecting immediate recovery
and long-term outcomes. Per the 2019 National Health Statistics Reports, approximately
4.5 percent of patients in the United States use opioids for acute and chronic pain
management.[3] Opioids have historically been the mainstay of acute and chronic pain management.
Yet, the widespread use of opioids has contributed to an opioid epidemic, highlighting
the urgent need for personalized pain management techniques.
Predicting postoperative pain and opioid use can help with tailoring pain management
plans, providing early interventions to high-risk patients, individualized patient
education, resource allocation, and opioid stewardship. However, it is challenging
to estimate pain severity after surgery since conventional pain treatment depends
on subjective evaluations. Recent studies suggest that machine learning (ML)
models can predict pain and opioid requirements following surgery.[4][5][6][7][8][9][10] However, ML applications in acute pain management have been limited by population,
surgery type, and the electronic health record (EHR) predictors used for model development.
The present study explored the potential for ML models to predict postoperative acute
pain and chronic opioid use. Our modeling strategy integrated preoperative EHR and
wearable device data of patients who underwent different types of surgeries.
Longitudinal EHR data provide access to various data types surrounding surgical procedures,
including patient demographics, medical histories, and medication records. These factors
are potentially predictive of an individual's response to pain and opioid requirements
following a surgical procedure.[4][5][6][7][8][9] Additionally, wearable technology, such as activity trackers or physiological sensors,
enables real-time collection of data types such as vital signs, physical activity
levels, and sleep patterns. Wearable devices may offer objective measurements of nuanced
indicators of pain and opioid requirements.[10] ML models can leverage this amalgamation of preoperative EHR and wearable device
data to uncover complex patterns and relationships not apparent through traditional
statistical methods.
The objective of this study was to explore the state-of-the-art application of ML
techniques to predict acute postoperative pain and chronic opioid use. We (1) developed
a range of ML models; (2) compared model performance with that reported in previous studies[4][5][6][7][8][9][10]; and (3) used an advanced visualization technique (Shapley additive explanations [SHAP][11]) to identify the most important predictors of model outcomes and of individual patient
outcomes.
Methods
This retrospective cohort study used the All of Us Research Program Dataset v7 (Registered & Controlled Tier), which included EHR data,
physical measurements, survey responses, wearables data, and genomic data from biosamples
of the enrolled participants, who were 18 years of age or older and living in the
United States from May 6, 2018, to July 1, 2022. Patients from the All of Us dataset were included in this study if they: (1) underwent one of eight surgeries
based on categories from the American College of Surgeons National Surgical Quality
Improvement Program[12]: general, gynecology, orthopedics, plastic, neurology, vascular, urology, and thoracic
surgery; (2) shared both EHR and wearable device data; and (3) were ≥80% consistent[13] in wearing the Fitbit device. [Fig. 1A] is a flowchart of patient inclusion and exclusion. We identified a cohort of approximately
347 patients who met all the study criteria. The timeline for the study is shown in
[Fig. 1B]. The timeframe of input variable collection was 6 weeks before surgery. All longitudinal
input variables were averaged for this period. The first outcome, self-reported acute
pain, was considered in the time frame of 6 weeks after surgery ([Fig. 1B]: first pipeline). The second outcome, chronic opioid use, was considered within
a period of 6 to 12 months after the surgery ([Fig. 1B]: second pipeline).
Fig. 1 (A) Inclusion and exclusion criteria flow chart. (B) Timeline of the study for two outcomes (represented as orange box): first pipeline
for postoperative acute pain with no data used after 6 weeks of surgery (323 days,
represented as gray box); second pipeline for chronic opioid use with no data used
between index date and 6 months post-surgery (182 days, represented as gray box).
The blue box shows the input variables and their timeline.
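The 6-week preoperative averaging window described above can be illustrated with a minimal sketch. The function name `preop_window_mean`, the record layout, and the example values are hypothetical and chosen only for illustration; the source does not describe its own implementation.

```python
from datetime import date, timedelta
from statistics import mean


def preop_window_mean(records, surgery_date, window_days=42):
    """Average the values recorded in the `window_days` before surgery.

    `records` is a list of (date, value) pairs. Values dated on or after
    the surgery date, or before the window opens, are ignored.
    Returns None when no record falls inside the window.
    """
    start = surgery_date - timedelta(days=window_days)
    in_window = [value for day, value in records if start <= day < surgery_date]
    return mean(in_window) if in_window else None


# Hypothetical daily "lightly active minutes" for one patient.
surgery = date(2021, 3, 15)
light_minutes = [
    (date(2021, 1, 10), 300.0),  # before the 6-week window, ignored
    (date(2021, 2, 20), 180.0),
    (date(2021, 3, 1), 200.0),
    (date(2021, 3, 14), 190.0),
    (date(2021, 3, 16), 50.0),   # after surgery, ignored
]
avg_light = preop_window_mean(light_minutes, surgery)
print(avg_light)
```

The same helper could be applied to each longitudinal input (laboratory values, activity, sleep) so that every patient contributes one averaged value per variable.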
Data Source and Preprocessing
Outcomes: the first outcome was acute postoperative pain, derived from patient self-reported
outcomes available in All of Us v7. Patients rated their pain from 0 (no pain) to 10 (severe pain) within 6 weeks
following surgery. Based on nascent orthopedic literature (e.g., lumbar spine surgery[14] and joint replacement surgery[15]), pain at 6 weeks after surgery is a crucial measure of acute postoperative pain.
Evidence shows that pain at this time point is associated with persistent pain 6 to
12 months after surgery. Per the literature, increased pain is associated with increased
heart rate (HR) after surgery.[16][17][18] The present analysis found a positive correlation between HR data from Fitbit devices
and patients' nonmissing self-reported pain responses. Pain survey responses that were missing at random
were imputed using the Multivariate Imputation by Chained
Equations (MICE) algorithm.[19] In MICE, missing values are imputed through an iterative cycle of prediction
models: at each iteration, each variable with missing values is imputed using the other variables.[19] In our case, this technique imputed missing pain responses
from the corresponding patient HR values. We averaged
longitudinal HR data for the week before each patient's pain survey response date.
The MICE algorithm was implemented using the scikit-learn IterativeImputer.[20] Since the relation between pain and HR was linear, we used linear regression as
the estimator. Imputation ensured that each patient's pain trajectory was represented
consistently across the timeframe. After collecting pain scores for 6 weeks, we averaged
them for each patient. We then grouped the averaged scores into three categories:
mild pain (0–3), moderate pain (4–6), and severe pain (7–10).[21] We conducted exploratory data analyses to examine the distribution and patterns of pain
scores over time through histograms and density plots. Our visualizations provided
evidence that averaging could effectively capture the central tendency of the pain
experiences while minimizing the impact of short-term fluctuations.
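The imputation and categorization steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code: the simulated heart rate and pain values, the random seed, and the helper `pain_category` are assumptions. In particular, the cut points 3.5 and 6.5 for non-integer averaged scores are an assumption, since the source gives only the integer bins 0–3, 4–6, and 7–10.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: weekly mean heart rate and a 0-10 pain score
# that rises roughly linearly with heart rate.
heart_rate = rng.uniform(60, 100, size=40)
pain = np.clip((heart_rate - 60) / 4 + rng.normal(0, 0.5, size=40), 0, 10)

# Hide a few pain responses to mimic survey non-response.
pain_observed = pain.copy()
pain_observed[[3, 11, 27]] = np.nan
data = np.column_stack([heart_rate, pain_observed])

# Chained-equation imputation with linear regression as the estimator.
imputer = IterativeImputer(estimator=LinearRegression(), max_iter=10, random_state=0)
completed = imputer.fit_transform(data)
pain_completed = np.clip(completed[:, 1], 0, 10)


def pain_category(score):
    """Map an averaged 0-10 pain score to mild, moderate, or severe.

    Assumption: averages between the integer bins are split at 3.5 and 6.5.
    """
    if score < 3.5:
        return "mild"
    if score < 6.5:
        return "moderate"
    return "severe"


categories = [pain_category(score) for score in pain_completed]
```

With only two columns, the imputer reduces to regressing pain on heart rate and filling the hidden entries with the fitted values; observed entries are left unchanged.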
The second outcome of the study was chronic opioid use. The All of Us dataset includes EHR drug data. In this study cohort, we included patients who used
six opioids (codeine, hydromorphone, methadone, morphine, oxycodone, or buprenorphine)
in various preparations (e.g., oral, parenteral). We defined chronic opioid use as
having at least one opioid prescription filled between 6 and 12 months after surgery.[22][23][24] These prescriptions in our dataset were not continuous due to missing data in the
longitudinal drug EHR records. To overcome this issue, we used the last opioid prescription
dose during the 6- to 12-month period to calculate morphine milligram equivalents (MME)
using standard conversion factors.[25] We categorized chronic opioid use as low dose if the prescription was less than
50 MME or high dose if the prescription was greater than or equal to 50 MME, per Centers
for Disease Control and Prevention (CDC) guidelines.[26] This definition of chronic opioid use may not represent persistent
postoperative opioid use, but it aligns with commonly used definitions for
identifying opioid use in observational data.[22][23][24]
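The MME calculation and the 50 MME threshold can be illustrated with a short sketch. The function names, the two conversion factors shown, and the `doses_per_day` input are assumptions made for illustration; the full conversion table comes from the cited reference, and the source does not state how dosing frequency was handled.

```python
# Illustrative conversion factors only (morphine = 1.0, oxycodone = 1.5).
# A real analysis should take the complete table from the cited reference.
MME_FACTORS = {"morphine": 1.0, "oxycodone": 1.5}


def daily_mme(drug, dose_mg, doses_per_day, factors=MME_FACTORS):
    """Daily morphine milligram equivalents for one prescription.

    `doses_per_day` is a hypothetical input used only in this sketch.
    """
    return dose_mg * doses_per_day * factors[drug]


def dose_category(mme, threshold=50.0):
    """Low dose below the threshold, high dose at or above it."""
    return "high" if mme >= threshold else "low"


# Hypothetical last prescription in the 6- to 12-month window.
mme = daily_mme("oxycodone", dose_mg=10, doses_per_day=4)
print(mme, dose_category(mme))
```

Because the threshold is inclusive on the high side, a prescription of exactly 50 MME is classified as high dose, matching the definition in the text.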
Variables: we included predictors from EHR and wearable devices that might affect acute postoperative
pain and chronic opioid use.[4][5][6][7][8][9][10] The variables were grouped into five domains: (1) demographics (e.g., age, gender,
race, and ethnicity) and socioeconomic status (e.g., income, education); (2) medical
history (e.g., body mass index [BMI], history of cancer, diabetes, myocardial infarction,
and anxiety or depression, average opioid use 6 weeks before surgery); (3) laboratory
values (e.g., average albumin level in the blood [g/dL] 6 weeks before surgery); (4)
Fitbit data for physical activity (e.g., mean lightly active minutes, mean very active
minutes, mean sedentary minutes) and sleep (e.g., mean time spent asleep, mean time
spent in deep sleep, mean time spent awake during main sleep); and (5) surgical procedures.
We averaged longitudinal variables, including laboratory values (e.g., albumin levels
in the blood 6 weeks before surgery), Fitbit data for physical activity (e.g., lightly
active minutes, very active minutes, sedentary minutes), and sleep (e.g., time spent
asleep, time spent in deep sleep, time spent awake during main sleep). The cohort's
population characteristics are summarized in [Table 1].
Table 1
Population characteristics and univariate analysis for the two outcomes
| Categorical variables | Category | N (N ≤[a] 347) | % | Acute postoperative pain: Mild (N ≤ 295) | Acute postoperative pain: Moderate (N ≤ 32) | Acute postoperative pain: Severe (N ≤ 20) | Acute postoperative pain: p-Value | Chronic opioid use: Low (N ≤ 305) | Chronic opioid use: High (N ≤ 42) | Chronic opioid use: p-Value |
|---|---|---|---|---|---|---|---|---|---|---|
| Gender | Female | ≤259 | ≤75% | 223 | 23 | ≤20 | 0.53 | 224 | 35 | 0.23 |
| | Male | ≤88 | ≤25% | 72 | ≤20 | ≤20 | | 81 | ≤20 | |
| Race | White | ≤297 | ≤86% | 254 | 28 | ≤20 | 0.37 | 260 | 37 | 0.79 |
| | Non White | ≤50 | ≤14% | 41 | ≤20 | ≤20 | | 45 | ≤20 | |
| Ethnicity | Not Hispanic or Latino | ≤327 | ≤94% | 275 | 32 | ≤20 | 0.15 | 289 | 38 | 0.44 |
| | Hispanic or Latino | ≤20 | ≤6% | ≤20 | 0 | 0 | | ≤20 | ≤20 | |
| Income level | Unknown | ≤189 | ≤54% | 137 | 32 | ≤20 | 0.52 | 160 | 29 | 0.19 |
| | Less than 10–50k | ≤33 | ≤10% | 33 | 0 | 0 | | 30 | ≤20 | |
| | 50–150k | ≤90 | ≤26% | 90 | 0 | 0 | | 84 | ≤20 | |
| | 150k to more than 200k | ≤35 | ≤10% | 35 | 0 | 0 | | 31 | ≤20 | |
| Education level | Unknown | ≤189 | ≤54% | 137 | 32 | ≤20 | 0.89 | 160 | 29 | 0.12 |
| | College one to three or college graduate/advanced degree | ≤137 | ≤39% | 137 | 0 | 0 | | ≤126 | ≤20 | |
| | Less than or equivalent to highest grade: Twelve or GED | ≤21 | ≤6% | 21 | 0 | 0 | | ≤20 | ≤20 | |
| History of depression/anxiety | Yes | ≤117 | ≤34% | 100 | ≤20 | ≤20 | <0.05 | 110 | ≤20 | <0.05 |
| | No | ≤203 | ≤59% | 180 | ≤20 | ≤20 | | 175 | 28 | |
| | Unknown | ≤27 | ≤8% | 15 | ≤20 | ≤20 | | ≤20 | ≤20 | |
| Have or had cancer | Yes | ≤74 | ≤21% | 69 | ≤20 | ≤20 | <0.05 | 66 | ≤20 | 0.9 |
| | No | ≤173 | ≤50% | 156 | ≤20 | ≤20 | | 152 | 21 | |
| | Unknown | ≤100 | ≤29% | 70 | ≤20 | ≤20 | | 87 | ≤20 | |
| Have or had diabetes | Yes | ≤81 | ≤23% | 74 | ≤20 | ≤20 | <0.05 | 73 | ≤20 | <0.05 |
| | No | ≤213 | ≤61% | 195 | ≤20 | ≤20 | | 192 | 21 | |
| | Unknown | ≤53 | ≤15% | 26 | ≤20 | ≤20 | | 40 | ≤20 | |
| History of myocardial infarction | Yes | ≤20 | ≤5% | 15 | ≤20 | ≤20 | 0.21 | ≤20 | ≤20 | <0.05 |
| | No | ≤319 | ≤92% | 274 | 28 | ≤20 | | 287 | 32 | |
| | Unknown | ≤20 | ≤3% | ≤20 | ≤20 | ≤20 | | ≤20 | ≤20 | |
| Opioid use prior to the surgery | Yes | ≤56 | ≤16% | 50 | ≤20 | ≤20 | 0.6 | 46 | ≤20 | <0.05 |
| | No | ≤291 | ≤84% | 245 | 28 | ≤20 | | 259 | 32 | |
| Surgery type | General | ≤103 | ≤30% | 87 | ≤20 | ≤20 | <0.05 | 86 | ≤20 | <0.05 |
| | Orthopedics | ≤70 | ≤20% | 60 | ≤20 | ≤20 | | 65 | ≤20 | |
| | Gynecology | ≤54 | ≤16% | 50 | ≤20 | 0 | | 46 | ≤20 | |
| | Plastic | ≤46 | ≤13% | 35 | ≤20 | ≤20 | | 41 | ≤20 | |
| | Neurology | ≤39 | ≤11% | 34 | ≤20 | ≤20 | | 37 | ≤20 | |
| | Urology | ≤21 | ≤6% | ≤20 | ≤20 | ≤20 | | ≤20 | ≤20 | |
| | Thoracic | ≤20 | ≤3% | ≤20 | 0 | 0 | | ≤20 | ≤20 | |
| | Vascular | ≤20 | ≤1% | ≤20 | ≤20 | ≤20 | | ≤20 | ≤20 | |

| Continuous variables | Mean [SD] | Range [min–max] | Acute postoperative pain: Mild (mean [SD]) | Acute postoperative pain: Moderate (mean [SD]) | Acute postoperative pain: Severe (mean [SD]) | Acute postoperative pain: p-Value | Chronic opioid use: Low (mean [SD]) | Chronic opioid use: High (mean [SD]) | Chronic opioid use: p-Value |
|---|---|---|---|---|---|---|---|---|---|
| Age (y) | 60 [12.81] | 26–86 | 58 [78.56] | 60 [75.98] | 63 [90.79] | 0.57 | 56 [12.84] | 57 [13.14] | 0.17 |
| Mean light active minutes in a day | 189.38 [78.97] | 50.75–406.89 | 190.77 [78.56] | 193.63 [90.79] | 177.84 [75.98] | <0.05 | 205.91 [81.39] | 184.66 [74.69] | 0.38 |
| Mean sedentary minutes in a day | 909.97 [240.68] | 291.86–1,352.89 | 904.32 [239.58] | 925.20 [251.20] | 952.67 [253.29] | 0.53 | 899.63 [236.04] | 917.69 [243.82] | <0.05 |
| Mean very active minutes in a day | 15.41 [21.39] | 5.67–177.81 | 10.84 [10.50] | 12.15 [16.94] | 16.07 [22.32] | 0.38 | 14.45 [14.73] | 17.49 [17.96] | <0.05 |
| Mean time spent asleep in minutes | 347.83 [96.08] | 79.40–825.75 | 369.91 [91.78] | 345.24 [104.32] | 350.81 [138.15] | <0.05 | 399.18 [93.55] | 380.21 [88.89] | 0.74 |
| Mean time spent in deep sleep in minutes | 57.32 [12.41] | 5.83–91.00 | 59.03 [12.50] | 57.06 [11.39] | 56.38 [12.97] | 0.64 | 59.99 [10.13] | 55.81 [6.65] | 0.59 |
| Mean time spent awake in minutes | 40.03 [22.41] | 10.00–144.57 | 39.93 [21.89] | 38.57 [24.80] | 43.71 [26.66] | <0.05 | 36.95 [27.67] | 40.45 [21.60] | 0.34 |
| Albumin level | 4.44 [5.69] | 2.23–6.89 | 4.49 [5.53] | 4.17 [0.44] | 4.08 [0.34] | 0.89 | 4.00 [0.46] | 4.50 [6.08] | 0.68 |
| BMI ratio | 31.41 [8.23] | 17.98–64.22 | 31.91 [8.31] | 28.33 [7.14] | 32.18 [7.82] | 0.35 | 31.08 [7.83] | 34.84 [11.20] | <0.05 |
Abbreviations: BMI, body mass index; SD, standard deviation.
a ≤ Representation adheres to reporting recommendation of All of Us research portal. We comply with the All of Us Data and Statistics Dissemination Policy.
Statistical Analysis
We performed univariate analysis to determine factors associated with the outcomes.
For categorical variables, we used the chi-square test for both outcomes. For continuous
variables, we employed a one-way analysis of variance for the acute pain outcome (more
than two categories) and a two-sample t-test for the chronic opioid use outcome (two categories). A p-value of <0.05 was considered significant. We applied these tests separately
to each input variable to assess its association with acute pain and chronic
opioid use.
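The three univariate tests named above can be sketched with SciPy. All counts and samples below are synthetic placeholders, not study data, and the use of `scipy.stats` is an assumption, since the source does not name the statistics library.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Categorical predictor vs. binary outcome: chi-square test on a 2x2 table
# of hypothetical counts (rows: history yes/no; columns: low/high dose).
table = np.array([[40, 20], [150, 30]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Continuous predictor vs. three pain categories: one-way ANOVA.
mild = rng.normal(370, 90, size=100)
moderate = rng.normal(345, 100, size=30)
severe = rng.normal(350, 130, size=20)
f_stat, p_anova = stats.f_oneway(mild, moderate, severe)

# Continuous predictor vs. two opioid-dose categories: two-sample t-test.
low = rng.normal(31, 8, size=120)
high = rng.normal(35, 11, size=40)
t_stat, p_ttest = stats.ttest_ind(low, high)

for name, p in [("chi-square", p_chi2), ("ANOVA", p_anova), ("t-test", p_ttest)]:
    print(f"{name}: p = {p:.3f}, significant = {p < 0.05}")
```

Each test is run on one predictor at a time, which is why these p-values describe unadjusted associations rather than independent predictive contributions.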
Model Development and Validation
The dataset of approximately 347 patients was randomly split into a training dataset
(70% of the samples) and a testing dataset (30%) for unbiased validation. The training
dataset was used for model development and hyperparameter tuning. We developed four
models to predict the outcomes: logistic regression (LR), random forest (RF), extreme
gradient boosting trees (XGB), and an ensemble learning algorithm called “stacking.”[27] For the stacking model, we used RF and XGBoost as base models, with LR as the meta-model/final
estimator. Previous studies found these methods yielded the best prediction results.[4][5][6][7][8][9][10]
[Supplementary Appendix A] in the Supplementary Material describes the details for each approach.
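The stacking design described above (tree-based base models with logistic regression as the final estimator) can be sketched with scikit-learn. This is an illustration under stated assumptions: the data are synthetic, all hyperparameters are placeholders, and `GradientBoostingClassifier` is used as a stand-in for XGBoost so that the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 347-patient, 21-variable cohort.
X, y = make_classification(
    n_samples=347, n_features=21, n_informative=8, random_state=0
)
# Random 70/30 split into training and testing datasets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        # Stand-in for XGBoost (assumption made to avoid an extra dependency).
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # base-model predictions for the meta-model come from internal CV
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

The internal cross-validation matters here: the meta-model is trained on out-of-fold predictions from the base models, which limits the extent to which it simply learns to trust an overfitted base model.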
We developed all four models independently using the training data (70% of the cohort) to predict postoperative
acute pain as mild, moderate, or severe and chronic opioid use as high or low dosage.
As most values for demographics and socioeconomic covariates were unknown or missing,
we did not include them in model development. We used 21 variables from EHR and wearable
devices to predict both outcomes. Class distribution was highly imbalanced for both
outcomes. Therefore, we used the Synthetic Minority Oversampling Technique (SMOTE)[28] to create synthetic samples based on feature-space similarities between samples from
the minority class.[29][30]
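The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched in a few lines. The function `smote_like` is a hypothetical, simplified illustration and not the implementation used in the study; maintained implementations exist in dedicated packages such as imbalanced-learn.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_like(X_minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic rows by interpolating between a sampled
    minority row and one of its k nearest minority-class neighbours."""
    if len(X_minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(seed)
    k = min(k, len(X_minority) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    # For distinct points, column 0 is the point itself, so drop it.
    neighbours = nn.kneighbors(X_minority, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_minority), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation fraction in [0, 1)
    return X_minority[base] + gap * (X_minority[picked] - X_minority[base])


# Hypothetical minority class: 20 patients with 3 features each.
X_min = np.random.default_rng(42).normal(size=(20, 3))
synthetic = smote_like(X_min, n_new=50)
print(synthetic.shape)
```

Oversampling is normally applied to the training split only, so that synthetic rows do not leak into evaluation; the source does not state which split was oversampled.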
We performed hyperparameter tuning using the training set. We used the randomized
search cross-validation technique from scikit-learn. This algorithm randomly
chooses a value for each hyperparameter from the grid and evaluates the model using
that random set. After finding optimal hyperparameters, we evaluated model performance
on the testing dataset using the following metrics: accuracy, recall, F1 score, precision,
area under the curve (AUC) score, and receiver operating characteristic (ROC) curve.
To obtain reliable estimates of performance metrics and their variability, we incorporated
fivefold cross-validation in the test set. This involved repeatedly splitting the
test set into different subsets and evaluating the model performance for each subset.
We calculated 95% confidence intervals (CIs) based on the distribution of values from
test-set cross-validation. The 2.5th and 97.5th percentiles of the distribution were
used to establish the lower and upper bounds of the 95% CIs, respectively. Model development,
hyperparameter tuning, and evaluation were performed in Python on the All of Us research program's cloud analysis environment.
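The tuning and interval-estimation steps in this paragraph can be sketched as follows. The search space, the number of sampled settings, the synthetic data, and the use of stratified folds on the test set are all assumptions; the source does not list its grids or describe how the test-set folds were formed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    train_test_split,
)

X, y = make_classification(
    n_samples=347, n_features=21, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Hypothetical search space for a random forest.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 2, 4],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of random hyperparameter settings to try
    cv=3,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Score the tuned model on five disjoint subsets of the test set, then take
# the 2.5th and 97.5th percentiles of the fold scores as interval bounds.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_auc = [
    roc_auc_score(y_test[idx], best_model.predict_proba(X_test[idx])[:, 1])
    for _, idx in folds.split(X_test, y_test)
]
ci_low, ci_high = np.percentile(fold_auc, [2.5, 97.5])
print(f"AUC interval: {ci_low:.2f}-{ci_high:.2f}")
```

With only five fold scores, the percentile bounds are close to the minimum and maximum fold values, so an interval computed this way is a rough indication of variability.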
Feature Importance and Real-World Application
We used SHAP analysis to understand model prediction in terms of feature importance.
SHAP is a model-explanation approach with accompanying visualizations. It provides insight into the decision-making processes
of complicated “black box” models by attributing each prediction to the contributions
of the individual input variables, based on Shapley values from cooperative game theory. SHAP locally explains
the variable impact on individual predictions. Aggregated SHAP values can offer heuristic assessments
of the variables most influential on the model's predictions overall.[11] We used the model with the highest accuracy, precision, recall, and AUC scores for
SHAP analysis to extract the top 15 predictors of the model outcomes: acute pain and
chronic opioid use. The SHAP force plot showed the model's prediction at the individual patient level.
Analysis of individual patients can be a tool for clinicians to take actions driving
the prediction toward lower acute pain and chronic opioid use.
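The attribution idea behind SHAP can be illustrated with an exact, brute-force Shapley value computation on a toy model. This sketch does not use the `shap` library and is not the study's pipeline; the function `toy_risk`, its coefficients, and the patient and baseline values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value,
    which is one common way to define the value of a coalition.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_risk(features):
    """Hypothetical risk score with a sleep-by-activity interaction."""
    sleep_hours, very_active_min, prior_opioid = features
    return (
        0.5
        - 0.05 * sleep_hours
        + 0.01 * very_active_min
        + 0.002 * sleep_hours * very_active_min
        + 0.2 * prior_opioid
    )


patient = [5.0, 30.0, 1.0]    # hypothetical individual
baseline = [6.0, 15.0, 0.0]   # hypothetical cohort reference values
phi = shapley_values(toy_risk, patient, baseline)
print(phi)
```

The contributions sum to the difference between the patient's prediction and the baseline prediction, which is the property a force plot displays: positive contributions push the prediction up and negative ones push it down.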
[Table S1] in [Supplementary Appendix C] shows the TRIPOD checklist, with the manuscript page number for each item.
Results
Patient Characteristics
Our study population's characteristics are described in [Table 1]. This group appeared physically active based on Fitbit data collected during the
6 weeks before surgery, using definitions established by the CDC and World Health
Organization (WHO).[31][32] The daily average for “lightly active minutes” in the cohort was 189 minutes, which
the WHO considers a “healthy lifestyle.”[31] However, the cohort's average sedentary time was higher (909 minutes) than recommended
for a healthy lifestyle (540 minutes).[31][32] According to sleep data obtained from Fitbit, patients' average sleep time was 5.7 hours,
below the 7 hours of sleep recommended by the CDC.[33] The average blood albumin level (4.44 g/dL) was in the normal range per CDC guidelines.[34] However, the maximum value of albumin level (6.89 g/dL) exceeded the normal range.
The cohort had an average BMI ratio of 31.41, which is considered obese.[35] For both outcomes, the proportion of individuals represented in different demographic
and socioeconomic groups was similar to the entire cohort's distribution. The average
age of patients with severe pain in the 6 weeks following surgery (63 years) was higher
than the other categories: mild (58 years) and moderate (60 years). On average, patients
with severe pain spent more time performing “very active activities” (16 minutes)
and less time in “lightly active activities” (177 minutes) than those with mild or
moderate pain. The average sleep time for patients with mild pain (6 hours) was higher
than for patients with moderate (5.7 hours) or severe (5.8 hours) pain. Patients with
mild pain, on average, spent more time in deep sleep (1 hour) and less time awake
(0.6 hours) during main sleep compared with patients with moderate or severe pain
([Table 1]).
Patients with low-dose chronic opioid use spent more time in “lightly active activities”
(205 minutes) than patients with high-dose opioid use (184 minutes). High-dose chronic
opioid users spent more time performing “very active activities” (17.49 minutes) than
low-dose chronic opioid users (14.45 minutes). On average, time asleep (6.6 hours)
and in deep sleep (1 hour) was higher for patients with low-dose chronic opioid use.
[Table 1] summarizes the patient characteristics, acute postoperative pain, and chronic opioid
use outcomes.
Univariate Analysis
The univariate analysis results are summarized in [Table 1].
Acute postoperative pain: among categorical variables, the primary factors associated with severe pain from
a univariate analysis of the entire population were history of anxiety or depression
(p = 0.03), history of cancer (p = 0.001), history of diabetes (p = 0.009), and type of surgery (p = 0.03). Among continuous variables, average time spent in lightly active activities
(p = 0.006), average time spent asleep (p = 0.03), and average time spent awake (p = 0.007) were associated with severe acute pain.
Chronic opioid use: among categorical variables, the significant factors associated with high chronic
opioid dose use were a history of anxiety or depression (p = 0.009), a history of diabetes (p = 0.003), a history of myocardial infarction (p = 0.02), opioid use before surgery (p = 0.001), and surgery type (p = 0.004). Among continuous variables, average time spent in very active activities
(p = 0.01), average time spent in sedentary activities (p = 0.02), and BMI ratio (p = 0.03) were associated factors.
Algorithm Performance
Acute postoperative pain:
[Table 2] summarizes the performance of all four models across evaluation metrics for the
pain outcome. The stacking ensemble algorithm had the best accuracy (0.68). Additionally,
the stacking algorithm had better precision (0.68), recall (0.68), and F1 score (0.68)
in comparison to the LR, RF, and XGB algorithms. For the best-performing model,
stacking, the one-versus-rest multiclass ROC analysis gave the highest AUC score
for Class 2 (severe pain) versus the rest (0.88; 95% CI: 0.85–0.91), compared with
Class 0 (mild pain) versus the rest (0.86; 95% CI: 0.84–0.89) and Class 1 (moderate
pain) versus the rest (0.86; 95% CI: 0.84–0.91). [Fig. 2] shows the multiclass ROC curves for the stacking model. The precision–recall curve for the
pain outcome can be seen in [Supplementary Appendix B] ([Fig. S1]).
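The one-versus-rest AUC reported above treats each pain class in turn as the positive class and the other two as negative. A minimal sketch with hypothetical labels and predicted probabilities (not study data) is:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical true classes (0 = mild, 1 = moderate, 2 = severe) and
# predicted class probabilities for eight patients.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 0])
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.5, 0.2],
    [0.4, 0.4, 0.2],
    [0.1, 0.3, 0.6],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],
])

# One-vs-rest: class k is positive, every other class is negative.
ovr_auc = {
    k: roc_auc_score((y_true == k).astype(int), proba[:, k])
    for k in range(proba.shape[1])
}
for k, auc in ovr_auc.items():
    print(f"class {k} vs. rest: AUC = {auc:.2f}")
```

Because each class gets its own binary problem, a model can rank one class well (here, the severe class is perfectly separated) while still confusing the other two.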
Fig. 2 ROC curve for stacking algorithm for one versus rest multiclass for the acute postoperative
pain outcome. Class 0: mild pain, class 1: moderate pain, class 2: severe pain. ROC,
receiver operating characteristic.
Table 2
Model performance for the acute postoperative pain outcome
| Models | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| Logistic regression (LR) | 0.52 | 0.51 | 0.52 | 0.52 |
| Random forest (RF) | 0.65 | 0.65 | 0.65 | 0.65 |
| XG boost (XGB) | 0.61 | 0.62 | 0.61 | 0.61 |
| Stacking | 0.68 | 0.68 | 0.68 | 0.68 |
Chronic opioid use: model performance for the chronic opioid use outcome across different evaluation
metrics is summarized in [Table 3]. The accuracy of the stacking (0.89) and XGB (0.89) models was better than RF (0.74)
or LR (0.68). The stacking algorithm matched or outperformed the other three models on the remaining
evaluation metrics: precision (0.91), recall (0.89), and F1 score (0.89). The ROC
curve for all the models is shown in [Fig. 3]. The stacking algorithm (AUC score = 0.90; 95% CI: 0.88–0.92) outperformed the LR
(AUC score = 0.74; 95% CI: 0.70–0.77), RF (AUC score = 0.81; 95% CI: 0.79–0.83), and
XGB (AUC score = 0.87; 95% CI: 0.85–0.88) models in predicting chronic opioid use.
The precision–recall curve can be seen in [Supplementary Appendix B] ([Fig. S2]).
Fig. 3 ROC curve for all four models for the prediction of chronic opioid use. ROC, receiver
operating characteristic.
Table 3
Model performance for chronic opioid use outcome
| Models | Accuracy | Precision | Recall | F1 score | AUC score [95% CI] |
|---|---|---|---|---|---|
| Logistic regression (LR) | 0.68 | 0.68 | 0.68 | 0.67 | 0.74 [0.70–0.77] |
| Random forest (RF) | 0.74 | 0.76 | 0.74 | 0.73 | 0.81 [0.79–0.83] |
| XG boost (XGB) | 0.89 | 0.90 | 0.89 | 0.88 | 0.87 [0.85–0.88] |
| Stacking | 0.89 | 0.91 | 0.89 | 0.89 | 0.90 [0.88–0.92] |
Abbreviations: AUC, area under the curve; CI, confidence interval.
SHAP Analysis for Feature Importance and Real-World Application
Acute postoperative pain:
[Fig. 4] shows the results of the SHAP analysis with the best-performing model (i.e., stacking).
The top three features in predicting acute pain were average time spent asleep, time
spent in lightly active activities, and time spent awake. [Fig. 5] gives a hypothetical example of how the model could be used in the real world. After
entering the patient ID, prediction results show that the patient might have severe
pain ([Fig. 5B]).
Fig. 4 Feature importance (n = 15) SHAP plot for the acute postoperative pain outcome. Class 0: mild pain, class
1: moderate pain, class 2: severe pain. SHAP, Shapley additive explanation.
Fig. 5 Real-world application of the stacking model using SHAP force plot. (a) Prompt for entering patient ID. (b) Prediction results and force plot for patient with ID number 21. The predictors
with red arrows drive the outcome toward severe pain, and the blue arrows drive it
toward not having severe pain. SHAP, Shapley additive explanation.
Chronic opioid use:
[Fig. 6] summarizes the SHAP plot of the top 15 features impacting model performance using
the stacking algorithm. Average time spent in very active activities, average time
spent in lightly active activities, and history of cancer were the three most important
predictors of chronic opioid use. The SHAP analysis showed that patients with higher
average time spent in very active activities were more likely to be on high chronic
opioid doses. In contrast, patients who spent more time in lightly active activities
were likely to be on low chronic opioid doses ([Fig. 6]). Patients who spent more time asleep and in deep sleep were less likely to have
chronic opioid use. Patients with higher opioid use before surgery were likely to
use higher opioid doses 6 to 12 months after surgery ([Fig. 6]). [Fig. 7] shows an example application of the model for an individual patient using the SHAP
force plot.
Fig. 6 Feature importance (n = 15) SHAP plot for predicting chronic opioid use. SHAP, Shapley additive explanation.
Fig. 7 Real-world application of the stacking model using SHAP force plot. (a) Prompt for entering patient ID. (b) Prediction results and force plot for patient with ID number 35. The predictors with
red arrows drive the outcome toward high-dose chronic opioid use, and the blue arrows
drive it toward not having high-dose chronic opioid use. SHAP, Shapley additive explanation.
Discussion
This article presents a comprehensive framework integrating preoperative data from
the EHR and wearable devices to develop predictive models for acute postoperative
pain and chronic opioid use. The model developed in the present study performs better
across evaluation metrics than existing work.[4][5][6][7][8][9][10] We also present a way to better understand the predictions made by these “black
box” models using the advanced visualization tool SHAP. We could understand which
features contributed to pain-related outcomes by harnessing the power of advanced
analytics. Knowing associated factors can guide pain management practices, enable
clinicians to tailor treatment plans, help mitigate high-dose chronic opioid use,
and ultimately improve patient care.
The prediction of severe pain (Class 2) with the highest performance (AUC = 0.88;
95% CI: 0.85–0.91; [Fig. 2]) suggests that the stacking prediction model was better at predicting patients at
risk for severe pain post-surgery. The predictors from wearable devices played a prominent
role in predicting the pain outcome. Previous studies showed that patients with surgical
anxiety and pain had reduced total sleep time, decreased rapid eye movement sleep,
and decreased slow-wave sleep both pre- and post-surgery.[36][37][38][39] Our findings add to the evidence for a direct relationship between poor sleep (duration
and disruption) before surgery and pain during postoperative recovery. We found from
the univariate and SHAP analysis that the average time spent asleep and average time
spent awake were crucial factors in predicting severe pain post-surgery ([Table 1] and [Fig. 4]). The evidence emphasizes sleep's vital role in recovery during the postoperative
period. Access to sleep data clinically may allow health care providers to suggest
patient-specific interventions supporting good sleep habits. Such interventions may
positively impact the psychological and physiological recovery of surgical patients.
For the chronic opioid use outcome, compared with other modeling techniques and existing
studies,[5][6] the ensemble learning algorithm gave the highest accuracy and performed well across
all other evaluation metrics ([Table 3] and [Fig. 3]). These findings suggest that ensemble learning is the most promising algorithm
to explore for predicting postoperative pain-related outcomes. Upon visualizing model
predictors using SHAP, we found that the physical activity data from Fitbit, time
spent in very active and lightly active activities, and history of cancer were the
essential features. The literature shows that access to a preoperatively planned exercise
regimen may reduce prolonged opioid use.[40][41][42] Our findings complemented the literature, showing that patients more involved in
very active activities (e.g., bicycling) before the surgery were more likely to be
on high-dose opioid use 6 to 12 months after surgery, while patients doing more lightly
active activities (e.g., walking) before surgery were less likely to be on chronic
opioid use 6 to 12 months after surgery ([Fig. 6]). The type of surgery was also associated with higher chronic opioid doses ([Table 1]). Our findings from the SHAP analysis are corroborated by previous work[43][44] indicating that patients with minimally invasive gynecology surgeries were likely
to be on lower opioid doses 6 to 12 months after surgery ([Fig. 6]). It is worth noting that other than the stacking model, the next-best performing
model for the pain outcome was RF, and XGBoost was next-best for the chronic opioid
use outcome. We acknowledge that the computational cost of stacking might outweigh
the marginal improvements observed from those models.
The real-world application of these models could allow health care providers to proactively
identify patients at increased risk of experiencing severe postoperative pain and
to address modifiable risk factors (e.g., sleep disturbances, inactivity) with targeted
preoperative rehabilitation interventions and personalized pain management plans.
[Figs. 5] and [7] are examples of SHAP analysis for individual patients. Clinicians can monitor and
regulate these predictors to reduce the chances of a severe pain outcome for a given
patient. With proper risk presentation in a clinical context, predictions might help
health care providers to optimize opioid dosing as part of individualized multimodal
pain management in the acute postoperative period. Doing so may help with balancing
the risks of overprescribing or undertreating pain. These findings contribute to improving
patient outcomes and help address the broader societal concern of opioid-related adverse
effects and addiction.
Our study included a diversity of surgeries, ranging from major to minor procedures.
Each of these could lead to different levels of postoperative pain. This diversity
is crucial and reflects real-world scenarios where patients undergo various surgical
interventions (supported by previous works[45][46]). Our models provide insights into predicting postoperative pain and chronic opioid
use across this spectrum. We also acknowledge that pain is a complex construct influenced
by biological, psychological, and social factors. Our intention in suggesting the
monitoring and regulation of predictors is not to oversimplify the biopsychosocial
nature of pain but to provide a practical approach for health care providers to identify
high-risk patients and tailor interventions. It is crucial to note that our models
should be integrated into a comprehensive patient-centered care plan that accounts
for the multifaceted nature of pain. The goal is not to simply regulate individual
predictors but to use the information to enhance personalized pain management plans.
The major shortcomings of our work were the small sample size and limited data for
demographic and socioeconomic covariates. The high proportion of unknown values among
our cohort drove the decision to exclude socioeconomic variables. In addition, our
study uses EHR drug prescription data as a surrogate for drug use behavior, which
may not always reflect actual behavior. Further, while All of Us prescription data include form and dosage details, the provider's recommended frequency
of use is missing. It is well recognized that All of Us may not constitute a perfectly representative sample.[47]
[48] In our case, the inclusion of individuals with Fitbit data and patients who underwent
surgery might not represent all people with acute pain or opioid use. There remains
significant value, however, in leveraging All of Us in this work as it is one of the very few resources with both EHR and Fitbit data
types available for individuals undergoing a range of surgical procedures. Our models'
accuracy and other performance metrics were good, yet more research is needed to confirm
the results in larger and more complete datasets from various clinical care settings.
Conclusion
Integrating ML models in forecasting postsurgical acute pain and chronic opioid use
could optimize pain management practices and reduce the burden of opioid dependence.
With accurate predictions, clinicians could tailor personalized pain management
strategies, which may improve patient satisfaction, reduce health care costs, and
mitigate risks associated with long-term opioid use. However, it is crucial to address
the challenges and ethical implications to ensure these models' responsible and effective
deployment in clinical settings.
Clinical Relevance Statement
By leveraging longitudinal EHRs and wearable data, this research could offer personalized
predictive models, enabling health care providers to anticipate postoperative pain
levels and opioid requirements with greater accuracy. The use of SHAP not only enhances
our understanding of patient-specific risk factors but also assists in tailoring treatment
plans, minimizing the potential for high-dose chronic opioid use and addiction. Further,
by identifying modifiable predictors, SHAP analysis empowers clinicians to implement
targeted interventions, such as alternative pain management strategies or early addiction
prevention measures, ultimately improving patient outcomes. This study represents
a critical step toward more effective and patient-centered pain management practices,
aligning with the evolving landscape of precision medicine in health care.
Multiple-Choice Questions
- While creating the dataset, what percent of consistency in wearing the wearable device
was considered?
Correct Answer: The correct answer is option c. This value was chosen based on citation 13.
- Based on the findings, which type of surgical patients were likely to be on lower
doses of opioids 6–12 months after surgery?
  a. Orthopedics
  b. Gynecology
  c. Plastic
  d. Urology
Correct Answer: The correct answer is option b. This is one of the findings and is also discussed in the Discussion
section: our findings from the SHAP analysis are corroborated by previous work[43]
[44] indicating that patients with minimally invasive gynecology surgeries were likely
to be on lower opioid doses 6 to 12 months after surgery.