Appl Clin Inform 2024; 15(03): 569-582
DOI: 10.1055/a-2321-0397
Research Article

Predicting Postoperative Pain and Opioid Use with Machine Learning Applied to Longitudinal Electronic Health Record and Wearable Data

Authors

  • Nidhi Soley

    1   Institute for Computational Medicine, Whiting School of Engineering, Johns Hopkins University, Baltimore, Maryland, United States
    2   Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States
  • Traci J. Speed

    3   Department of Psychiatry and Behavioral Sciences, Johns Hopkins University, School of Medicine, Baltimore, Maryland, United States
  • Anping Xie

    4   Armstrong Institute for Patient Safety and Quality, Johns Hopkins University, School of Medicine, Baltimore, Maryland, United States
    5   Department of Anesthesia and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States
  • Casey Overby Taylor

    1   Institute for Computational Medicine, Whiting School of Engineering, Johns Hopkins University, Baltimore, Maryland, United States
    2   Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States
    6   Department of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States

Funding This work was also supported in part by an NIH NHGRI Genomic Innovator award (award no.: R35 HG010714 to C.O.T.).
 

Abstract

Background Managing acute postoperative pain and minimizing chronic opioid use are crucial for patient recovery and long-term well-being.

Objectives This study explored using preoperative electronic health record (EHR) and wearable device data for machine-learning models that predict postoperative acute pain and chronic opioid use.

Methods The study cohort consisted of approximately 347 All of Us Research Program participants who underwent one of eight surgical procedures and shared EHR and wearable device data. We developed four machine learning models and used the Shapley additive explanations (SHAP) technique to identify the most relevant predictors of acute pain and chronic opioid use.

Results The stacking ensemble model achieved the highest accuracy in predicting acute pain (0.68) and chronic opioid use (0.89). For acute postoperative pain, the area under the curve (AUC) was highest for severe pain versus other pain (0.88). For postoperative chronic opioid use, AUC values for logistic regression, random forest, extreme gradient boosting, and the stacking ensemble ranged from 0.74 to 0.90. Variables from wearable devices played a prominent role in predicting both outcomes.

Conclusion SHAP detection of individual risk factors for severe pain can help health care providers tailor pain management plans. Accurately predicting postoperative chronic opioid use before surgery can help mitigate the risk of the outcomes we studied and reduce the chances of opioid overuse and dependence. Such mitigation can promote safer and more effective pain control for patients during their recovery.


Background and Significance

Postoperative pain is among the most common types of acute pain.[1] Effective postoperative pain management is a basic requirement of surgical care. Ineffective pain management may lead to higher mortality,[2] [3] longer recovery times, and higher hospital costs.[3] Postoperative pain management therefore affects both immediate recovery and long-term outcomes. Per the 2019 National Health Statistics Reports, approximately 4.5 percent of patients in the United States use opioids for acute and chronic pain management.[3] Opioids have historically been the mainstay of acute and chronic pain management, yet their widespread use has contributed to an opioid epidemic, highlighting the urgent need for personalized pain management techniques.

Predicting postoperative pain and opioid use can help with tailoring pain management plans, delivering early interventions to high-risk patients, individualizing patient education, allocating resources, and supporting opioid stewardship. However, estimating pain severity after surgery is challenging because conventional pain assessment depends on subjective evaluations. Recent studies suggest that machine learning (ML) models can predict pain and opioid requirements following surgery.[4] [5] [6] [7] [8] [9] [10] However, ML applications in acute pain management have been limited by the populations, surgery types, and electronic health record (EHR) predictors used for model development. The present study explored the potential for ML models to predict postoperative acute pain and chronic opioid use. Our modeling strategy integrated preoperative EHR and wearable device data from patients who underwent different types of surgeries.

Longitudinal EHR data provide access to various data types surrounding surgical procedures, including patient demographics, medical histories, and medication records. These factors are potentially predictive of an individual's response to pain and opioid requirements following a surgical procedure.[4] [5] [6] [7] [8] [9] Additionally, wearable technology, such as activity trackers or physiological sensors, enables real-time collection of data types such as vital signs, physical activity levels, and sleep patterns. Wearable devices may offer objective measurements of nuanced indicators of pain and opioid requirements.[10] ML models can leverage this amalgamation of preoperative EHR and wearable device data to uncover complex patterns and relationships not apparent through traditional statistical methods.

The objective of this study was to explore the application of state-of-the-art ML techniques to predict acute postoperative pain and chronic opioid use. We (1) developed a range of ML models; (2) compared model performance with that reported in prior work[4] [5] [6] [7] [8] [9] [10]; and (3) used an explainability and visualization technique, Shapley additive explanations (SHAP),[11] to identify the most pivotal predictors of model outcomes and individual patient outcomes.


Methods

This retrospective cohort study used the All of Us Research Program Dataset v7 (Registered & Controlled Tier), which included EHR data, physical measurements, survey responses, wearables data, and genomic data from biosamples of the enrolled participants, who were 18 years of age or older and living in the United States from May 6, 2018, to July 1, 2022. Patients from the All of Us dataset were included in this study if they: (1) underwent one of eight surgeries based on categories from the American College of Surgeons National Surgical Quality Improvement Program[12]: general, gynecology, orthopedics, plastic, neurology, vascular, urology, and thoracic surgery; (2) shared both EHR and wearable device data; and (3) were ≥80% consistent[13] in wearing the Fitbit device. [Fig. 1A] is a flowchart of patient inclusion and exclusion. We identified a cohort of approximately 347 patients who met all the study criteria. The timeline for the study is shown in [Fig. 1B]. The timeframe of input variable collection was 6 weeks before surgery. All longitudinal input variables were averaged for this period. The first outcome, self-reported acute pain, was considered in the time frame of 6 weeks after surgery ([Fig. 1B]: first pipeline). The second outcome, chronic opioid use, was considered within a period of 6 to 12 months after the surgery ([Fig. 1B]: second pipeline).
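The third inclusion criterion can be sketched as a simple filter. This is a hypothetical illustration: the text cites the definition of wear consistency[13] without restating it, so the sketch assumes that consistency is the share of days in the 6-week preoperative window with at least one Fitbit record. The column names `person_id` and `date` and the function `wear_consistency` are illustrative, not All of Us field names.

```python
import pandas as pd

WINDOW_DAYS = 42  # 6-week preoperative window


def wear_consistency(daily: pd.DataFrame, surgery_dates: pd.Series) -> pd.Series:
    """Hypothetical wear-consistency score per patient.

    Assumption: consistency = share of the 42 days before surgery that
    have at least one Fitbit record.
    daily: one row per day with data, columns person_id and date.
    surgery_dates: surgery date indexed by person_id.
    """
    df = daily.merge(surgery_dates.rename("surgery_date"),
                     left_on="person_id", right_index=True)
    start = df["surgery_date"] - pd.Timedelta(days=WINDOW_DAYS)
    in_window = (df["date"] >= start) & (df["date"] < df["surgery_date"])
    days_with_data = df.loc[in_window].groupby("person_id")["date"].nunique()
    # Patients with no records in the window score 0 instead of disappearing
    return days_with_data.reindex(surgery_dates.index, fill_value=0) / WINDOW_DAYS


# Toy example: patient 1 has data every day, patient 2 on every other day, patient 3 never
surgery = pd.Series(pd.to_datetime(["2021-03-01"] * 3), index=[1, 2, 3])
days = pd.date_range(end="2021-02-28", periods=42)
daily = pd.DataFrame({"person_id": [1] * 42 + [2] * 21,
                      "date": list(days) + list(days[::2])})
consistency = wear_consistency(daily, surgery)
eligible = consistency[consistency >= 0.8].index.tolist()
```

Under this assumed definition, only patient 1 meets the ≥80% threshold.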

Fig. 1 (A) Inclusion and exclusion criteria flow chart. (B) Timeline of the study for the two outcomes (orange boxes): the first pipeline, for postoperative acute pain, uses no data after 6 weeks post-surgery (323 days, gray box); the second pipeline, for chronic opioid use, uses no data between the index date and 6 months post-surgery (182 days, gray box). The blue box shows the input variables and their timeline.

Data Source and Preprocessing

Outcomes: the first outcome was acute postoperative pain, derived from patient self-reported outcomes available in All of Us v7. Patients rated their pain from 0 (no pain) to 10 (severe pain) within 6 weeks following surgery. Based on emerging orthopedic literature (e.g., lumbar spine surgery[14] and joint replacement surgery[15]), pain at 6 weeks after surgery is a crucial measure of acute postoperative pain, and pain at this time point is associated with persistent pain 6 to 12 months after surgery. Per the literature, increased pain is associated with increased heart rate (HR) after surgery.[16] [17] [18] The present analysis found a positive correlation between HR data from Fitbit devices and patients' nonmissing self-reported pain responses. Missing pain survey responses, assumed to be missing at random, were imputed using the Multivariate Imputation by Chained Equations (MICE) algorithm.[19] In MICE, missing values are imputed through an iterative cycle of prediction models: at each iteration, each variable with missing values is predicted from the other variables in the dataset.[19] In our case, the imputation modeled pain responses from the corresponding patient HR values; for each pain survey response, we averaged the patient's longitudinal HR data over the preceding week. The MICE algorithm was implemented using the scikit-learn IterativeImputer.[20] Because the relationship between pain and HR was linear, we used linear regression as the estimator. Imputation ensured that each patient's pain trajectory was represented consistently across the timeframe. After collecting pain scores for 6 weeks, we averaged them for each patient and categorized the averages into three levels: mild pain (0–3), moderate pain (4–6), and severe pain (7–10).[21] We conducted exploratory data analyses with histograms and density plots to examine the distribution and patterns of pain scores over time. These visualizations provided evidence that averaging could effectively capture the central tendency of the pain experiences while minimizing the impact of short-term fluctuations.
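The imputation and categorization steps above can be sketched with scikit-learn's `IterativeImputer`, the MICE-style imputer named in the text, using `LinearRegression` as the estimator. This is a sketch under assumptions, not the authors' pipeline: the data frame layout, the column names (`person_id`, `pain`, `hr_prev_week`), and the handling of non-integer averages at the category boundaries are all assumed, and the toy values are invented so that pain is an exact linear function of HR.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression


def impute_and_categorize(surveys: pd.DataFrame) -> pd.Series:
    """Impute missing pain scores from HR, then categorize each patient's mean pain.

    surveys: one row per pain-survey response with columns person_id,
    pain (0-10, NaN when missing) and hr_prev_week (mean HR over the week
    before the response; an illustrative column name).
    """
    imputer = IterativeImputer(estimator=LinearRegression(), max_iter=10, random_state=0)
    filled = imputer.fit_transform(surveys[["pain", "hr_prev_week"]])
    surveys = surveys.assign(pain=filled[:, 0])
    mean_pain = surveys.groupby("person_id")["pain"].mean()
    # Averages are continuous, so 3 and 6 are treated as upper bounds (an assumption):
    # (-inf, 3] mild, (3, 6] moderate, (6, inf) severe
    return pd.cut(mean_pain, bins=[-np.inf, 3, 6, np.inf],
                  labels=["mild", "moderate", "severe"])


surveys = pd.DataFrame({
    "person_id":    [1, 1, 1, 2, 2, 2, 3, 3],
    "hr_prev_week": [60, 70, 80, 110, 120, 130, 100, 100],
    "pain":         [1, 2, np.nan, 6, np.nan, 8, np.nan, 5],
})
categories = impute_and_categorize(surveys)
```

Because the observed toy scores lie exactly on the line pain = 0.1 × HR − 5, the imputed values are 3, 7, and 5, giving mean pain of 2 (mild), 7 (severe), and 5 (moderate).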

The second outcome of the study was chronic opioid use, derived from the EHR drug data included in the All of Us dataset. In this study cohort, we included patients who used any of six opioids (i.e., codeine, hydromorphone, methadone, morphine, oxycodone, or buprenorphine) in various preparations (e.g., oral, parenteral). We defined chronic opioid use as having at least one opioid prescription filled between 6 and 12 months after surgery.[22] [23] [24] These prescriptions were not continuous in our dataset because of missing data in the longitudinal drug EHR records. To overcome this issue, we used the last opioid prescription dose during the 6- to 12-month period to calculate morphine milligram equivalents (MME) using standard conversion factors.[25] Per Centers for Disease Control and Prevention (CDC) guidelines,[26] we categorized chronic opioid use as low dose if the prescription was less than 50 MME and high dose if it was greater than or equal to 50 MME. This definition of chronic opioid use may not fully represent persistent postoperative opioid use, but it aligns with commonly used definitions for identifying opioid use in observational data.[22] [23] [24]
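The dose calculation can be sketched with the usual formula MME = strength (mg) × units per day × conversion factor, followed by the 50 MME cut point. The conversion factors shown are illustrative oral values from the 2016 CDC guideline table and are not necessarily the factors used in this study;[25] methadone and buprenorphine are left out of the sketch because their conversion depends on dose and formulation. The helper names are hypothetical.

```python
# Illustrative oral conversion factors (mg -> MME) from the 2016 CDC guideline
# table; later CDC guidance revised some values, so check the current table.
# Methadone and buprenorphine are omitted: their factors depend on dose and formulation.
MME_FACTORS = {"codeine": 0.15, "hydromorphone": 4.0, "morphine": 1.0, "oxycodone": 1.5}
HIGH_DOSE_THRESHOLD = 50  # MME cut point used in the text


def prescription_mme(drug: str, strength_mg: float, units_per_day: float) -> float:
    """MME for one prescription = strength x units per day x conversion factor."""
    return strength_mg * units_per_day * MME_FACTORS[drug.lower()]


def dose_category(mme: float) -> str:
    """Low dose below 50 MME, high dose at or above 50 MME."""
    return "high" if mme >= HIGH_DOSE_THRESHOLD else "low"


# Hypothetical last prescription in the 6-12 month window: oxycodone 10 mg, 4 times daily
last_rx = {"drug": "oxycodone", "strength_mg": 10, "units_per_day": 4}
mme = prescription_mme(**last_rx)
category = dose_category(mme)
```

Here 10 mg × 4 × 1.5 gives 60 MME, which falls in the high-dose group.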

Variables: we included predictors from EHR and wearable devices that might affect acute postoperative pain and chronic opioid use.[4] [5] [6] [7] [8] [9] [10] The variables were grouped into five domains: (1) demographics (e.g., age, gender, race, and ethnicity) and socioeconomic status (e.g., income, education); (2) medical history (e.g., body mass index [BMI], history of cancer, diabetes, myocardial infarction, and anxiety or depression, average opioid use 6 weeks before surgery); (3) laboratory values (e.g., average albumin level in the blood [g/dL] 6 weeks before surgery); (4) Fitbit data for physical activity (e.g., mean lightly active minutes, mean very active minutes, mean sedentary minutes) and sleep (e.g., mean time spent asleep, mean time spent in deep sleep, mean time spent awake during main sleep); and (5) surgical procedures. All longitudinal variables, namely laboratory values, Fitbit physical activity measures, and Fitbit sleep measures, were averaged over the 6 weeks before surgery. The cohort's population characteristics are summarized in [Table 1].
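Averaging longitudinal variables over the 6-week preoperative window can be sketched with pandas as below. The long-format layout, the function name, and the column names (`lightly_active_minutes`, `albumin_g_dl`) are hypothetical; records outside the window are ignored, and missing values inside it are skipped by the mean.

```python
import pandas as pd


def preoperative_means(daily: pd.DataFrame, surgery_dates: pd.Series,
                       columns: list, weeks: int = 6) -> pd.DataFrame:
    """Average each longitudinal variable over the `weeks` before surgery.

    daily: long-format records with person_id, date and one column per variable.
    surgery_dates: surgery date indexed by person_id.
    """
    df = daily.merge(surgery_dates.rename("surgery_date"),
                     left_on="person_id", right_index=True)
    start = df["surgery_date"] - pd.Timedelta(weeks=weeks)
    in_window = (df["date"] >= start) & (df["date"] < df["surgery_date"])
    # Records after surgery or older than the window are dropped before averaging
    return df.loc[in_window].groupby("person_id")[columns].mean()


surgery = pd.Series(pd.to_datetime(["2021-03-01"]), index=[1])
daily = pd.DataFrame({
    "person_id": [1, 1, 1, 1],
    "date": pd.to_datetime(["2020-12-01", "2021-02-20", "2021-02-25", "2021-03-05"]),
    "lightly_active_minutes": [1000, 100, 200, 500],
    "albumin_g_dl": [None, 4.2, None, 3.0],
})
features = preoperative_means(daily, surgery, ["lightly_active_minutes", "albumin_g_dl"])
```

Only the two February records fall inside the window, so the toy patient's features are 150 lightly active minutes and an albumin level of 4.2 g/dL.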

Table 1

Population characteristics and univariate analysis for the two outcomes

| Categorical variables | | N ≤[a] 347 | % | Acute pain: mild (N ≤ 295) | Acute pain: moderate (N ≤ 32) | Acute pain: severe (N ≤ 20) | Acute pain: p-value | Chronic opioid use: low (N ≤ 305) | Chronic opioid use: high (N ≤ 42) | Chronic opioid use: p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Gender | Female | ≤259 | ≤75% | 223 | 23 | ≤20 | 0.53 | 224 | 35 | 0.23 |
| | Male | ≤88 | ≤25% | 72 | ≤20 | ≤20 | | 81 | ≤20 | |
| Race | White | ≤297 | ≤86% | 254 | 28 | ≤20 | 0.37 | 260 | 37 | 0.79 |
| | Non-White | ≤50 | ≤14% | 41 | ≤20 | ≤20 | | 45 | ≤20 | |
| Ethnicity | Not Hispanic or Latino | ≤327 | ≤94% | 275 | 32 | ≤20 | 0.15 | 289 | 38 | 0.44 |
| | Hispanic or Latino | ≤20 | ≤6% | ≤20 | 0 | 0 | | ≤20 | ≤20 | |
| Income level | Unknown | ≤189 | ≤54% | 137 | 32 | ≤20 | 0.52 | 160 | 29 | 0.19 |
| | Less than 10–50k | ≤33 | ≤10% | 33 | 0 | 0 | | 30 | ≤20 | |
| | 50–150k | ≤90 | ≤26% | 90 | 0 | 0 | | 84 | ≤20 | |
| | 150k to more than 200k | ≤35 | ≤10% | 35 | 0 | 0 | | 31 | ≤20 | |
| Education level | Unknown | ≤189 | ≤54% | 137 | 32 | ≤20 | 0.89 | 160 | 29 | 0.12 |
| | College one to three or college graduate/advanced degree | ≤137 | ≤39% | 137 | 0 | 0 | | ≤126 | ≤20 | |
| | Less than or equivalent to highest grade: Twelve or GED | ≤21 | ≤6% | 21 | 0 | 0 | | ≤20 | ≤20 | |
| History of depression/anxiety | Yes | ≤117 | ≤34% | 100 | ≤20 | ≤20 | <0.05 | 110 | ≤20 | <0.05 |
| | No | ≤203 | ≤59% | 180 | ≤20 | ≤20 | | 175 | 28 | |
| | Unknown | ≤27 | ≤8% | 15 | ≤20 | ≤20 | | ≤20 | ≤20 | |
| Have or had cancer | Yes | ≤74 | ≤21% | 69 | ≤20 | ≤20 | <0.05 | 66 | ≤20 | 0.9 |
| | No | ≤173 | ≤50% | 156 | ≤20 | ≤20 | | 152 | 21 | |
| | Unknown | ≤100 | ≤29% | 70 | ≤20 | ≤20 | | 87 | ≤20 | |
| Have or had diabetes | Yes | ≤81 | ≤23% | 74 | ≤20 | ≤20 | <0.05 | 73 | ≤20 | <0.05 |
| | No | ≤213 | ≤61% | 195 | ≤20 | ≤20 | | 192 | 21 | |
| | Unknown | ≤53 | ≤15% | 26 | ≤20 | ≤20 | | 40 | ≤20 | |
| History of myocardial infarction | Yes | ≤20 | ≤5% | 15 | ≤20 | ≤20 | 0.21 | ≤20 | ≤20 | <0.05 |
| | No | ≤319 | ≤92% | 274 | 28 | ≤20 | | 287 | 32 | |
| | Unknown | ≤20 | ≤3% | ≤20 | ≤20 | ≤20 | | ≤20 | ≤20 | |
| Opioid use prior to surgery | Yes | ≤56 | ≤16% | 50 | ≤20 | ≤20 | 0.6 | 46 | ≤20 | <0.05 |
| | No | ≤291 | ≤84% | 245 | 28 | ≤20 | | 259 | 32 | |
| Surgery type | General | ≤103 | ≤30% | 87 | ≤20 | ≤20 | <0.05 | 86 | ≤20 | <0.05 |
| | Orthopedics | ≤70 | ≤20% | 60 | ≤20 | ≤20 | | 65 | ≤20 | |
| | Gynecology | ≤54 | ≤16% | 50 | ≤20 | 0 | | 46 | ≤20 | |
| | Plastic | ≤46 | ≤13% | 35 | ≤20 | ≤20 | | 41 | ≤20 | |
| | Neurology | ≤39 | ≤11% | 34 | ≤20 | ≤20 | | 37 | ≤20 | |
| | Urology | ≤21 | ≤6% | ≤20 | ≤20 | ≤20 | | ≤20 | ≤20 | |
| | Thoracic | ≤20 | ≤3% | ≤20 | 0 | 0 | | ≤20 | ≤20 | |
| | Vascular | ≤20 | ≤1% | ≤20 | ≤20 | ≤20 | | ≤20 | ≤20 | |

| Continuous variables | Mean [SD] | Range [min–max] | Acute pain: mild, mean [SD] | Acute pain: moderate, mean [SD] | Acute pain: severe, mean [SD] | Acute pain: p-value | Chronic opioid use: low, mean [SD] | Chronic opioid use: high, mean [SD] | Chronic opioid use: p-value |
|---|---|---|---|---|---|---|---|---|---|
| Age (y) | 60 [12.81] | 26–86 | 58 [78.56] | 60 [75.98] | 63 [90.79] | 0.57 | 56 [12.84] | 57 [13.14] | 0.17 |
| Mean lightly active minutes per day | 189.38 [78.97] | 50.75–406.89 | 190.77 [78.56] | 193.63 [90.79] | 177.84 [75.98] | <0.05 | 205.91 [81.39] | 184.66 [74.69] | 0.38 |
| Mean sedentary minutes per day | 909.97 [240.68] | 291.86–1,352.89 | 904.32 [239.58] | 925.20 [251.20] | 952.67 [253.29] | 0.53 | 899.63 [236.04] | 917.69 [243.82] | <0.05 |
| Mean very active minutes per day | 15.41 [21.39] | 5.67–177.81 | 10.84 [10.50] | 12.15 [16.94] | 16.07 [22.32] | 0.38 | 14.45 [14.73] | 17.49 [17.96] | <0.05 |
| Mean time spent asleep (min) | 347.83 [96.08] | 79.40–825.75 | 369.91 [91.78] | 345.24 [104.32] | 350.81 [138.15] | <0.05 | 399.18 [93.55] | 380.21 [88.89] | 0.74 |
| Mean time spent in deep sleep (min) | 57.32 [12.41] | 5.83–91.00 | 59.03 [12.50] | 57.06 [11.39] | 56.38 [12.97] | 0.64 | 59.99 [10.13] | 55.81 [6.65] | 0.59 |
| Mean time spent awake (min) | 40.03 [22.41] | 10.00–144.57 | 39.93 [21.89] | 38.57 [24.80] | 43.71 [26.66] | <0.05 | 36.95 [27.67] | 40.45 [21.60] | 0.34 |
| Albumin level (g/dL) | 4.44 [5.69] | 2.23–6.89 | 4.49 [5.53] | 4.17 [0.44] | 4.08 [0.34] | 0.89 | 4.00 [0.46] | 4.50 [6.08] | 0.68 |
| BMI ratio | 31.41 [8.23] | 17.98–64.22 | 31.91 [8.31] | 28.33 [7.14] | 32.18 [7.82] | 0.35 | 31.08 [7.83] | 34.84 [11.20] | <0.05 |

Abbreviations: BMI, body mass index; SD, standard deviation.


a ≤ Representation adheres to reporting recommendation of All of Us research portal. We comply with the All of Us Data and Statistics Dissemination Policy.



Statistical Analysis

We performed univariate analysis to determine factors associated with the outcomes. For categorical variables, we used the chi-square test for both outcomes. For continuous variables, we used a one-way analysis of variance for the acute pain outcome (more than two categories) and a two-sample t-test for the chronic opioid use outcome (two categories). A p-value of <0.05 was considered significant. We applied these tests separately to each input variable to assess its association with acute pain and chronic opioid use.
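The univariate tests above map onto three SciPy functions: `chi2_contingency` for categorical predictors, `f_oneway` (one-way ANOVA) for continuous predictors against the three pain classes, and `ttest_ind` for continuous predictors against the two opioid-dose classes. The sketch below wraps them in a hypothetical helper and runs it on invented toy data; it is not the authors' code.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway, ttest_ind


def univariate_pvalues(df, outcome, categorical, continuous):
    """p-value of each predictor's univariate association with `outcome`."""
    pvals = {}
    classes = sorted(df[outcome].unique())
    for col in categorical:
        # Chi-square test on the predictor-level x outcome-class contingency table
        table = pd.crosstab(df[col], df[outcome])
        pvals[col] = chi2_contingency(table)[1]
    for col in continuous:
        samples = [df.loc[df[outcome] == c, col].to_numpy() for c in classes]
        if len(classes) > 2:
            # One-way ANOVA for the three-level pain outcome
            pvals[col] = f_oneway(*samples).pvalue
        else:
            # Two-sample t-test for the two-level opioid-dose outcome
            pvals[col] = ttest_ind(samples[0], samples[1]).pvalue
    return pvals


# Toy pain data: diabetes is balanced across classes, sleep differs sharply
pain_df = pd.DataFrame({
    "pain": ["mild"] * 20 + ["moderate"] * 20 + ["severe"] * 20,
    "diabetes": ["yes", "no"] * 30,
    "sleep_min": np.repeat([400.0, 350.0, 300.0], 20) + np.tile([0.0, 5.0, 10.0, 15.0], 15),
})
pain_p = univariate_pvalues(pain_df, "pain", ["diabetes"], ["sleep_min"])

# Toy opioid data: identical activity distributions in both dose groups
opioid_df = pd.DataFrame({"dose": ["low"] * 10 + ["high"] * 10,
                          "very_active_min": np.r_[np.arange(10.0), np.arange(10.0)]})
opioid_p = univariate_pvalues(opioid_df, "dose", [], ["very_active_min"])
```

With these toy values, `diabetes` and `very_active_min` show no association (p = 1), while `sleep_min` is strongly associated with the pain class.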


Model Development and Validation

The dataset of approximately 347 patients was randomly split into a training dataset (70% of the samples) and a testing dataset (30%) for unbiased validation. The training dataset was used for model development and hyperparameter tuning. We developed four models to predict the outcomes: logistic regression (LR), random forest (RF), extreme gradient boosting trees (XGB), and an ensemble learning algorithm called “stacking.”[27] For the stacking model, we used RF and XGBoost as base models, with LR as the meta-model/final estimator. Previous studies found these methods yielded the best prediction results.[4] [5] [6] [7] [8] [9] [10] [Supplementary Appendix A] in the Supplementary Material describes the details for each approach.

We developed all four models independently using the training data (70% of the samples) to predict postoperative acute pain as mild, moderate, or severe and chronic opioid use as high or low dose. Because most values for demographic and socioeconomic covariates were unknown or missing, we did not include them in model development. We used 21 variables from the EHR and wearable devices to predict both outcomes. Class distributions were highly imbalanced for both outcomes. Therefore, we used the Synthetic Minority Oversampling Technique (SMOTE)[28] to create synthetic minority-class samples based on feature-space similarities between existing minority-class samples.[29] [30]
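SMOTE generates a synthetic minority-class sample by interpolating between an existing minority sample and one of its nearest minority-class neighbors. The minimal NumPy re-implementation below is for illustration only; a maintained implementation such as `SMOTE` in imbalanced-learn (`imblearn.over_sampling`) would normally be used. The function name and toy data are hypothetical, and the text does not state which split was oversampled; oversampling is conventionally restricted to the training split so that synthetic points do not leak into evaluation.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=None):
    """Create n_new synthetic minority samples (minimal SMOTE sketch).

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority-class neighbors.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Euclidean distances between all pairs of minority samples
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(minority, n_new=50, k=2, seed=0)
```

Because every synthetic point is a convex combination of two real minority samples, the toy output stays inside the unit square spanned by the originals.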

We performed hyperparameter tuning on the training set using randomized search cross-validation (RandomizedSearchCV) from scikit-learn. Rather than exhaustively evaluating every combination in the grid, this algorithm samples a fixed number of random hyperparameter combinations and evaluates the model with each. After finding the optimal hyperparameters, we evaluated model performance on the testing dataset using accuracy, precision, recall, F1 score, area under the curve (AUC) score, and the receiver operating characteristic (ROC) curve. To obtain reliable estimates of the performance metrics and their variability, we incorporated fivefold cross-validation in the test set: the test set was repeatedly split into different subsets, and model performance was evaluated on each subset. We calculated 95% confidence intervals (CIs) from the distribution of values obtained in this test-set cross-validation, using the 2.5th and 97.5th percentiles as the lower and upper bounds, respectively. Model development, hyperparameter tuning, and evaluation were performed in Python in the All of Us Research Program's cloud analysis environment.
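Randomized hyperparameter search and the test-set fold intervals described above can be sketched as below. The search space, number of sampled combinations, function name, and synthetic data are assumptions. With five folds, the 2.5th and 97.5th percentiles are interpolated from only five AUC values, so the resulting interval essentially reflects the spread between the lowest and highest fold.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split

# Synthetic, imbalanced binary stand-in for the opioid-dose outcome
X, y = make_classification(n_samples=347, n_features=21, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Sample 8 random hyperparameter combinations; score each with 5-fold CV on the training set
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 150), "max_depth": randint(2, 10)},
    n_iter=8, cv=5, scoring="roc_auc", random_state=0,
)
search.fit(X_train, y_train)


def fold_auc_interval(model, X, y, n_splits=5):
    """AUC of an already fitted model on each test-set fold, with percentile bounds."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    # The model is not refit; each fold is only an evaluation subset of the test set
    aucs = np.array([roc_auc_score(y[idx], model.predict_proba(X[idx])[:, 1])
                     for _, idx in folds.split(X, y)])
    lower, upper = np.percentile(aucs, [2.5, 97.5])
    return aucs.mean(), lower, upper


mean_auc, lower, upper = fold_auc_interval(search.best_estimator_, X_test, y_test)
```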


Feature Importance and Real-World Application

We used SHAP analysis to interpret model predictions in terms of feature importance. SHAP is a feature-attribution method based on Shapley values from cooperative game theory, with accompanying visualizations. It provides insight into the decision-making of complex “black box” models by explaining each individual prediction with an interpretable additive model in which every variable receives a contribution (its SHAP value). SHAP therefore explains locally how each variable affects an individual prediction, and aggregating SHAP values across patients can give a heuristic assessment of each variable's importance to the overall model.[11] We performed SHAP analysis on the model with the highest accuracy, precision, recall, and AUC scores to extract the top 15 predictors of each outcome: acute pain and chronic opioid use. The SHAP force plot shows these contributions for an individual patient. Such individual-level analysis could help clinicians identify actions that might drive the prediction toward lower acute pain and chronic opioid use.
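SHAP values are Shapley values: each feature's contribution is its weighted average marginal effect on the prediction over all coalitions of the other features. The brute-force sketch below computes them exactly for a hypothetical three-feature toy model, under the simplifying assumption that an "absent" feature takes its value from a single baseline patient. The `shap` package instead averages over a background dataset and, because exact enumeration grows as 2^n, relies on specialized algorithms such as `shap.TreeExplainer` or sampling approximations such as `shap.KernelExplainer`. The example illustrates the additivity that a force plot relies on: the contributions sum to the prediction minus the baseline prediction.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values for one prediction of model f at point x.

    Simplifying assumption: a feature absent from a coalition takes its
    value from a single baseline (reference) patient.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        for j in coalition:
            z[j] = x[j]  # features in the coalition take the patient's values
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def toy_model(z):
    """Hypothetical model with an interaction between features 0 and 1."""
    return 2.0 * z[0] + 3.0 * z[1] - z[2] + z[0] * z[1]


x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, base)
```

The interaction term is split equally between features 0 and 1, giving contributions of 3, 7, and -3 that sum to the prediction of 7. Averaging absolute contributions across many patients yields the kind of global ranking shown in a SHAP summary plot.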

[Table S1] in [Supplementary Appendix C] shows the TRIPOD checklist, with each item mapped to the corresponding page number in the manuscript.



Results

Patient Characteristics

Our study population's characteristics are described in [Table 1]. This group appeared physically active based on Fitbit data collected during the 6 weeks before surgery, using definitions established by the CDC and World Health Organization (WHO).[31] [32] The daily average for “lightly active minutes” in the cohort was 189 minutes, which the WHO considers a “healthy lifestyle.”[31] However, the cohort's average sedentary time was higher (909 minutes) than recommended for a healthy lifestyle (540 minutes).[31] [32] According to sleep data obtained from Fitbit, patients' average sleep time was 5.7 hours, below the 7 hours of sleep recommended by the CDC.[33] The average blood albumin level (4.44 g/dL) was in the normal range per CDC guidelines.[34] However, the maximum value of albumin level (6.89 g/dL) exceeded the normal range. The cohort had an average BMI ratio of 31.41, which is considered obese.[35] For both outcomes, the proportion of individuals represented in different demographic and socioeconomic groups was similar to the entire cohort's distribution. The average age of patients with severe pain in the 6 weeks following surgery (63 years) was higher than the other categories: mild (58 years) and moderate (60 years). On average, patients with severe pain spent more time performing “very active activities” (16 minutes) and less time in “lightly active activities” (177 minutes) than those with mild or moderate pain. The average sleep time for patients with mild pain (6 hours) was higher than for patients with moderate (5.7 hours) or severe (5.8 hours) pain. Patients with mild pain, on average, spent more time in deep sleep (1 hour) and less time awake (0.6 hours) during main sleep compared with patients with moderate or severe pain ([Table 1]).

Patients with low-dose chronic opioid use spent more time in “lightly active activities” (205 minutes) than patients with high-dose opioid use (184 minutes). High-dose chronic opioid users spent more time performing “very active activities” (17.49 minutes) than low-dose chronic opioid users (14.45 minutes). On average, time asleep (6.6 hours) and time in deep sleep (1 hour) were higher for patients with low-dose chronic opioid use. [Table 1] summarizes the patient characteristics and the acute postoperative pain and chronic opioid use outcomes.


Univariate Analysis

The univariate analysis results are summarized in [Table 1].

Acute postoperative pain: among categorical variables, the primary factors associated with severe pain from a univariate analysis of the entire population were history of anxiety or depression (p = 0.03), history of cancer (p = 0.001), history of diabetes (p = 0.009), and type of surgery (p = 0.03). Among continuous variables, average time spent in lightly active activities (p = 0.006), average time spent asleep (p = 0.03), and average time spent awake (p = 0.007) were associated with severe acute pain.

Chronic opioid use: among categorical variables, the significant factors associated with high-dose chronic opioid use were history of anxiety or depression (p = 0.009), history of diabetes (p = 0.003), history of myocardial infarction (p = 0.02), opioid use before surgery (p = 0.001), and surgery type (p = 0.004). Among continuous variables, average time spent in very active activities (p = 0.01), average time spent in sedentary activities (p = 0.02), and BMI ratio (p = 0.03) were associated factors.


Algorithm Performance

Acute postoperative pain: [Table 2] summarizes the performance of all four models across evaluation metrics for the pain outcome. The stacking ensemble algorithm had the best accuracy (0.68), as well as better precision (0.68), recall (0.68), and F1 score (0.68) than the LR, RF, and XGB algorithms. For the best-performing model (stacking), the one-versus-rest multiclass ROC analysis gave the highest AUC for Class 2 (severe pain) versus the rest (0.88; 95% CI: 0.85–0.91), compared with Class 0 (mild pain) versus the rest (0.86; 95% CI: 0.84–0.89) and Class 1 (moderate pain) versus the rest (0.86; 95% CI: 0.84–0.91). [Fig. 2] shows the multiclass ROC curves for the stacking model. The precision–recall curve for the pain outcome is shown in [Supplementary Appendix B] ([Fig. S1]).
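The one-versus-rest AUCs reported above treat each pain class in turn as the positive class and the other two classes as negatives. A minimal sketch with scikit-learn, assuming the model outputs one probability column per class in the order 0, 1, 2; the helper name, toy labels, and probabilities are invented.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

CLASS_NAMES = {0: "mild", 1: "moderate", 2: "severe"}


def one_vs_rest_auc(y_true, proba):
    """AUC of each pain class against the other two classes combined.

    proba: array of shape (n_samples, 3) with predicted class probabilities,
    columns ordered as classes 0, 1, 2.
    """
    y_bin = label_binarize(y_true, classes=[0, 1, 2])
    return {CLASS_NAMES[c]: roc_auc_score(y_bin[:, c], proba[:, c]) for c in range(3)}


y_true = np.array([0, 1, 2, 2, 1, 0, 0, 2])
proba = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.3, 0.3, 0.4],
                  [0.3, 0.4, 0.3], [0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.4, 0.2, 0.4]])
per_class = one_vs_rest_auc(y_true, proba)
```

For class 0 in this toy example, 11.5 of the 15 positive/negative pairs are ranked correctly (ties count as half), giving an AUC of about 0.77. The unweighted mean of the three per-class values equals scikit-learn's `roc_auc_score(..., multi_class="ovr", average="macro")`.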

Fig. 2 One-versus-rest multiclass ROC curves for the stacking algorithm for the acute postoperative pain outcome. Class 0: mild pain, class 1: moderate pain, class 2: severe pain. ROC, receiver operating characteristic.
Table 2

Model performance for the acute postoperative pain outcome

| Models | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| Logistic regression (LR) | 0.52 | 0.51 | 0.52 | 0.52 |
| Random forest (RF) | 0.65 | 0.65 | 0.65 | 0.65 |
| XGBoost (XGB) | 0.61 | 0.62 | 0.61 | 0.61 |
| Stacking | 0.68 | 0.68 | 0.68 | 0.68 |

Chronic opioid use: model performance for the chronic opioid use outcome across different evaluation metrics is summarized in [Table 3]. The stacking (0.89) and XGB (0.89) models had higher accuracy than RF (0.74) or LR (0.68). The stacking algorithm matched or exceeded the other three models in precision (0.91), recall (0.89), and F1 score (0.89). The ROC curves for all models are shown in [Fig. 3]. The stacking algorithm (AUC = 0.90; 95% CI: 0.88–0.92) outperformed the LR (AUC = 0.74; 95% CI: 0.70–0.77), RF (AUC = 0.81; 95% CI: 0.79–0.83), and XGB (AUC = 0.87; 95% CI: 0.85–0.88) models in predicting chronic opioid use. The precision–recall curve is shown in [Supplementary Appendix B] ([Fig. S2]).
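In [Table 2] and [Table 3], recall equals accuracy for every model. That pattern is what support-weighted averaging of per-class recall produces: weighting each class's recall by its share of samples reduces to the overall fraction of correct predictions. The text does not state which averaging was used, so the sketch below, which assumes scikit-learn's `average="weighted"` and a hypothetical helper name, only illustrates the identity on toy labels.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def summary_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 score."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1}


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 1, 0, 1, 2, 2]
metrics = summary_metrics(y_true, y_pred)
# Weighted recall = sum over classes of (n_c / N) * (TP_c / n_c) = sum TP_c / N = accuracy
```

Here 4 of 6 predictions are correct, and the weighted recall is (3/6)(2/3) + (2/6)(1/2) + (1/6)(1) = 4/6 as well; weighted precision and F1 do not share this identity.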

Fig. 3 ROC curve for all four models for the prediction of chronic opioid use. ROC, receiver operating characteristic.
Table 3

Model performance for chronic opioid use outcome

| Models | Accuracy | Precision | Recall | F1 score | AUC score [95% CI] |
|---|---|---|---|---|---|
| Logistic regression (LR) | 0.68 | 0.68 | 0.68 | 0.67 | 0.74 [0.70–0.77] |
| Random forest (RF) | 0.74 | 0.76 | 0.74 | 0.73 | 0.81 [0.79–0.83] |
| XGBoost (XGB) | 0.89 | 0.90 | 0.89 | 0.88 | 0.87 [0.85–0.88] |
| Stacking | 0.89 | 0.91 | 0.89 | 0.89 | 0.90 [0.88–0.92] |

Abbreviations: AUC, area under the curve; CI, confidence interval.



SHAP Analysis for Feature Importance and Real-World Application

Acute postoperative pain: [Fig. 4] shows the results of the SHAP analysis for the best-performing model (stacking). The top three features for predicting acute pain were average time spent asleep, time spent in lightly active activities, and time spent awake. [Fig. 5] gives a hypothetical example of how the model could be used in the real world: after a patient ID is entered, the prediction results show that this patient might have severe pain ([Fig. 5B]).

Fig. 4 Feature importance (n = 15) SHAP plot for the acute postoperative pain outcome. Class 0: mild pain, class 1: moderate pain, class 2: severe pain. SHAP, Shapley additive explanation.
Fig. 5 Real-world application of the stacking model using SHAP force plot. (a) Prompt for entering patient ID. (b) Prediction results and force plot for patient with ID number 21. The predictors with red arrows drive the outcome toward severe pain, and the blue arrows drive it toward not having severe pain. SHAP, Shapley additive explanation.

Chronic opioid use: [Fig. 6] summarizes the SHAP plot of the top 15 features driving the stacking algorithm's predictions. Average time spent in very active activities, average time spent in lightly active activities, and history of cancer were the three most important predictors of chronic opioid use. The SHAP analysis showed that patients with higher average time spent in very active activities were more likely to be on high chronic opioid doses, whereas patients who spent more time in lightly active activities were more likely to be on low chronic opioid doses ([Fig. 6]). Patients who spent more time asleep and in deep sleep were less likely to be on high-dose chronic opioids. Patients with higher opioid use before surgery were more likely to use higher opioid doses 6 to 12 months after surgery ([Fig. 6]). [Fig. 7] shows an example application of the model for an individual patient using the SHAP force plot.

Fig. 6 Feature importance (n = 15) SHAP plot for predicting chronic opioid use. SHAP, Shapley additive explanation.
Fig. 7 Real-world application of the stacking model using SHAP force plot. (a) Prompt for entering patient ID. (b) Prediction results and force plot for patient with ID number 35. The predictors with red arrows drive the outcome toward high-dose chronic opioid use, and the blue arrows drive it toward not having high-dose chronic opioid use. SHAP, Shapley additive explanation.


Discussion

This article presents a comprehensive framework that integrates preoperative data from the EHR and wearable devices to develop predictive models for acute postoperative pain and chronic opioid use. Across evaluation metrics, the models developed in the present study performed better than those reported in existing work.[4] [5] [6] [7] [8] [9] [10] We also present a way to better understand the predictions made by these “black box” models using SHAP, which identified the features that contributed to pain-related outcomes. Knowing these associated factors can guide pain management practices, enable clinicians to tailor treatment plans, help mitigate high-dose chronic opioid use, and ultimately improve patient care.

The high performance in predicting severe pain (Class 2; AUC = 0.88; 95% CI: 0.85–0.91; [Fig. 2]) suggests that the stacking model was best at identifying patients at risk for severe pain after surgery. Predictors from wearable devices played a prominent role in predicting the pain outcome. Previous studies showed that patients with surgical anxiety and pain had reduced total sleep time, decreased rapid eye movement sleep, and decreased slow-wave sleep both before and after surgery.[36] [37] [38] [39] Our findings add to the evidence for an association between poor sleep (duration and disruption) before surgery and pain during postoperative recovery. The univariate and SHAP analyses showed that average time spent asleep and average time spent awake were crucial factors in predicting severe pain after surgery ([Table 1] and [Fig. 4]). This evidence emphasizes sleep's vital role in recovery during the postoperative period. Clinical access to sleep data may allow health care providers to suggest patient-specific interventions that support good sleep habits, which may positively affect the psychological and physiological recovery of surgical patients.

For the chronic opioid use outcome, compared with other modeling techniques and existing studies,[5] [6] the ensemble learning algorithm achieved the highest accuracy (tied with XGB) and performed well across all other evaluation metrics ([Table 3] and [Fig. 3]). These findings suggest that ensemble learning is a promising approach to explore for predicting postoperative pain-related outcomes. Visualizing model predictors with SHAP showed that Fitbit physical activity data (time spent in very active and lightly active activities) and history of cancer were the most important features. The literature shows that access to a preoperatively planned exercise regimen may reduce prolonged opioid use.[40] [41] [42] Our findings complement this literature: patients more involved in very active activities (e.g., bicycling) before surgery were more likely to be on high-dose opioids 6 to 12 months after surgery, whereas patients doing more lightly active activities (e.g., walking) before surgery were less likely to be on high-dose chronic opioids 6 to 12 months after surgery ([Fig. 6]). The type of surgery was also associated with higher chronic opioid doses ([Table 1]). Our SHAP findings are corroborated by previous work[43] [44] indicating that patients undergoing minimally invasive gynecologic surgery were likely to be on lower opioid doses 6 to 12 months after surgery ([Fig. 6]). It is worth noting that, other than the stacking model, the next-best performing model was RF for the pain outcome and XGBoost for the chronic opioid use outcome. We acknowledge that the computational cost of stacking might outweigh its marginal improvements over those models.

In real-world use, these models could allow health care providers to proactively identify patients at increased risk of severe postoperative pain and to address modifiable risk factors (e.g., sleep disturbances, inactivity) with targeted preoperative rehabilitation interventions and personalized pain management plans. [Figs. 5] and [7] show examples of SHAP analysis for individual patients; for such a patient, clinicians could monitor and regulate the highlighted predictors to reduce that patient's risk of severe pain or high-dose chronic opioid use. With appropriate risk presentation in a clinical context, predictions might help health care providers optimize opioid dosing as part of individualized multimodal pain management in the acute postoperative period, helping to balance the risks of overprescribing and undertreating pain. These findings could contribute to improving patient outcomes and to addressing the broader societal concern of opioid-related adverse effects and addiction.
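
The force plots in [Figs. 5] and [7] rest on the additivity property of Shapley values: a patient's per-feature contributions sum to the difference between that patient's model output and a base value, with positive contributions (red arrows) pushing toward the outcome and negative ones (blue arrows) pushing away from it. The following sketch computes exact Shapley values by enumerating feature coalitions for a toy model. It assumes a single baseline patient supplies the values of "absent" features, whereas the SHAP library uses background data and faster approximations; `toy_risk`, its three features, and all numbers are invented and are not the study's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance x (exponential in the number of
    features). Features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    # Hypothetical risk score, not the study's stacking model.
    # Inputs: [sleep hours, awake minutes, very active minutes].
    sleep_h, awake_min, active_min = z
    return 0.5 - 0.04 * sleep_h + 0.003 * awake_min + 0.001 * sleep_h * active_min


patient = [5.5, 60.0, 30.0]
baseline = [7.0, 40.0, 20.0]
phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in zip(["sleep hours", "awake minutes", "very active minutes"], phi):
    # Positive values raise the score (red in a force plot); negative values lower it (blue).
    print(f"{name}: {contribution:+.4f}")
# Additivity check: baseline score + sum of contributions equals the patient's score.
print(f"{toy_risk(baseline) + sum(phi):.4f} == {toy_risk(patient):.4f}")
```

For these invented inputs the contributions are +0.0225, +0.06, and +0.0625, which sum to 0.145, the gap between the patient's score (0.625) and the baseline score (0.48). The interaction term splits its effect between sleep and activity, which is why the sleep contribution is smaller than its linear term alone would suggest.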

Our study included a diversity of surgeries, ranging from major to minor procedures, each of which could lead to different levels of postoperative pain. This diversity is important because it reflects real-world settings in which patients undergo various surgical interventions (as supported by previous work[45] [46]). Our models provide insights into predicting postoperative pain and chronic opioid use across this spectrum. We also acknowledge that pain is a complex construct influenced by biological, psychological, and social factors. In suggesting the monitoring and regulation of predictors, we do not intend to oversimplify the biopsychosocial nature of pain but to offer a practical approach for health care providers to identify high-risk patients and tailor interventions. Our models should be integrated into a comprehensive, patient-centered care plan that accounts for the multifaceted nature of pain; the goal is not simply to regulate individual predictors but to use this information to enhance personalized pain management plans.

The major shortcomings of our work were the small sample size and limited data on demographic and socioeconomic covariates. Socioeconomic variables were excluded because of the high proportion of unknown values in our cohort. In addition, our study uses EHR drug prescription data as a surrogate for drug use behavior, which may not always reflect actual use. Further, while All of Us prescription data include form and dosage details, the provider's recommended frequency of use is missing. It is well recognized that All of Us may not constitute a perfectly representative sample.[47] [48] In our case, because the cohort was restricted to individuals with Fitbit data who underwent surgery, it may not represent all people with acute pain or opioid use. There remains significant value, however, in leveraging All of Us in this work, as it is one of the very few resources with both EHR and Fitbit data available for individuals undergoing a range of surgical procedures. Our models' accuracy and other performance metrics were good, yet more research is needed to confirm the results in larger and more complete datasets from various clinical care settings.


Conclusion

Integrating ML models to forecast postsurgical acute pain and chronic opioid use could optimize pain management practices and reduce the burden of opioid dependence. Accurate predictions could allow clinicians to tailor personalized pain management strategies, potentially improving patient satisfaction, reducing health care costs, and mitigating the risks associated with long-term opioid use. However, the challenges and ethical implications of these models must be addressed to ensure their responsible and effective deployment in clinical settings.


Clinical Relevance Statement

By leveraging longitudinal EHRs and wearable data, this research could offer personalized predictive models, enabling health care providers to anticipate postoperative pain levels and opioid requirements with greater accuracy. The use of SHAP not only enhances our understanding of patient-specific risk factors but also assists in tailoring treatment plans, minimizing the potential for high-dose chronic opioid use and addiction. Further, by identifying modifiable predictors, SHAP analysis empowers clinicians to implement targeted interventions, such as alternative pain management strategies or early addiction prevention measures, ultimately improving patient outcomes. This study represents a critical step toward more effective and patient-centered pain management practices, aligning with the evolving landscape of precision medicine in health care.


Multiple-Choice Questions

  1. When creating the dataset, what minimum consistency of wearable device wear was required?

    a. >80%

    b. ≥70%

    c. ≥80%

    d. >70%

    Correct Answer: The correct answer is option c. This threshold was chosen based on reference [13].

  2. Based on the findings, which surgical patients were likely to use lower doses of opioids 6–12 months after surgery?

    a. Orthopedics

    b. Gynecology

    c. Plastic

    d. Urology

    Correct Answer: The correct answer is option b. This finding is also covered in the Discussion: consistent with previous work [43] [44], the SHAP analysis indicated that patients who underwent minimally invasive gynecologic surgery were likely to use lower doses of opioids 6 to 12 months after surgery.



Conflict of Interest

None declared.

Acknowledgments

The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD 026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. In addition, the All of Us Research Program would not be possible without the partnership of its participants.

Protection of Human and Animal Subjects

The protocol was reviewed and approved by the Institutional Review Board at the Johns Hopkins University School of Medicine (identifier: IRB00422898).


Supplementary Material


Address for correspondence

Nidhi Soley, MS
Institute for Computational Medicine, Whiting School of Engineering, Johns Hopkins University
3101 Wyman Park Drive, Hackerman 318, Baltimore, MD 21218
United States   

Publication History

Received: 02 December 2023

Accepted: 06 May 2024

Accepted Manuscript online: 07 May 2024

Article published online: 17 July 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany


Fig. 1 (A) Inclusion and exclusion criteria flow chart. (B) Study timeline for the two outcomes (orange boxes): the first pipeline, for postoperative acute pain, uses no data after 6 weeks of surgery (323 days, gray box); the second pipeline, for chronic opioid use, uses no data between the index date and 6 months post-surgery (182 days, gray box). The blue box shows the input variables and their timeline.
Fig. 2 One-versus-rest ROC curves of the stacking model for the multiclass acute postoperative pain outcome. Class 0: mild pain, class 1: moderate pain, class 2: severe pain. ROC, receiver operating characteristic.
Fig. 3 ROC curves of all four models for predicting chronic opioid use. ROC, receiver operating characteristic.
Fig. 4 Feature importance (n = 15) SHAP plot for the acute postoperative pain outcome. Class 0: mild pain, class 1: moderate pain, class 2: severe pain. SHAP, Shapley additive explanation.
Fig. 5 Real-world application of the stacking model using a SHAP force plot. (a) Prompt for entering the patient ID. (b) Prediction results and force plot for the patient with ID number 21. Predictors with red arrows drive the outcome toward severe pain, and predictors with blue arrows drive it away from severe pain. SHAP, Shapley additive explanation.
Fig. 6 Feature importance (n = 15) SHAP plot for predicting chronic opioid use. SHAP, Shapley additive explanation.
Fig. 7 Real-world application of the stacking model using a SHAP force plot. (a) Prompt for entering the patient ID. (b) Prediction results and force plot for the patient with ID number 35. Predictors with red arrows drive the outcome toward high-dose chronic opioid use, and predictors with blue arrows drive it away from high-dose chronic opioid use. SHAP, Shapley additive explanation.