Appl Clin Inform 2024; 15(03): 489-500
DOI: 10.1055/s-0044-1787185
Research Article

Comparing Clinician Estimates versus a Statistical Tool for Predicting Risk of Death within 45 Days of Admission for Cancer Patients

Authors

  • Adrianna Z. Herskovits*

    1   Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, United States
  • Tiffanny Newman*

    2   Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, New York, United States
  • Kevin Nicholas*

    2   Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, New York, United States
  • Cesar F. Colorado-Jimenez

    1   Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, United States
  • Claire E. Perry

    2   Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, New York, United States
  • Alisa Valentino

    1   Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, United States
  • Isaac Wagner

    2   Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, New York, United States
  • Barbara Egan

    3   Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, United States
  • Dmitriy Gorenshteyn

    4   Thirty Madison, New York, New York, United States
  • Andrew J. Vickers

    5   Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, United States
  • Melissa S. Pessin

    6   Department of Pathology, University of Chicago, Chicago, Illinois, United States

Funding: Supported in part by the National Institutes of Health/National Cancer Institute Cancer Center Support grant P30 CA008748.
 

Abstract

Objectives While clinical practice guidelines recommend that oncologists discuss goals of care with patients who have advanced cancer, it is estimated that fewer than 20% of individuals admitted to the hospital with high-risk cancers have end-of-life discussions with their providers. Although there has been interest in developing mortality prediction models to trigger such discussions, few studies have examined how such models compare with clinical judgment in determining a patient's mortality risk.

Methods This study is a prospective analysis of 1,069 solid tumor medical oncology hospital admissions (n = 911 unique patients) from February 7 to June 7, 2022, at Memorial Sloan Kettering Cancer Center. Electronic surveys were sent to hospitalists, advanced practice providers, and medical oncologists on the first afternoon following a hospital admission, asking them to estimate the probability that the patient would die within 45 days. Provider estimates of mortality were compared with those from a predictive model developed using a supervised machine learning methodology that incorporated routine laboratory, demographic, biometric, and admission data. Area under the receiver operating characteristic curve (AUC), calibration, and decision curves were compared between clinician estimates and model predictions.

Results Within 45 days following hospital admission, 229 (25%) of 911 patients died. The model performed better than the clinician estimates (AUC 0.834 vs. 0.753, p < 0.0001). Integrating clinician predictions with the model's estimates further increased the AUC to 0.853 (p < 0.0001). Clinicians overestimated risk whereas the model was extremely well-calibrated. The model demonstrated net benefit over a wide range of threshold probabilities.

Conclusion The inpatient prognosis at admission model is a robust tool to assist clinical providers in evaluating mortality risk, and it has recently been implemented in the electronic medical record at our institution to improve end-of-life care planning for hospitalized cancer patients.


Background and Significance

Discussions between oncology patients and clinicians regarding end-of-life care play an important role in guiding informed decision-making related to palliative services, treatment cessation, hospice, and other aspects of advanced care planning.[1] Although the importance of having these discussions is widely recognized, the majority of inpatients with cancer do not have advance directives completed on admission, with fewer than 20% of high-risk cancer patients discussing their wishes in advance with their outpatient oncologists.[1] [2] [3] Having an accurate assessment of a patient's prognosis is one of the challenges faced by providers when initiating these difficult conversations.[4]

Several studies have found that clinical providers are accurate in estimating patient survival less than one-fourth of the time, when accuracy is defined as having a prediction that is within 33% of the amount of time that the patient actually survived.[5] [6] Although the majority of studies show that oncologists overestimate how long a patient will survive, there is substantial variability in the literature and some studies find that clinicians give more pessimistic estimates relative to actual survival outcomes.[5] [7] [8] [9] While the degree of predictive inaccuracy has not been shown to be dependent on clinical experience or the number of years in practice,[7] [10] there is evidence that having a close doctor–patient relationship is associated with a lower prognostic accuracy.[5]

Prediction models offer significant advantages over clinician-based assessments because the prognostic capabilities of individual clinicians vary significantly and because statistical tools are able to integrate a large number of factors and apply them consistently and without subjectivity.[11] Variables that have previously demonstrated strong signal with respect to survival across a range of cancers include patient demographics, tumor characteristics,[12] and laboratory findings.[13] Machine learning (ML) algorithms based on electronic health record data are an emerging methodology that may complement clinical decision-making in predicting cancer mortality[14] [15] as well as other illnesses[16] [17] [18] and prompt end-of-life discussions between patients and their providers.


Objectives

We developed a regression analysis model to predict mortality risk over the subsequent 45 days in solid tumor cancer patients using routine laboratory, demographic, biometric, and admission data at the time of hospital admission. This time interval was selected because patient outcomes are not easily predicted by clinical providers over this intermediate-term time interval, yet end-of-life planning and interventions are still feasible. We compared the model's performance with the judgement of inpatient and outpatient providers identified as being part of the patient's care team using an automated survey-based approach completed within 24 hours of hospital admission. We also assessed whether patient or provider characteristics correlated with clinician risk estimates and performed decision curve analysis to evaluate the clinical utility of using the prediction model in practice.


Methods

Study Design and Outcomes

The purpose of this study was to examine whether clinical judgment or a predictive model utilizing routine laboratory, demographic, biometric, and admission data would better predict intermediate-term clinical outcomes of solid tumor oncology patients at the time of hospital admission. In the prospective phase of this study, the cohort consisted of adult solid tumor patients admitted to the Department of Medicine at Memorial Sloan Kettering Cancer Center (MSKCC) between February 7 and June 7, 2022. Surgical, pediatric, hematologic, and neurology patients were excluded because these patients were expected to have different mortality risks and disease trajectories than adult solid tumor medicine patients.

The patient's clinical providers were surveyed by email to assess their ability to predict patient mortality 45 days after admission. The primary endpoint used to compare the accuracy of clinician and model predictions was death of the patient within 45 days of the admission. Clinician and model predictions were compared with actual patient outcomes at this time point in the retrospective phase of the study. The study was determined to be exempt by the institutional review board (IRB) of MSKCC under study protocol X21–030 A(3), which compared the clinician's estimate with the statistical model's prediction of patient mortality. A second protocol, IRB #18-491A (LABMED WA-005–22), was submitted to compare the predictions with actual patient outcomes after 45 days.


Clinician Surveys

At MSKCC, a summary of a patient's providers is automatically generated from activity in a patient's chart, such as clicks, orders placed, documents generated, and medications ordered. This tool is called ROSTR (Real-time Online Summary of Team Resources), and in this study it was used to identify a patient's primary care team members within 24 hours of admission. Consents for participation and surveys were emailed to the providers overseeing a patient's clinical care on the first afternoon following admission. We prospectively collected, in real time following patient admission, information regarding the provider's role in patient care (hospitalist, nurse practitioner or physician assistant, inpatient medical oncologist, outpatient primary oncologist); the duration of time the clinical provider had known the patient (1 day, less than a week, less than a month, less than a year, greater than a year, or don't know them); and the mortality prediction at 45 days following admission (likelihood of dying within 45 days of this admission, entered as a percentage using a slider). The survey data were compiled using a Research Electronic Data Capture (REDCap) database, a secure platform for designing clinical research surveys.


Model Development

To build a model to predict inpatient mortality within 45 days of admission, we extracted retrospective data from our institutional data warehouse: biometric (height, weight, body surface area, body mass index [BMI]), demographic (age, sex), admission history (days since the last Memorial Sloan Kettering admission and the number of admissions in the prior 2 months), and the most recent laboratory test results within 24 hours of admission (comprehensive metabolic panel analytes, complete blood count (CBC) with differential analytes, magnesium, phosphorus, prothrombin time, international normalized ratio, activated partial thromboplastin time, and lactate dehydrogenase). We chose the first 24 hours for model prediction because we expected to incorporate the model into the electronic medical record to help guide clinical decision-making in real time.

When building the model and making predictions on patients, records with insufficient data (either missing patient identifiers or unavailable CBC or complete metabolic panel results) were excluded. Primarily, we used continuous values for laboratory results. However, for laboratory tests that demonstrated a nonlinear relationship with mortality risk, we also added categorical values (very low, low, normal, high, very high). Individual missing numeric laboratory values were imputed using the median of the admitted solid tumor patient population at MSKCC.
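As an illustration of this preprocessing step, the following sketch (in R, the language used for model development) imputes a missing numeric laboratory value with the population median and adds a five-level categorical version of one test; the data frame, column names, and sodium cutoffs are hypothetical and are not the values used in the actual model.

impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)  # replace missing values with the cohort median
  x
}
labs$sodium <- impute_median(labs$sodium)

# For tests with a nonlinear relationship to mortality risk, also add a categorical version.
labs$sodium_cat <- cut(labs$sodium,
                       breaks = c(-Inf, 125, 132, 146, 155, Inf),  # illustrative bin boundaries only
                       labels = c("very_low", "low", "normal", "high", "very_high"))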

A logistic regression model using the least absolute shrinkage and selection operator (lasso) was developed, with the lasso lambda parameter iteratively tuned to maximize performance using a supervised ML methodology. The study cohort comprised a training cohort (n = 14,591 admissions between 2017 and 2018), a validation cohort (n = 3,945 admissions during the first half of 2019), and a test cohort (n = 4,015 admissions during the second half of 2019). We used time-based partitions to help account for changes in cancer treatment and survival rates over time. Eligibility criteria included all adult patients admitted to solid tumor medicine services at MSKCC. Patients admitted to surgical, pediatric, hematologic, and neurology services were excluded based on the expectation that these patients might have different mortality risks and disease trajectories.
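A minimal sketch of this modeling step, under the assumption of feature matrices and 45-day mortality labels already partitioned by admission date into training and validation cohorts, might look like the following; the object names are hypothetical and the actual tuning procedure may have differed in its details.

library(glmnet)  # lasso-penalized generalized linear models
library(pROC)    # AUC calculation

# x_train, x_val: numeric feature matrices; y_train, y_val: 0/1 indicators of death within 45 days
fit <- glmnet(as.matrix(x_train), y_train, family = "binomial", alpha = 1)  # alpha = 1 selects the lasso penalty

# Evaluate each candidate lambda on the later, held-out validation cohort and keep the best value.
val_auc <- sapply(fit$lambda, function(lam) {
  p <- predict(fit, newx = as.matrix(x_val), s = lam, type = "response")
  as.numeric(auc(y_val, as.vector(p)))
})
best_lambda <- fit$lambda[which.max(val_auc)]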

Although the logistic model with lasso was selected for its strong performance and the interpretability of such models, two tree-based ML algorithms, XGBoost (extreme gradient boosted decision trees) and random forest, were also evaluated using the same training and validation cohorts, with performance assessed by comparing the areas under the curve (AUCs) at 45 days. Additional validation of the lasso logistic regression model formed the basis of the clinical trial comparing its performance with the assessments of clinical providers, as described in the Study Design and Outcomes and Results sections of this article. The model was developed using R version 3.6.3 ( http://www.r-project.org/ ).
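For context, a comparison along these lines could be sketched as follows, training the two tree-based models on the same training cohort and comparing validation AUCs; the hyperparameters shown are placeholders rather than the settings actually evaluated.

library(xgboost)
library(randomForest)
library(pROC)

xgb_fit <- xgboost(data = as.matrix(x_train), label = y_train,
                   objective = "binary:logistic", nrounds = 200, verbose = 0)
rf_fit <- randomForest(x = x_train, y = factor(y_train), ntree = 500)

auc_xgb <- auc(y_val, as.vector(predict(xgb_fit, as.matrix(x_val))))  # predicted probabilities
auc_rf  <- auc(y_val, predict(rf_fit, x_val, type = "prob")[, 2])     # probability of the "1" class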


Statistical Methods

We assumed AUCs close to 0.75, an event rate of 25%, and a correlation between clinician and model predictions of 0.5, and calculated that a sample size of 1,000 patients would give a 95% confidence interval for the difference between AUCs of approximately ±0.05. We estimated that we would accrue at least 1,000 patients in 4 months and stopped the study once 4 months had passed. The primary endpoint was death within 45 days of admission. All patients whose deaths were known to have occurred within 45 days were included; however, individuals with final follow-up less than 45 days after admission were excluded because we could not establish the accuracy of clinician or model predictions for them. Patients could have multiple admissions and more than one clinician risk prediction per admission (if multiple care team members replied to their surveys for the same admission). As most patients had only a single admission and evaluation, and to avoid the problem of correlated observations, the primary analysis evaluated only the patient's first admission, and when there were multiple clinician predictions for a given admission, one survey was selected at random. As a sensitivity analysis, we repeated our analyses on the full dataset.
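A data-preparation step of this kind can be expressed concisely; the sketch below (using dplyr, with hypothetical column names) keeps each patient's first admission and then samples one clinician survey at random per admission.

library(dplyr)

set.seed(45)
primary_cohort <- surveys %>%
  group_by(patient_id) %>%
  filter(admit_date == min(admit_date)) %>%   # keep only the first admission per patient
  group_by(patient_id, admit_date) %>%
  slice_sample(n = 1) %>%                     # pick one clinician survey at random per admission
  ungroup()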

A linear regression model was created to assess whether clinicians are influenced by patient characteristics when estimating the risk of death within 45 days. Clinician-predicted risk was the dependent variable; race, ethnicity, sex, primary language, marital status, age, and religion were the predictors, with admitting service and model-predicted risk as covariates to reflect health status. We then repeated this analysis to evaluate whether clinical provider type or duration of relationship with the patient impacted risk estimates, using the prediction from the first analysis as a covariate.
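In R-style notation (the published analyses were run in Stata), the two regressions would take roughly the following form, with hypothetical variable names; the clinician's predicted risk is regressed on patient characteristics while adjusting for admitting service and model-predicted risk, and the fitted values then serve as a covariate in the clinician-level model.

# First model: patient characteristics, adjusted for admitting service and model-predicted risk
fit_patient <- lm(clinician_pred ~ race + ethnicity + sex + primary_language + marital_status +
                    age + religion + admitting_service + model_pred,
                  data = survey_data)
summary(fit_patient)

# Second model: provider type and length of relationship, using the first model's prediction as a covariate
# (sketch assumes complete cases so that fitted values align with the rows of survey_data)
survey_data$patient_adjusted <- fitted(fit_patient)
fit_provider <- lm(clinician_pred ~ clinician_type + relationship_length + patient_adjusted,
                   data = survey_data)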

We then calculated discrimination (as AUC), calibration, and net benefit separately for clinician estimates and model estimates, and assessed whether discrimination was improved by combining model and clinician estimates. The 95% confidence interval (CI) for the difference between clinician and model AUCs was calculated by bootstrapping with 5,000 resamples.[19] Net benefit was calculated across the full range of threshold probabilities.[20] [21] Finally, we compared discrimination and calibration between different types of clinicians and estimated the size of the difference in predictions when more than one clinician gave a risk estimate. All statistical analyses were conducted using Stata version 17.0 (Stata Corp., College Station, Texas, United States).
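The bootstrap for the AUC difference can be sketched as follows (shown in R for consistency with the modeling code, although the published analysis used Stata); each resample recomputes both AUCs on the same bootstrap sample, and the percentile interval of the differences provides the 95% CI. Variable names are hypothetical.

library(pROC)

set.seed(2022)
auc_diff <- replicate(5000, {
  idx <- sample(nrow(cohort), replace = TRUE)            # bootstrap resample of admissions
  auc_model <- as.numeric(auc(cohort$died_45d[idx], cohort$model_pred[idx]))
  auc_clin  <- as.numeric(auc(cohort$died_45d[idx], cohort$clinician_pred[idx]))
  auc_model - auc_clin
})
quantile(auc_diff, c(0.025, 0.975))  # percentile 95% CI for the difference in AUCs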



Results

In the 4-month survey period, 1,771 unique patients had 2,287 solid tumor hospital admissions, resulting in 6,363 email surveys to clinicians. The overall response rate was 31% (including a small proportion of clinicians who responded but subsequently declined to participate), and 86% of surveys were completed within 24 hours of receipt. Only survey responses completed within 24 hours of admission were used, to capture clinicians' initial assessments of the patient; this ensured a fair comparison with the model, which also generated its prediction within the first 24 hours.

The 1,689 surveys completed within 24 hours represented 1,292 admissions and 1,095 unique patients. Admissions were excluded from the final dataset if there were errors with the survey or if patients were lost to follow-up (n = 154 admissions). Hospital admissions were also excluded if the model failed to generate a risk score due to incomplete patient data (n = 69 admissions). The remaining 1,069 admissions (n = 911 patients) were used for further analysis. Within this group, 661 patients had one clinician prediction survey, 219 patients had two provider surveys, and 31 patients had three completed clinical surveys ([Fig. 1]).

Fig. 1 Study design. Schematic showing the exclusion criteria for the dataset used in all analyses.

The demographic data are provided in [Table 1]. The median age of the 911 patients included in the study was 65 years, and the group was composed of 43% males and 57% females. The study population was 12% African American, 11% Asian American, and 69% white. Patients of Hispanic ethnicity comprised 8.7% of the cohort. Other covariates included marital status, religion, and language. Statistical analysis of how patient demographics influenced clinician risk predictions is included in [Table 2]. Clinical providers estimated a risk of death 6 percentage points higher for African American patients and 8 percentage points higher for Asian American patients relative to white patients. Non-English speakers were given a 7 percentage point higher risk of death relative to English-speaking patients. Other factors, including age, sex, Hispanic ethnicity, religion, and marital status, did not have a statistically significant association with clinicians' risk predictions. Clinician characteristics, including type of clinician and length of the patient–provider relationship, showed no statistically significant association with risk predictions ([Table 2]).

Table 1 Patient characteristics for patients at first admission (n = 911)

Patient characteristic | Median (quartiles) or frequency (%)
Age | 65 (55–74)
Female | 517 (57%)
Race
 African American | 109 (12%)
 Asian American | 102 (11%)
 Other or unknown | 74 (8.1%)
 White | 626 (69%)
Ethnicity
 Hispanic | 79 (8.7%)
 Non-Hispanic | 776 (85%)
 Unknown | 56 (6.1%)
Does not speak a foreign language | 830 (91%)
Marital status
 Partner | 576 (63%)
 Single, divorced, or widowed | 327 (36%)
 Unknown | 8 (0.9%)
Religion
 Christian | 495 (54%)
 Jewish | 120 (13%)
 Unknown or none | 233 (26%)
 Other | 63 (6.9%)
Admitting service
 Breast | 76 (8.3%)
 Gastrointestinal | 280 (31%)
 General medicine | 31 (3.4%)
 Genitourinary | 71 (7.8%)
 Gynecologic medical | 129 (14%)
 Head and neck, melanoma or sarcoma | 138 (15%)
 Lung | 70 (7.7%)
 Solid tumor | 106 (12%)
 Other | 10 (1.1%)

Note: Data are presented as median (quartiles) or frequency (percentage).


Table 2 Multivariable model of clinician risk predictions

Characteristic | Coefficient | 95% CI | p-Value
Patient characteristics
 Age per 10 years | 0.01 | −0.01, 0.02 | 0.4
 Female | 0.01 | −0.03, 0.05 | 0.7
 Race | | | 0.033
  White | Reference | |
  African American | 0.06 | 0.01, 0.12 |
  Asian American | 0.08 | 0.01, 0.14 |
  Other or unknown | 0.02 | −0.06, 0.10 |
 Ethnicity | | | 0.7
  Non-Hispanic | Reference | |
  Hispanic | −0.03 | −0.11, 0.04 |
  Unknown | −0.00 | −0.08, 0.08 |
 Does not speak English | 0.07 | 0.01, 0.14 | 0.035
 Religion | | | 0.6
  Christian | Reference | |
  Jewish | 0.00 | −0.06, 0.06 |
  Unknown or none | −0.01 | −0.05, 0.04 |
  Other | −0.06 | −0.14, 0.02 |
 Marital status | | | 0.3
  Married | Reference | |
  Single, divorced, or widowed | −0.02 | −0.05, 0.02 |
  Unknown | 0.12 | −0.08, 0.31 |
Clinician characteristics
 Type of clinician | | | 0.2
  Inpatient—APP, n = 201 (22%) | Reference | |
  Inpatient—Hospitalist, n = 279 (31%) | −0.04 | −0.09, 0.01 |
  Inpatient—Rotating medical oncologist, n = 94 (10%) | −0.06 | −0.13, 0.01 |
  Outpatient—Medical oncologist, n = 274 (30%) | −0.06 | −0.15, 0.02 |
  Other, n = 63 (6.9%) | −0.02 | −0.11, 0.08 |
 Length of relationship with patient | | | 0.4
  1 day, n = 544 (60%) | Reference | |
  Days, n = 37 (4.1%) | 0.06 | −0.04, 0.15 |
  Weeks, n = 40 (4.4%) | 0.03 | −0.07, 0.13 |
  Months, n = 107 (12%) | −0.02 | −0.12, 0.07 |
  More than 1 year, n = 183 (20%) | 0.03 | −0.07, 0.12 |

Abbreviations: APP, advanced practice provider; CI, confidence interval.

Note: For patient characteristics, the coefficient is the difference in risk given to patients with that characteristic, after multivariable adjustment. For clinicians, the coefficient is the difference given by providers with that characteristic after multivariable adjustment.


The top 20 numeric data features identified by the regression analysis model are shown in [Fig. 2]. The x-axis represents both the direction and the magnitude of the coefficients. Negative risk coefficients are "protective," meaning that a higher value of the feature decreases the predicted risk of mortality; these features are shown in green. Positive risk coefficients indicate increased risk, meaning that a higher value of the feature increases the predicted risk of mortality; these features are shown in red.

Fig. 2 Top 20 features incorporated into the predictive model. All biochemical, biometric, and demographic data were collected at the time of hospital admission and risk coefficients predictive of mortality in the subsequent 45-day period were calculated. The x-axis represents both the direction and value of the coefficients. Negative risk coefficients are protective and shown in green, and coefficients with a higher absolute value indicate a greater degree of protection. Positive risk coefficients are shown in red, with a higher absolute value indicating a greater risk of mortality.

Major influences on the model include abnormalities in blood electrolytes (such as calcium, potassium, sodium, chloride, and the anion gap), biomarkers of renal status (including blood urea nitrogen and creatinine), hematologic parameters (including mean corpuscular volume, mean corpuscular hemoglobin, red cell distribution width, eosinophils, platelets, neutrophils, and lymphocytes), liver enzymes (including both aspartate aminotransferase and alanine aminotransferase), and indicators of cachexia and poor health (including albumin, alkaline phosphatase, body surface area, and the number of nonelective admissions in the prior 60 days). The pattern of biochemical abnormalities identified by the model to predict patient mortality after hospitalization is indicative of broad dysfunction across multiple organ systems. In other words, a patient with compromised renal and liver function, abnormal hematologic parameters, and developing cachexia may have small changes in numerous laboratory tests on admission, and the model can identify a subtle pattern indicative of poor prognosis that a clinician may not.

One of the major findings of this study is that the statistical model outperformed clinical providers, with an AUC of 0.834 versus 0.753 for clinicians; this difference was statistically significant on bootstrap analysis (95% CI for the difference, 0.036–0.125; p < 0.0001). Combining the 45-day mortality risk predicted by the statistical model with clinical judgement yielded a statistically significant (p < 0.0001) improvement in the AUC to 0.853. The AUC gain from a combined clinician–model approach is important because the prognostic model implemented in the electronic medical record is intended to complement the clinical judgement of providers.
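The article does not specify how the two sources of information were combined; one standard approach, shown here as an assumption rather than as the authors' method, is to fit a logistic regression of the 45-day outcome on both predictions and evaluate the AUC of the combined fit (variable names hypothetical).

combined_fit <- glm(died_45d ~ model_pred + clinician_pred, data = cohort, family = binomial)
combined_auc <- pROC::auc(cohort$died_45d, fitted(combined_fit))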

As a sensitivity analysis, the AUC calculations were repeated using the entire dataset, including instances in which more than one clinician gave a risk estimate or a patient had multiple admissions during the study, and similar results were found (AUCs of 0.829 for the model vs. 0.761 for clinicians). The model was well calibrated, whereas clinicians generally overestimated risk ([Fig. 3A, C]). That said, the AUC gain noted above for the combined clinician–model approach suggests that clinicians use prognostic information that is not captured by the model.

Fig. 3 Calibration plots depicting the performance of (A) clinicians overall (hatched line), (B) machine learning regression model (hatched line), and (C) clinicians who had known patients for at least a week (hatched line). In these plots, the solid line represents ideal calibration with perfect concordance between predicted and actual patient outcomes. Deviation from the solid line represents the degree of miscalibration.

When subgroups of clinical providers were compared, outpatient medical oncologists predicted mortality more accurately than other clinicians (AUC 0.804 vs. 0.744). However, this observation may be confounded by the length of the relationship with the patient (AUC 0.804 for clinicians who had known the patient for a week or more vs. 0.728 for those who had known the patient for less than a week). In general, calibration remained extremely poor for clinician estimates, even when restricted to clinicians with a longer relationship with the patient ([Fig. 3C]), as clinicians consistently overestimated the risk of death relative to actual patient outcomes in our study.

The decision curve comparing the predictive model and clinical prediction is shown in [Fig. 4]. The predictive model depicted in orange has a positive net benefit across most of the range of threshold probabilities except for a limited set of end-of-life decisions requiring probabilities at or above 85%. In contrast, the clinician predictions illustrated with the green line have net harm for any decision that requires at least a 50% risk. The net benefit for the model is always higher than that for clinician prediction, suggesting that utilization of the statistical model would result in a net increase in the proportion of patients directed toward end-of-life care planning relative to clinician prediction.
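Net benefit at a threshold probability pt is calculated as (true positives/n) − (false positives/n) × pt/(1 − pt); a small R function illustrating the computation behind such a decision curve is sketched below, with hypothetical variable names.

net_benefit <- function(pred, outcome, pt) {
  treat <- pred >= pt                      # patients who would be flagged at this threshold
  tp <- sum(treat & outcome == 1)          # true positives
  fp <- sum(treat & outcome == 0)          # false positives
  n  <- length(outcome)
  tp / n - fp / n * pt / (1 - pt)
}

thresholds <- seq(0.05, 0.95, by = 0.05)
nb_model <- sapply(thresholds, net_benefit, pred = cohort$model_pred, outcome = cohort$died_45d)
nb_clin  <- sapply(thresholds, net_benefit, pred = cohort$clinician_pred, outcome = cohort$died_45d)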

Fig. 4 Decision curve analysis. Orange line: intervene according to the model. Green line: intervene according to clinician prediction. Blue line: intervene on all patients. Red line: intervene on no patient.

Integrating this tool into the electronic health record so it can provide timely, actionable information to clinical providers has been a primary goal for this project. As shown in [Fig. 5], patients are stratified into low-risk, high-risk, and very high-risk categories. The electronic medical record tool shows the patient's qualitative risk category, the percent risk of death within 45 days, and suggested order sets, such as pain management, supportive care, hospice education, and social work consults. In addition, an email is sent to the attending physician prompting the provider to use the advanced illness support order set and to arrange a goals-of-care conversation with high-risk and very high-risk patients within 48 hours. Providers currently have full discretion over how to integrate their clinical intuition with the results provided by the model; the model is intended as a decision support tool that informs, rather than supplants, clinical judgement.
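The specific numeric cutoffs defining these risk categories are not given in the article; purely as an illustration of how such stratification and alerting logic might be wired into a decision support layer, a sketch with invented thresholds follows.

# Illustrative only: the thresholds below are invented and are not the institution's actual cutoffs.
risk_category <- function(p_death_45d, high_cut = 0.25, very_high_cut = 0.50) {
  cut(p_death_45d,
      breaks = c(-Inf, high_cut, very_high_cut, Inf),
      labels = c("low risk", "high risk", "very high risk"))
}

# Flag patients whose category should trigger the advanced illness support email to the attending.
needs_goals_of_care_email <- function(p_death_45d) {
  risk_category(p_death_45d) %in% c("high risk", "very high risk")
}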

Fig. 5 Integration of prognosis of admission tool. (A) Screenshot of prognosis at admission tool in the medical record. (B) Example of email generated for high-risk patients identified by the algorithm recommending advanced illness support order set and goals of care conversations (data presented in Fig. 5 are imaginary).

Discussion

We found that a statistical prediction tool was significantly more accurate than clinician predictions of 45-day mortality in adult patients with advanced solid tumor cancers. Decision curve analysis demonstrated that using the model to guide decision-making would lead to better clinical outcomes than using clinician predictions. Although one might expect cancer staging and performance status to be major drivers of patient prognosis, tumor registry data can be several years old by the time a patient is admitted. This study found that combining subtle changes across multiple laboratory, demographic, and admission data features can be predictive of patient survival without the additional components of cancer stage, diagnosis, performance status, and BMI.

The integration of this predictive model into our electronic health record and its acceptance by clinical staff have been a significant undertaking. Prior to implementation, presentations were held with service chiefs and other major stakeholders to educate clinical staff about the appropriate use of this predictive model. In addition, we designed a page on the hospital SharePoint site with a video on how to use this Prognosis at Admission prediction tool, along with descriptions of the top features used to make predictions, to increase transparency and understanding and to facilitate use of this clinical tool. We have also created a monitoring dashboard to evaluate in real time whether there is any drift in the model's performance.

Building a statistical model that uses logistic regression rather than a more complex ML model may have been advantageous in facilitating model interpretability, because the prognostic tool uses laboratory findings and demographics that are already familiar to clinicians. While more complex ML models such as neural nets and tree-based models can have better performance, they can also be more challenging to interpret. Performing this pilot study comparing the performance of the model relative to clinician judgement, building a model based on information routinely used by clinical staff, and having web-based resources available on an ongoing basis have helped facilitate understanding of the model's output.

While most patients with advanced cancers do not discuss their end-of-life care preferences with their providers, a recent study found that 87% of patients support a policy of having admitting physicians initiate a conversation about advance directives upon hospital admission.[2] One of the barriers to initiating these discussions is that clinicians are unable to accurately predict patient survival.[5] [7] [8] [9] [22] [23] The incorporation of laboratory data to evaluate patient prognosis may improve the reliability of mortality estimates[13] [14] [15] [24] [25] and improve the accuracy of subjective evaluation.

In addition to the humanistic benefit of more accurately predicting intermediate-term mortality, there are economic implications as well. A recent study using a ML tool embedded into the electronic health record system at the University of Pittsburgh found a sustained improvement in palliative care practices and a doubling of the number of goals-of-care conversations between patients and their providers.[26] Patients with advance directives are more likely to use hospice care rather than die in an intensive care unit, which is also associated with better quality of care for both patients and their caregivers[27] and significant health care-related cost savings.[28]

Predictive models have the advantage of being able to integrate numerous biochemical abnormalities across multiple organ systems, as well as other patient variables, in a reproducible and quantitative way. In our study population, clinicians tended to overestimate risk relative to real-life outcomes, which is evident from the clinician calibration curve falling below the solid diagonal line representing perfect concordance at higher predicted probabilities ([Fig. 3A]). We also observed that clinical providers use rough estimates that are inherently imprecise, with mortality predictions clustering at quartile and decile values ([Supplementary Fig. S1] [available in the online version]). Moreover, we observed a lack of concordance between clinician estimates when comparing surveys completed by two or more providers for the same patients. Only 19% of the paired surveys had a difference in predicted risk of mortality of less than 5%, while nearly half (43%) had a risk difference greater than 20% ([Supplementary Fig. S2] [available in the online version]). Incorporating a regression model that prompts clinical staff to initiate end-of-life and palliative care order sets is advantageous because it can reduce the subjectivity inherent in relying entirely on physician estimates of patient mortality risk.

One potential weakness of this study is that the response rate of 31% raises the possibility of nonresponse bias. However, unlike classic examples of nonresponse bias, in which the method of soliciting responses makes some participants more likely to respond than others, there is no obvious mechanism whereby responders would be systematically different from nonresponders. Rather, the low response rate may reflect the inherent difficulty of obtaining responses from clinicians working in busy clinical environments. While some studies have investigated clinician response rates for surveys with no time cutoff, to our knowledge, no studies have established how many clinicians working in the hospital setting should be expected to respond to an email survey within 24 hours. Moreover, methods used in the literature to increase response rates, such as sending multiple emails, using paper rather than electronic surveys, following up with telephone calls, and offering monetary incentives,[29] [30] were not feasible: the added time and costs would have been prohibitive, it would have been nearly impossible to obtain clinician responses within the required 24-hour timeframe, and it is unclear whether these efforts would have yielded more valid results.[31]

Multiple studies have investigated mortality prediction using statistical methods[12] [14] [15] [24] [25] [26] [32] [33] [34] [35] [36] [37] [38] and our work validates the utility of these predictive tools in an acute cancer care hospital focusing on intermediate-term mortality. Our survey methodology ([Supplementary Fig. S3] [available in the online version]) of predicting the probability of death within a set intermediate timeframe is unique, as many studies that investigate the accuracy of clinical prediction use categorical scoring systems, specific temporal estimates,[9] [10] or the "surprise question" asking providers, "Would I be surprised if this patient died in the next year?"[12] [33] [39] [40] Our automated survey-based approach enabled us to send surveys to many more clinicians than would otherwise have been possible given time and resource constraints. We hypothesize that the simplicity of our survey design, requiring just three multiple-choice responses and one sliding-scale response, minimized the time burden for clinicians and was thereby a factor in achieving our response targets. Our survey approach also yields descriptive information, such as role in patient care and length of the patient–provider relationship, while minimizing the potential for recall bias by stipulating survey completion within 24 hours of admission as an inclusion criterion.

Although we cannot prove that our training dataset was completely free of all social biases, our analysis suggests that the predictive model may be less subject to bias relative to mortality estimates performed by clinical providers, who gave African American and Asian American patients higher risks of death than White patients, and non-English-speaking patients higher risks of death than English-speaking patients. This study suggests that while predictive models can undoubtedly incorporate bias and exacerbate disparities,[41] they may also be used to counteract bias and alleviate disparities. The inpatient prognosis prediction model provides a robust tool to assist clinical providers in evaluating mortality risk and this study was an important step in launching the model in a clinical inpatient setting.

In parallel with this model, we have also created a monitoring dashboard to track its performance, investigate its stability across demographic groups, and observe whether the major categorical and numeric features comprising the model drift over time ([Supplementary Fig. S4] [available in the online version]). Using this tool and comparing current performance with historical data, we have found that model performance has been extremely stable over the past 5 years. We have also been evaluating the baseline mortality rate over time, because changes due to new cancer therapies could impact model performance, and have found that this metric has also been stable for the duration of the study.


Conclusion

Future directions for this work include evaluating model performance over a longer time interval to determine how incorporating the model impacts patient care. For instance, the ability to predict life expectancy within a 6-month time interval may help patients qualify for hospice care benefits from Medicare and other insurers.[42] Model performance in a larger and more diverse patient cohort also requires evaluation, and further exploration of clinician biases through surveys or qualitative interviews of patients and providers would be insightful. Another important downstream consideration is how clinicians incorporate the model's predictions into their clinical practice. The model levels the field so that every clinician has the same information about the patient's prognosis, regardless of skill or clinical experience. Evaluating how this information is used to care for our patients and whether services are provided equally across demographic groups is another important future direction.

Generalizing this model to additional clinical services and other health care environments is another promising future direction that may be feasible, since the clinical and demographic features underlying the model are not specific for cancer patients. Understanding whether clinicians perform better at predicting deaths from disease progression relative to secondary issues such as infections and other complications would also be of interest. Development of a model that predicts survival in outpatients might help identify additional individuals who could benefit from end-of-life supportive care and resources.


Clinical Relevance Statement

We have built a statistical model that uses laboratory and admission data to predict mortality risk in cancer patients. It has been implemented to assist clinical providers in guiding patients toward end-of-life care planning.


Multiple Choice Questions

  1. If the statistical model described in the article predicts that a patient is at high risk and the clinical provider believes the patient has a low probability of dying at 45 days, then the best practice is to:

    • Allow the predictive models to initiate an advanced illness support order set because the model was shown to be more accurate than clinical judgement. Initiating the order set will save time for clinical providers with minimal risk for patients.

    • Allow the clinician to decide whether the advanced illness support order set and goals of care conversations are appropriate in light of the statistical model prediction and their own judgement.

    • Suppress the information provided by the statistical model because conflicting perspectives will confuse other members of the clinical team as well as the patient and their family.

    • Assume the statistical model is correct because clinical providers are more likely to underestimate mortality risk, so patients who would benefit from advanced illness support orders and goals of care conversations might be missed.

    Correct Answer: The correct answer is option b. The statistical model is best used to assist providers by helping them integrate complex data, but this tool is not intended to supplant clinical judgment.

  2. An advantage of using a statistical model over a machine learning model is:

    • Mathematical relationships between input variables are clearly defined in a statistical model.

    • Statistical models can handle a greater number of inputs relative to machine learning models.

    • Statistical models can integrate a wide variety of categorical and numerical data inputs whereas the data inputs are more limited for machine learning models.

    • Traditional statistical models outperform clinical judgement more frequently than machine learning models.

    Correct Answer: The correct answer is option a. Traditional statistical models define the mathematical relationships between variables, whereas the output from machine learning models is not always clearly explainable.



Conflict of Interest

None declared.

Acknowledgments

Andrew Zarski for support in building the clinical trial online survey. Sarah McCaskey for early clinical guidance. Natalia Summervile for data science expertise and project support. Nicole DiBacco, Brittany Gross-Jolly, Julie Lee, Vanessa Rodriguez, William Rosa, and Kenneth Rosenblatt for their clinical communications expertise. Josiah Chung for work on the model monitoring dashboard. For their work to put this model in production: Christine Fitzpatrick, Kimberly Gould, Gregory Jordan, Bigyan K C, Rashmi Kashyap, Jessica Kochan, Linda Li, Ian Morgan, Reshma Nevrekar, Ron Pearson, Jithin Thomas, Yuri Turin, Surendranatha Reddy Vellipalem, and Everett Weiss.

Protection of Human and Animal Subjects

The study was reviewed by the institutional review board of Memorial Sloan Kettering Cancer Center and is in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects.


* Shared first authorship.


Supplementary Material


Address for correspondence

Adrianna Z. Herskovits, MD, PhD
Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center
1275 York Ave, New York, NY, 10065
United States   

Publication History

Received: 15 December 2023

Accepted: 29 April 2024

Article published online:
26 June 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

