DOI: 10.1055/a-2411-5796
Enhancing Suicide Attempt Risk Prediction Models with Temporal Clinical Note Features
Authors
Funding This research has been supported by several funding bodies. The primary source of funding was the National Library of Medicine (NLM) T15 training grant (grant number: 2T15LM007450-20). Additional support came from the Evelyn Selby Stead Fund for Innovation at Vanderbilt University Medical Center and from grants R01 MH121455 and R01 MH116269. The Military Suicide Research Consortium also provided funding through grant W81XWH-10-2-0181. Finally, funding for the Research Derivative and BioVU Synthetic Derivative was provided by the National Center for Research Resources (grant number: UL1 RR024975/RR/NCRR). The funders had no role in study design, data collection and analysis, or manuscript preparation.
Abstract
Objectives The objective of this study was to investigate the impact of enhancing a structured-data-based suicide attempt risk prediction model with temporal Concept Unique Identifiers (CUIs) derived from clinical notes. We aimed to examine how different temporal schemes, model types, and prediction ranges influenced the model's predictive performance. This research sought to improve our understanding of how the integration of temporal information and clinical variable transformation could enhance model predictions.
Methods We identified modeling targets using diagnostic codes for suicide attempts within 30, 90, or 365 days following a temporally grouped visit cluster. Structured data included medications, diagnoses, procedures, and demographics, whereas unstructured data consisted of terms extracted with regular expressions from clinical notes. We compared models trained only on structured data (controls) to hybrid models trained on both structured and unstructured data. We used two temporalization schemes for clinical notes: fixed 90-day windows and flexible epochs. We trained and assessed random forests and hybrid long short-term memory (LSTM) neural networks using area under the precision recall curve (AUPRC) and area under the receiver operating characteristic, with additional evaluation of sensitivity and positive predictive value at 95% specificity.
Results The training set included 2,364,183 visit clusters with 2,009 30-day suicide attempts, and the testing set contained 471,936 visit clusters with 480 suicide attempts. Models trained with temporal CUIs outperformed those trained with only structured data. The window-temporalized LSTM model achieved the highest AUPRC (0.056 ± 0.013) for the 30-day prediction range. Hybrid models generally showed better performance compared with controls across most metrics.
Conclusion This study demonstrated that incorporating electronic health record-derived clinical note features enhanced suicide attempt risk prediction models, particularly with window-temporalized LSTM models. Our results underscored the critical value of unstructured data in suicidality prediction, aligning with previous findings. Future research should focus on integrating more sophisticated methods to continue improving prediction accuracy, which will enhance the effectiveness of future intervention.
Background and Significance
The Centers for Disease Control and Prevention reported that in 2021 approximately 48,000 people in the United States died by suicide.[1] Proven interventions such as psychiatric medication and removing access to firearms might help individuals at risk for suicide.[2] Barriers to identifying at-risk individuals and delivering timely prevention efforts include limited access to mental health services, social stigma surrounding mental health, insufficient training for health care providers in recognizing suicide risk, and the fragmented nature of health records.[3] Informatics efforts have been underway to address the growing need for improved screening and treatment of at-risk individuals.[4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] Much of this work has explored the interplay of structured and unstructured electronic health record (EHR) data for clinical predictive machine learning tasks.[9] [13] [19] [20] [21] [22] [23] [24] [25]
In mental health assessments, information that is not captured in structured fields, such as free-text notes from clinicians, plays a crucial role, especially when assessing suicide risk.[9] [11] [18] [26] Broadly, researchers have used such "unstructured data" from Reddit comments to detect suicidal ideation.[27] Likewise, and more centrally to health care, the Veterans Health Administration demonstrated that sentiment analysis of unstructured data in EHR clinic notes enhances suicidality prediction accuracy.[9] In previous work, we leveraged Concept Unique Identifier (CUI) counts from clinical notes to generate suicidality risk factor networks.[28] Natural language processing (NLP) techniques applied to clinical notes have been shown to improve the detection of suicidal thoughts in pregnant women, predict suicide risk after hospital discharge, and even outperform mental health professionals in identifying suicide risk from notes.[11] [12] [18] The finding in pregnant women represents a clinically significant milestone, as the perinatal period is a time of elevated suicide risk for this vulnerable population.[18] Conversely, the absence of structured documentation of suicidality has been noted as a serious limitation of EHR data: Boggs et al reported substantial gaps in follow-up assessments related to suicidal ideation due to the lack of structured EHR data on suicidality.[26] Together, these advancements underscore the potential of unstructured data in enhancing predictive models for suicide risk.
Researchers often avoid including the timing of events (temporality) in their models to keep the analysis simpler and to reduce the risk of the model becoming too tailored to the specific data (a problem known as overfitting).[29] However, recent research advocates for incorporating temporal elements in clinical models to enhance performance.[30] [31] [32] [33] [34] Indeed, the ideation-to-action framework suggests that only after interacting with acquired capability to inflict painful harm upon oneself (via lowered fear of death and increased pain tolerance) does suicidal ideation progress toward suicide attempts.[35] [36] [37] [38] [39] The order (i.e., relative temporal precedence) of these constructs is integral to the ideation-to-action framework. Integrating this theory into prediction models supports the temporalization of input data to reflect critical indicators along the ideation-to-action continuum.
Objective
The objective of this study was to investigate the impact of extending a validated structured-data-based suicide attempt risk prediction model with CUIs derived from clinical notes and to examine how temporal schemes, model types, and different prediction ranges affected predictive performance. Building upon previous research in suicide risk prediction, this research aimed to enhance our understanding of how clinical variable selection, transformation, and periodization could influence model outcomes.[9] [11] [12] [13] [18] [19] [40] [41] Specifically, we sought to leverage detailed clinical note data, enriched with temporal information, to develop a model that improves prediction of rare outcomes like suicide-related behaviors.[42] We compared the model against baselines lacking these enriched features on area under the precision recall curve (AUPRC), sensitivity, positive predictive value (PPV), and risk stratification capability.
Methods
Study Setting
Vanderbilt University Medical Center (VUMC) operates a large regional health care network with over 1,000 beds across multiple facilities and manages over 1.5 million outpatient visits and 40,000 inpatient admissions annually. VUMC also includes a dedicated psychiatric facility to provide comprehensive mental health care services. The patient population includes urban and rural communities with diverse demographic and health care experiences, which is vital for research initiatives like suicide prevention. The Vanderbilt Suicide Attempt and Ideation Likelihood (VSAIL) model generates patient encounter suicide risk scores, using structured EHR data.[4] [5] It has been validated retrospectively, prospectively, and in the context of universal screening in high-acuity clinical settings.[4] [43] [44] A decision support module driven by VSAIL has been evaluated via randomized controlled trial and shown to increase face-to-face suicide risk assessment.[45]
Cohort, Clusters, and Outcomes
We analyzed adult outpatient and emergency department encounters at VUMC between January 1, 2010, and December 31, 2022. We grouped sequences of visits with gaps of 3 or fewer days into discrete clusters. Thus, patients could appear multiple times in the dataset if they visited VUMC care centers on separate occasions more than 3 days apart. We determined modeling targets (outcomes) by the presence or absence of at least one ICD diagnostic code for suicide attempt within 30, 90, or 365 days (prediction ranges) following a visit cluster. [Supplementary Material S1] (available in the online version) details the 1,526 ICD codes used to ascertain suicide attempts; these codes include only diagnoses that explicitly indicate suicidal intent or intent to inflict lethal self-harm or injury. In 2015, health systems converted billing diagnostic codes from the International Classification of Diseases, 9th Revision (ICD-9) to the 10th Revision (ICD-10). Our previous work has shown ICD-10 diagnostic codes to have higher PPV (0.85)[6] for suicide attempt ascertainment than ICD-9 codes (PPV 0.58).[5] Early experimentation supported the inclusion of ICD-9 era data into the training set, while ensuring that only ICD-10 data were used for evaluation. We excluded visits occurring within 3 days of a suicide attempt to avoid inadvertently including predictions for visits initiated by a suicide event, i.e., visits in which prediction would not be necessary.
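The gap-based grouping of visits into clusters can be sketched as follows; this is a minimal illustration of the 3-day rule, with the function name and per-patient date-list input as assumptions of ours rather than the study's actual implementation.

```python
from datetime import date

def cluster_visits(visit_dates, max_gap_days=3):
    """Group one patient's visit dates into clusters, starting a new
    cluster whenever the gap since the previous visit exceeds
    max_gap_days. Returns a list of date lists, oldest first."""
    ordered = sorted(visit_dates)
    clusters = []
    for d in ordered:
        if clusters and (d - clusters[-1][-1]).days <= max_gap_days:
            clusters[-1].append(d)   # continue the current cluster
        else:
            clusters.append([d])     # gap too large: start a new cluster
    return clusters

# Visits on Jan 1 and Jan 3 merge into one cluster; Jan 10 starts another.
clusters = cluster_visits([date(2020, 1, 10), date(2020, 1, 1), date(2020, 1, 3)])
```

Under this scheme a patient contributes one modeling row per cluster, so frequent attenders appear multiple times, which matches the study's cluster-level outcome definition.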
Features and Measurements
We collected structured and unstructured data from a 5-year window preceding each visit cluster. Structured data, based on the original VSAIL model, included medications (n = 693), diagnoses (n = 83), procedures (n = 46), and demographics (n = 3). We imputed missing numeric values with constant zeros and missing categorical values with an explicit "missing" label. Features were scaled to a maximum absolute value of 1 without shifting or centering. Structured features were not temporally aggregated, as preliminary analysis showed low temporal feature variance, a finding also supported by prior work by Shortreed et al.[46]
From unstructured clinical notes, we extracted note-level count aggregations of medical concepts using 13,997 CUIs from the Unified Medical Language System.[47] CUI extraction was performed with the VUMC Wordcloud Indexer, a negation-aware regular-expression-based NLP tool.[48] We processed CUIs using two temporalization schemes: 21 fixed, 90-day windows (window scheme) and 21 flexible epochs grouping note activity with gaps of 30 or fewer days (epoch scheme). CUI counts were normalized using term frequency-inverse document frequency (TF-IDF)[49] transformation to emphasize terms that occur less frequently, then reduced with latent semantic analysis (LSA)[50] to 100 components, condensing the high-dimensional data into fewer components of greater information density. The LSA transformer was first fitted on the 5-year aggregate of CUI counts before transforming individual windows and epochs, allowing for consistent comparison across temporal schemes.
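The fit-on-aggregate, transform-per-window strategy can be sketched in plain numpy; this is a toy illustration (the study used library implementations of TF-IDF and truncated SVD), and the matrix values, function names, and smoothed-IDF formula here are our assumptions.

```python
import numpy as np

def smoothed_idf(counts):
    """Inverse document frequency from a (documents x terms) count
    matrix, with add-one smoothing on document frequencies."""
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)            # document frequency per term
    return np.log((1 + n_docs) / (1 + df)) + 1

def fit_lsa(X, k):
    """Fit LSA via truncated SVD: keep the top-k right singular
    vectors as the component matrix."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]

# Fit IDF weights and LSA components on the 5-year aggregate counts,
# then transform an individual window with those same fitted pieces,
# so windows and epochs share one comparable low-dimensional space.
aggregate = np.array([[4.0, 0.0, 2.0], [0.0, 5.0, 1.0], [3.0, 1.0, 0.0]])
idf = smoothed_idf(aggregate)
components = fit_lsa(aggregate * idf, k=2)   # shape (2, n_terms)
window = np.array([[1.0, 0.0, 1.0]])
window_lsa = (window * idf) @ components.T   # shape (1, 2)
```

Fitting once on the aggregate and reusing the components is what makes per-window vectors comparable across the two temporalization schemes.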
Experimental Overview
We trained random forests and hybrid long short-term memory (LSTM) neural networks to test the impact of input temporality on suicide attempt risk prediction. Models trained on nontemporal structured features were compared with models trained on nontemporal structured features plus temporally aggregated CUIs [Supplementary Material S2] (available in the online version). We performed experiments with three modeling groups: random forest structured only (control), random forest structured plus temporal CUIs, and LSTM structured plus temporal CUIs. We used two CUI temporalization schemes: windows and epochs, and three prediction ranges: 30, 90, and 365 days. Thus, our experimental variations included random-forest-control (VSAIL), random-forest-epoch, random-forest-window, LSTM-epoch, and LSTM-window models across each prediction range.
We divided encounters using a mixed temporal split design. The earliest 80% of visit clusters (January 1, 2010–September 2, 2021) formed the training set, ensuring that the test set was evaluated on more recent data containing only ICD-10 era outcomes. We randomly assigned the remaining, most recent 20% of visit clusters to either development or testing in a 1:4 ratio, resulting in an 80/4/16 training/development/testing split ([Fig. 1]).
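The mixed temporal split reduces to a chronological cut followed by a random 1:4 division of the holdout; a minimal sketch, assuming clusters arrive as (id, start_date) pairs and with the function name as our own invention:

```python
import random

def mixed_temporal_split(clusters, train_frac=0.80, dev_to_test=(1, 4), seed=0):
    """Chronological split: the earliest train_frac of visit clusters
    form the training set; the most recent remainder is randomly
    divided into development and test sets at dev_to_test ratio."""
    ordered = sorted(clusters, key=lambda c: c[1])       # oldest first
    n_train = int(len(ordered) * train_frac)
    train, remainder = ordered[:n_train], list(ordered[n_train:])
    random.Random(seed).shuffle(remainder)               # random dev/test assignment
    n_dev = len(remainder) * dev_to_test[0] // sum(dev_to_test)
    return train, remainder[:n_dev], remainder[n_dev:]

# 100 clusters -> 80 train / 4 development / 16 test
train, dev, test = mixed_temporal_split([(i, i) for i in range(100)])
```

The chronological cut keeps all evaluation data strictly later than training data, which is what confines ICD-10 era outcomes to the holdout.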


Model Implementation
We created random forest models using scikit-learn 1.2.0 for preprocessing and classification, with a custom pipeline for efficient hyperparameter selection.[51] These models were developed and evaluated in Python 3.8.0, managed with the built-in virtual environment venv. We created hybrid LSTM models using PyTorch 2.2.2 (CPU) with a custom neural network module to handle a mixture of temporal (CUI counts) and nontemporal (VSAIL) features.[52] The model has two input layers: an LSTM input layer for the temporal CUI count features and a dense linear input layer for the nontemporal VSAIL features, which are combined to feed into a single output prediction layer. [Fig. 2] depicts the hybrid LSTM model architecture in detail. [Supplementary Material S3] (available in the online version) provides additional modeling details.
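The two-input architecture described above can be sketched in PyTorch; layer widths, the hidden size, and the class name are illustrative assumptions, not the published configuration (the 825 structured inputs correspond to 693 medications + 83 diagnoses + 46 procedures + 3 demographics).

```python
import torch
import torch.nn as nn

class HybridLSTM(nn.Module):
    """Sketch of the hybrid model: an LSTM branch over the 21-period
    temporal CUI features and a dense branch over the nontemporal
    structured (VSAIL) features, concatenated into one output layer."""

    def __init__(self, n_cui=100, n_struct=825, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_cui, hidden, batch_first=True)
        self.dense = nn.Linear(n_struct, hidden)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, cui_seq, struct):
        # cui_seq: (batch, 21 periods, n_cui); struct: (batch, n_struct)
        _, (h_n, _) = self.lstm(cui_seq)                 # final hidden state
        combined = torch.cat([h_n[-1], torch.relu(self.dense(struct))], dim=1)
        return torch.sigmoid(self.out(combined)).squeeze(-1)

model = HybridLSTM()
scores = model(torch.zeros(8, 21, 100), torch.zeros(8, 825))  # per-visit risk scores
```

Concatenating the LSTM's final hidden state with a dense projection of the static features lets one output layer weigh temporal and nontemporal evidence jointly.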


Model Training
We used 10-fold stratified, grouped cross-validation for hyperparameter selection. Outcome stratification ensured each fold had nearly equal numbers of suicide attempts, while grouping prevented visit clusters from the same individual from being placed in different folds, reducing overfitting risk. We selected hyperparameters via a two-stage grid search. Initial ranges were geometrically distributed by a ratio of 2 (e.g., 4, 8, 16, 32). A second search used a linear distribution of values centered around the best-performing initial hyperparameters. We trained hybrid LSTM models with early stopping, continuing until cross-validation performance declined. We determined the best hyperparameters using mean cross-validation AUPRC. The best model was calibrated to the prevalence of suicide attempts in the development set using Platt's method.[53]
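The two-stage hyperparameter search amounts to a coarse geometric sweep followed by a fine linear sweep around the winner; a minimal sketch, with function names and the refinement step size as our own illustrative choices:

```python
def geometric_grid(lo, hi, ratio=2):
    """First-stage candidates: lo, lo*ratio, lo*ratio^2, ... up to hi."""
    vals, v = [], lo
    while v <= hi:
        vals.append(v)
        v *= ratio
    return vals

def linear_refinement(best, step, n=5):
    """Second-stage candidates: n values linearly spaced around the
    best first-stage value."""
    return [best + (i - n // 2) * step for i in range(n)]

coarse = geometric_grid(4, 32)        # [4, 8, 16, 32]
fine = linear_refinement(16, step=4)  # [8, 12, 16, 20, 24]
```

The geometric stage covers orders of magnitude cheaply; the linear stage then localizes the optimum near the best coarse value.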
Evaluation
We assessed general performance on the final test set using AUPRC and area under the receiver operating characteristic (AUROC). We measured sensitivity and PPV at 95% specificity to characterize each model's potential cost-effectiveness as a screening tool, as described by Ross et al.[54] We used bootstrapping with 1,000 iterations to generate 95% confidence intervals for all metrics and applied one-sided Wilcoxon rank-sum tests to compare scores. We evaluated risk stratification by counting true positives within probability deciles, providing a visit-centered view of model improvements in identifying future suicide attempts. We measured model calibration [Supplementary Material S4] (available in the online version) on the development set before and after adjustment using Spiegelhalter's z-score.[55]
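Sensitivity and PPV at a fixed 95% specificity can be computed by thresholding at the 95th percentile of the negative-class scores; a minimal sketch under that assumption, with the function name our own:

```python
import numpy as np

def sens_ppv_at_specificity(y_true, scores, specificity=0.95):
    """Sensitivity and PPV at the threshold that classifies the given
    fraction of true negatives as negative."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    thr = np.quantile(scores[y_true == 0], specificity)  # 95% of negatives fall at/below thr
    pred = scores > thr
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sens, ppv
```

Bootstrapping this calculation over resampled test sets, as in the evaluation above, yields confidence intervals for both setpoint metrics.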
To quantify clinical note feature importances, we used random-forest-derived mean decrease in impurity (MDI). Given the transformation of temporal CUI counts with LSA, direct comparison of each temporalized CUI feature's importance was challenging. Therefore, we created an additional random forest model trained entirely on nontemporal CUI counts (13,997 features) that were only TF-IDF transformed. We averaged the MDI from 100 bootstrapped variations of this model to calculate approximate feature importances and 95% confidence intervals for our CUIs, repeated for each prediction window.[56]
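The bootstrap-averaged importance procedure is model-agnostic; a minimal sketch in which `fit_importances` stands in for refitting the auxiliary random forest and reading its MDI vector (the function names and the 2.5/97.5 percentile interval are our assumptions):

```python
import numpy as np

def bootstrap_importances(fit_importances, X, y, n_boot=100, seed=0):
    """Average per-feature importances over bootstrap resamples and
    report the mean with percentile 95% confidence intervals.
    fit_importances: callable (X, y) -> importance vector, e.g., a
    refit random forest's mean decrease in impurity."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))       # resample with replacement
        draws.append(fit_importances(X[idx], y[idx]))
    draws = np.vstack(draws)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
    return draws.mean(axis=0), lo, hi

# Toy stand-in importance function: per-feature column means.
X = np.arange(20.0).reshape(10, 2)
y = np.zeros(10)
mean, lo, hi = bootstrap_importances(lambda Xb, yb: Xb.mean(axis=0), X, y, n_boot=50)
```

Averaging over resamples stabilizes MDI rankings, which are otherwise sensitive to the particular training sample.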
Results
The training set contained 2,364,183 visit clusters and 2,009 30-day attempts. The development set contained 118,534 visit clusters and 117 30-day attempts. The testing set contained 471,936 visit clusters and 480 30-day attempts. The overall prevalence of 30-day attempts was approximately 1 in 1,000 (0.1%). [Table 1] describes the demographics of the entire study cohort. [Fig. 3] shows the distribution of health care utilization by demographics. The distribution of visit cluster counts per patient was similar among individuals without recorded suicide attempts across racial, ethnic, and gender groups, with Hispanic, Middle Eastern/North African, and Black individuals trending slightly higher than the rest. The distributions were higher among those with recorded suicide attempts, across all demographic groups. The highest distributions were among Hispanic, Middle Eastern/North African, American Indian/Native Alaskan, and Black individuals with recorded suicide attempts.
Notes: This table summarizes the overall cohort demographics of the study, including suicide rates within each demographic. Suicide rates are divided into four groups, indicating either (1) any suicide attempt or (2–4) only suicide attempts within a fixed number of days after a visit. The total column indicates the total number of patients within each demographic. The utilization column (Util.) indicates the mean fraction of health care visit clusters per patient within each demographic.


The distributions of bootstrapped AUPRC, AUROC, sensitivity at 95% specificity, and PPV at 95% specificity are shown in [Figs. 4] [5] [6] [7], respectively. [Table 2] provides the exact means and 95% confidence intervals for each evaluation metric. As the prediction range increased (i.e., from 30 to 90 to 365 days), AUPRC and PPV increased, whereas AUROC and sensitivity decreased. Window temporalization schemes outperformed epochs across all four metrics, except in the case of LSTM with a 30-day prediction window, where the epoch scheme resulted in higher AUPRC. LSTMs performed better than random forests only in terms of AUPRC. Hybrid models (epochs and windows) outperformed controls in every metric except for PPV. In terms of our primary ranking metric (AUPRC) and primary use case (30-day prediction range), the highest performing model was the window-temporalized LSTM model (0.056 ± 0.016), followed by LSTM-epoch (0.041 ± 0.010), random-forest-window (0.036 ± 0.008), random-forest-epoch (0.028 ± 0.006), and control (0.015 ± 0.003). These rankings were confirmed by the Wilcoxon rank-sum tests (p < 0.001).
Abbreviations: AUPRC, area under the precision recall curve; AUROC, area under the receiver operating characteristic; LSTM, long short-term memory.
Notes: This table summarizes the metric averages and 95% confidence intervals (CIs) for each model variation and prediction range across 1,000 bootstrapped evaluation iterations. CIs are calculated using the average of the 5th and 95th percentile score differences from the mean. Sensitivity (Sn.) and positive predictive value (PPV) are reported at 95% specificity (Sp.).








[Fig. 8] depicts the stratification of suicide attempts within prediction deciles, organized by model type, temporalization scheme, and prediction range. Increasing prediction ranges increased the number of suicide attempts but decreased the fraction of suicide attempts captured in the 10th prediction decile of each model. In the 30-day prediction range 10th-decile stratification, the rankings were: random-forest-window (90.6%), random-forest-epoch (87.1%), LSTM-window (80.0%), LSTM-epoch (75.8%), and control (72.7%). The gap in stratification performance between LSTM-window and random-forest-window increased in the 90-day prediction range (74.3 vs. 80.0%) and the 365-day prediction range (71.0 vs. 77.8%).


[Fig. 9] shows the top 20 feature importances by MDI across 100 bootstrap iterations for the 30-day prediction range. The top five most important features were suicide attempt, feeling hopeless, self-injurious behavior, active suicidal ideation, and impaired judgment. [Fig. 10] compares the relative scaled importances by MDI of the top 10 features for all three prediction ranges. The 365- and 90-day prediction ranges showed a tighter cluster of feature importances, whereas the 30-day prediction range showed greater variance in feature importances.




Discussion
In this study, we demonstrated how EHR-derived clinical note features improve a deployed suicide attempt risk prediction model.[45] We also showed that both the clinical note data temporalization scheme and model type significantly impact model performance across various testing dimensions and prediction ranges. The importance of temporality echoes the ideation-to-action framework, wherein proximal risk factors influence the ability of suicidal ideation to progress toward or away from suicide attempts. That is, the order in which people experience the intensification or easing of risk factors for suicide ideation or attempts influences the presence or absence of suicide attempts. In clinical practice, this may appear as ambivalence regarding the desire for death or the intent to enact a suicide attempt. Although the model employed in the present project was not built to depend on or replicate theories of suicide, the fact that the model benefits from temporality offers support for theorists who in their own right are seeking to understand the causes of suicidal behavior.[38] [39]
Within the clinically preferred 30-day prediction range, window-temporalized LSTM models achieved the highest test AUPRC, whereas random forests achieved the highest test sensitivity and PPV at 95% specificity. AUPRC is more reliable than setpoint metrics for evaluation until the clinical burden of this model can be studied.[54] Models that included features from free-text clinical notes (such as specific medical terms related to suicidality and mental health) outperformed those trained purely on structured data, and the most impactful clinical terms were related to suicidality, mental health, depression, social stress, and drug use, complementing structured features.[57]
This work builds on the efforts of others by comparing different approaches to incorporating temporal features in suicide attempt risk prediction models and highlighting the effectiveness of using unstructured data from clinical notes. In comparing our methods and results to those of Shortreed et al, for example, several key differences and outcomes emerge.[46] While both studies explored the use of added temporal features for suicide attempt risk prediction, our study demonstrated improved performance with features derived from clinical notes, including TF-IDF and LSA-transformed clinical concepts. In contrast, Shortreed et al did not observe significant performance improvements with added temporal predictors engineered from clinical data. Unstructured free text might better capture temporal data relevant to suicide attempt risk prediction than structured data. Replication studies of temporal features for both structured and unstructured data are indicated.
Our findings align with the prior work of Tsui et al, who also found that incorporating unstructured data significantly enhanced their prediction model's accuracy compared with using only structured data.[13] Currently, we use a bag-of-words approach for suicide attempt risk prediction based on medical concept counts from clinical notes. Although not theory-driven, this method offers easy accessibility, fast implementation, and scalability. In contrast, Meerwijk et al advocated for a theory-driven approach using the three-step Theory of Suicide (3ST), which, while potentially more accurate, required extensive setup and manual annotation.[14] [36] Overall, our bag-of-words model remains a feasible and effective method until more refined strategies become practical.
In future work, we aim to enhance our risk prediction model using advanced NLP techniques, including vector embeddings like word2vec and cui2vec, which capture term similarities within the corpus.[21] [22] [58] Coppersmith et al and Ji highlighted the effectiveness of pretrained embeddings and large language models such as BERT, RoBERTa, MentalBERT, and MentalRoBERTa in fine-tuning suicide text classifiers.[10] [15] [58] We also suspect it may be prudent to retrain with additional NLP-derived features like lexical, syntactic, and sentimental elements, shown to improve outcomes in suicide note classification.[12] Levis et al suggested sentiment analysis of psychotherapy notes can improve prediction models, whereas Ji noted varying effectiveness depending on the data source.[9] [15] [16] [59] [60] [61] Given the reliance on clinician-entered ICD codes in the present study, which may under-identify suicide attempts, especially those presented at an outside facility,[7] future work may adopt a weakly supervised NLP approach as used by Bejan et al.[6] Importantly, our current study improved performance using a simple, understandable, fast, and more transportable method, underscoring the critical value of unstructured data in suicidality prediction. This highlights that even straightforward approaches can make significant contributions, paving the way for further enhancements with more advanced techniques.
In summary, the field of predictive modeling is steadily incorporating unstructured data and NLP methods to improve screening efforts. Our work, and the works of others discussed in this paper, support the integration of these complex data sources into risk prediction models. There are several interesting paths for future research, such as the use of vector embeddings, better ground-truth ascertainment methods, improved feature extraction techniques, additional sentiment analysis, and the use of theory-based approaches to inform model design. Differences in study outcomes could also stem from variations in datasets, model implementations, and population characteristics, which should be considered in future comparisons. These potential improvements show promise for better predicting suicide risk, which could lead to more effective interventions and fewer deaths from suicide. The challenge moving forward is finding ways to continually improve, fine-tune, and validate these models.
Clinical Relevance Statement
This study highlights the significant improvement in suicide attempt risk prediction models when incorporating temporal CUIs derived from clinical notes. By enhancing structured data with temporally organized unstructured data, particularly through window-temporalized LSTM models, predictive performance notably increased. These findings underscore the importance of utilizing both structured and unstructured EHR data in clinical risk assessments. Improved prediction models can lead to more accurate identification of high-risk individuals, potentially allowing for timely and targeted interventions. Future advancements in integrating sophisticated methods with clinical data hold promise for further enhancing predictive accuracy and ultimately improving patient outcomes.
Multiple-Choice Questions
1. Which of the following approaches to ascertaining suicide attempts in health records should have the highest PPV?

a. Analyzing ICD-9 codes
b. Analyzing ICD-10 codes
c. Employing weakly supervised NLP
d. Manual chart review by expert clinicians

Correct Answer: The correct answer is option d. ICD-10 codes have higher PPV for ascertaining suicide attempt than ICD-9, and novel weakly supervised NLP approaches show promise for further improving PPV. However, the gold standard against which all approaches are currently compared is clinical chart review.
2. Which of the following evaluation metrics is prioritized to address the rarity of outcomes in this study?

a. AUROC
b. Accuracy
c. AUPRC
d. PPV

Correct Answer: The correct answer is option c. AUROC, accuracy, and PPV can all be artificially inflated with rare outcomes.
3. Which of the following is a foundational framework for studying suicidality?

a. Ideation to Action (I2A)
b. Spontaneous Action (SA)
c. Hopelessness and Loneliness (HaL)

Correct Answer: The correct answer is option a. The leading theories on suicidality are collectively referred to as ideation-to-action frameworks.
Conflict of Interest
None declared.
Acknowledgments
The authors would like to express their sincere gratitude to Dario A. Giuse, the Director of Vanderbilt Health IT, for generously providing access to the Vanderbilt Wordcloud Indexer. This tool was invaluable in facilitating the research and contributing to the conclusions drawn in this work. The authors also acknowledge the support from their team, peers, and the wider research community, whose collaboration and inputs have enriched this study.
Protection of Human and Animal Subjects
No human subjects were involved in this project.
Note
The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects and was reviewed by the VUMC Institutional Review Board.
References
- 1 Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics System, Mortality 2018–2021 on CDC WONDER Online Database; 2021
- 2 Zalsman G, Hawton K, Wasserman D. et al. Suicide prevention strategies revisited: 10-year systematic review. Lancet Psychiatry 2016; 3 (07) 646-659
- 3 Mann JJ, Apter A, Bertolote J. et al. Suicide prevention strategies: a systematic review. JAMA 2005; 294 (16) 2064-2074
- 4 Walsh CG, Johnson KB, Ripperger M. et al. Prospective validation of an electronic health record-based, real-time suicide risk model. JAMA Netw Open 2021; 4 (03) e211428
- 5 Walsh CG, Ribeiro JD, Franklin JC. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci 2017; 5 (03) 457-469
- 6 Bejan CA, Ripperger M, Wilimitis D. et al. Improving ascertainment of suicidal ideation and suicide attempt with natural language processing. Sci Rep 2022; 12 (01) 15146
- 7 Young J, Bishop S, Humphrey C, Pavlacic JM. A review of natural language processing in the identification of suicidal behavior. J Affect Disord Rep 2023; 12: 100507
- 8 Cohen J, Wright-Berryman J, Rohlfs L, Trocinski D, Daniel L, Klatt TW. Integration and validation of a natural language processing machine learning suicide risk prediction model based on open-ended interview language in the emergency department. Front Digit Health 2022; 4: 818705
- 9 Levis M, Leonard Westgate C, Gui J, Watts BV, Shiner B. Natural language processing of clinical mental health notes may add predictive value to existing suicide risk models. Psychol Med 2021; 51 (08) 1382-1391
- 10 Coppersmith G, Leary R, Crutchley P, Fine A. Natural language processing of social media as screening for suicide risk. Biomed Inform Insights 2018; 10: 1178222618792860
- 11 McCoy Jr TH, Castro VM, Roberson AM, Snapper LA, Perlis RH. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry 2016; 73 (10) 1064-1071
- 12 Pestian J, Nasrallah H, Matykiewicz P, Bennett A, Leenaars A. Suicide note classification using natural language processing: a content analysis. Biomed Inform Insights 2010; 2010 (03) 19-28
- 13 Tsui FR, Shi L, Ruiz V. et al. Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts. JAMIA Open 2021; 4 (01) ooab011
- 14 Meerwijk EL, Tamang SR, Finlay AK, Ilgen MA, Reeves RM, Harris AHS. Suicide theory-guided natural language processing of clinical progress notes to improve prediction of veteran suicide risk: protocol for a mixed-method study. BMJ Open 2022; 12 (08) e065088
- 15 Ji S. Towards intention understanding in suicidal risk assessment with natural language processing. In: Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics; 2022 :4028–4038. Accessed September 15, 2024 at: https://aclanthology.org/2022.findings-emnlp.297
- 16 Ji S, Yu CP, Fung S fu, Pan S, Long G. Supervised learning for suicidal ideation detection in online user content. Complexity 2018; 2018: 1-10
- 17 Arowosegbe A, Oyelade T. Application of natural language processing (NLP) in detecting and preventing suicide ideation: a systematic review. Int J Environ Res Public Health 2023; 20 (02) 1514
- 18 Zhong QY, Mittal LP, Nathan MD. et al. Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem. Eur J Epidemiol 2019; 34 (02) 153-162
- 19 Zhang D, Yin C, Zeng J, Yuan X, Zhang P. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020; 20 (01) 280
- 20 Thompson K. Programming techniques: regular expression search algorithm. Commun ACM 1968; 11 (06) 419-422
- 21 Beam AL, Kompa B, Schmaltz A. et al. Clinical concept embeddings learned from massive sources of multimodal medical data. Pac Symp Biocomput 2020; 25: 295-306
- 22 Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. ArXiv13013781. Accessed February 13, 2022 at: http://arxiv.org/abs/1301.3781
- 23 Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res 2003; 3 (Jan): 993-1022
- 24 Dey L, Haque SKM. Opinion mining from noisy text data. In: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data - AND '08. ACM Press; 2008: 83-90
- 25 Turney PD. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. arXiv:cs/0212032. Accessed February 13, 2022 at: http://arxiv.org/abs/cs/0212032
- 26 Boggs JM, Quintana LM, Powers JD, Hochberg S, Beck A. Frequency of clinicians' assessments for access to lethal means in persons at risk for suicide. Arch Suicide Res 2022; 26 (01) 127-136
- 27 Yeskuatov E, Chua SL, Foo LK. Leveraging reddit for suicidal ideation detection: a review of machine learning and natural language processing techniques. Int J Environ Res Public Health 2022; 19 (16) 10347
- 28 Krause KJ, Shelley J, Becker A, Walsh C. Exploring risk factors in suicidal ideation and attempt concept cooccurrence networks. AMIA Annu Symp Proc 2023; 2022: 644-652
- 29 Montesinos López OA, Montesinos López A, Crossa J. Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction. Springer International Publishing;; 2022: 109-139
- 30 Zhao J, Henriksson A. Learning temporal weights of clinical events using variable importance. BMC Med Inform Decis Mak 2016; 16 (Suppl. 02) 71
- 31 Zhao J, Henriksson A, Kvist M, Asker L, Boström H. Handling temporality of clinical events for drug safety surveillance. AMIA Annu Symp Proc 2015; 2015: 1371-1380
- 32 Singh A, Nadkarni G, Gottesman O, Ellis SB, Bottinger EP, Guttag JV. Incorporating temporal EHR data in predictive models for risk stratification of renal function deterioration. J Biomed Inform 2015; 53: 220-228
- 33 Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J, Doctor AI. Doctor AI: predicting clinical events via recurrent neural networks. JMLR Workshop Conf Proc 2016; 56: 301-318
- 34 Che Z, Purushotham S, Cho K, Sontag D, Liu Y. Recurrent neural networks for multivariate time series with missing values. Sci Rep 2018; 8 (01) 6085
- 35 Joiner TE. Why People Die by Suicide. Harvard University Press;; 2005
- 36 Klonsky ED, May AM. The three-step theory (3ST): a new theory of suicide rooted in the “ideation-to-action” framework. Int J Cogn Ther 2015; 8 (02) 114-129
- 37 Klonsky ED, May AM, Saffer BY. Suicide, suicide attempts, and suicidal ideation. Annu Rev Clin Psychol 2016; 12 (01) 307-330
- 38 Klonsky ED, Saffer BY, Bryan CJ. Ideation-to-action theories of suicide: a conceptual and empirical update. Curr Opin Psychol 2018; 22: 38-43
- 39 Van Orden KA, Witte TK, Cukrowicz KC, Braithwaite SR, Selby EA, Joiner Jr TE. The interpersonal theory of suicide. Psychol Rev 2010; 117 (02) 575-600
- 40 Schafer KM, Kennedy G, Gallyer A, Resnik P. A direct comparison of theory-driven and machine learning prediction of suicide: a meta-analysis. PLoS One 2021; 16 (04) e0249833
- 41 Walker RL, Shortreed SM, Ziebell RA. et al. Evaluation of electronic health record-based suicide risk prediction models on contemporary data. Appl Clin Inform 2021; 12 (04) 778-787
- 42 Carter G, Milner A, McGill K, Pirkis J, Kapur N, Spittal MJ. Predicting suicidal behaviours using clinical instruments: systematic review and meta-analysis of positive predictive values for risk scales. Br J Psychiatry 2017; 210 (06) 387-395
- 43 Wilimitis D, Turer RW, Ripperger M. et al. Integration of face-to-face screening with real-time machine learning to predict risk of suicide among adults. JAMA Netw Open 2022; 5 (05) e2212095
- 44 McKernan LC, Lenert MC, Crofford LJ, Walsh CG. Outpatient engagement and predicted risk of suicide attempts in fibromyalgia. Arthritis Care Res (Hoboken) 2019; 71 (09) 1255-1263
- 45 Walsh CG, Ripperger MA, Novak L. et al. Randomized controlled comparative effectiveness trial of risk model-guided clinical decision support for suicide screening. medRxiv 2024
- 46 Shortreed SM, Walker RL, Johnson E. et al. Complex modeling with detailed temporal predictors does not improve health records-based suicide risk prediction. NPJ Digit Med 2023; 6 (01) 47
- 47 Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res 2004; 32 (Database issue): D267-D270
- 48 Mandani S, Giuse D, McLemore M, Weitkamp A. Augmenting NLP Results by Leveraging SNOMED CT Relationships for Identification of Implantable Cardiac Devices from Patient Notes. Presented at: SNOMED CT Expo 2019; October 31, 2019; Kuala Lumpur, Malaysia. Accessed September 15, 2024 at: https://confluence.ihtsdotools.org/display/FT/201905+Augmenting+NLP+results+by+leveraging+SNOMED+CT+relationships+for+identification+of+implantable+cardiac+devices+from+patient+notes?preview=/87042613/87043024/201905%20SCT%20Expo%202019%20-%20Madani.pdf
- 49 Sparck Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc 1972; 28 (01) 11-21
- 50 Landauer TK, Dumais ST. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol Rev 1997; 104 (02) 211-240
- 51 Pedregosa F, Varoquaux G, Gramfort A. et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825-2830
- 52 Paszke A, Gross S, Massa F. et al. PyTorch: an imperative style, high-performance deep learning library. 2019; . Accessed September 15, 2024 at:
- 53 Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 2000: 10
- 54 Ross EL, Zuromski KL, Reis BY, Nock MK, Kessler RC, Smoller JW. Accuracy requirements for cost-effective suicide risk prediction among primary care patients in the US. JAMA Psychiatry 2021; 78 (06) 642-650
- 55 Spiegelhalter DJ. Probabilistic prediction in patient management and clinical trials. Stat Med 1986; 5 (05) 421-433
- 56 Scornet E. Trees, forests, and impurity-based variable importance. ; 2021. Accessed May 16, 2022 at: http://arxiv.org/abs/2001.04295
- 57 Boggs JM, Beck A, Hubley S. et al. General medical, mental health, and demographic risk factors associated with suicide by firearm compared with other means. Psychiatr Serv 2018; 69 (06) 677-684
- 58 Pennington J, Socher R, Manning C. Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics; 2014: 1532-1543
- 59 Sarsam SM, Al-Samarraie H, Alzahrani AI, Alnumay W, Smith AP. A lexicon-based approach to detecting suicide-related messages on Twitter. Biomed Signal Process Control 2021; 65: 102355
- 60 Gaur M, Aribandi V, Alambo A. et al. Characterization of time-variant and time-invariant assessment of suicidality on Reddit using C-SSRS. PLoS One 2021; 16 (05) e0250448
- 61 Cambria E, Li Y, Xing FZ, Poria S, Kwok K. SenticNet 6: ensemble application of symbolic and subsymbolic AI for sentiment analysis. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM; 2020: 105-114
- 62 Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321-357
- 63 Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res 2017; 18 (17) 1-5
Address for correspondence
Publication History
Received: 22 August 2023
Accepted: 05 September 2024
Accepted Manuscript online:
09 September 2024
Article published online:
18 December 2024
© 2024. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
References
- 1 Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics System, Mortality 2018–2021 on CDC WONDER Online Database; 2021
- 2 Zalsman G, Hawton K, Wasserman D. et al. Suicide prevention strategies revisited: 10-year systematic review. Lancet Psychiatry 2016; 3 (07) 646-659
- 3 Mann JJ, Apter A, Bertolote J. et al. Suicide prevention strategies: a systematic review. JAMA 2005; 294 (16) 2064-2074
- 4 Walsh CG, Johnson KB, Ripperger M. et al. Prospective validation of an electronic health record-based, real-time suicide risk model. JAMA Netw Open 2021; 4 (03) e211428
- 5 Walsh CG, Ribeiro JD, Franklin JC. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci 2017; 5 (03) 457-469
- 6 Bejan CA, Ripperger M, Wilimitis D. et al. Improving ascertainment of suicidal ideation and suicide attempt with natural language processing. Sci Rep 2022; 12 (01) 15146
- 7 Young J, Bishop S, Humphrey C, Pavlacic JM. A review of natural language processing in the identification of suicidal behavior. J Affect Disord Rep 2023; 12: 100507
- 8 Cohen J, Wright-Berryman J, Rohlfs L, Trocinski D, Daniel L, Klatt TW. Integration and validation of a natural language processing machine learning suicide risk prediction model based on open-ended interview language in the emergency department. Front Digit Health 2022; 4: 818705
- 9 Levis M, Leonard Westgate C, Gui J, Watts BV, Shiner B. Natural language processing of clinical mental health notes may add predictive value to existing suicide risk models. Psychol Med 2021; 51 (08) 1382-1391
- 10 Coppersmith G, Leary R, Crutchley P, Fine A. Natural language processing of social media as screening for suicide risk. Biomed Inform Insights 2018; 10: 1178222618792860
- 11 McCoy Jr TH, Castro VM, Roberson AM, Snapper LA, Perlis RH. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry 2016; 73 (10) 1064-1071
- 12 Pestian J, Nasrallah H, Matykiewicz P, Bennett A, Leenaars A. Suicide note classification using natural language processing: a content analysis. Biomed Inform Insights 2010; 2010 (03) 19-28
- 13 Tsui FR, Shi L, Ruiz V. et al. Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts. JAMIA Open 2021; 4 (01) ooab011
- 14 Meerwijk EL, Tamang SR, Finlay AK, Ilgen MA, Reeves RM, Harris AHS. Suicide theory-guided natural language processing of clinical progress notes to improve prediction of veteran suicide risk: protocol for a mixed-method study. BMJ Open 2022; 12 (08) e065088
- 15 Ji S. Towards intention understanding in suicidal risk assessment with natural language processing. In: Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics; 2022: 4028-4038. Accessed September 15, 2024 at: https://aclanthology.org/2022.findings-emnlp.297
- 16 Ji S, Yu CP, Fung SF, Pan S, Long G. Supervised learning for suicidal ideation detection in online user content. Complexity 2018; 2018: 1-10
- 17 Arowosegbe A, Oyelade T. Application of natural language processing (NLP) in detecting and preventing suicide ideation: a systematic review. Int J Environ Res Public Health 2023; 20 (02) 1514
- 18 Zhong QY, Mittal LP, Nathan MD. et al. Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem. Eur J Epidemiol 2019; 34 (02) 153-162
- 19 Zhang D, Yin C, Zeng J, Yuan X, Zhang P. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020; 20 (01) 280
- 20 Thompson K. Programming techniques: regular expression search algorithm. Commun ACM 1968; 11 (06) 419-422
- 21 Beam AL, Kompa B, Schmaltz A. et al. Clinical concept embeddings learned from massive sources of multimodal medical data. Pac Symp Biocomput 2020; 25: 295-306
- 22 Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv:1301.3781. Accessed February 13, 2022 at: http://arxiv.org/abs/1301.3781
- 23 Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res 2003; 3 (Jan): 993-1022
- 24 Dey L, Haque SKM. Opinion mining from noisy text data. In: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data - AND '08. ACM Press; 2008: 83-90
- 25 Turney PD. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. arXiv:cs/0212032. Accessed February 13, 2022 at: http://arxiv.org/abs/cs/0212032
- 26 Boggs JM, Quintana LM, Powers JD, Hochberg S, Beck A. Frequency of clinicians' assessments for access to lethal means in persons at risk for suicide. Arch Suicide Res 2022; 26 (01) 127-136
- 27 Yeskuatov E, Chua SL, Foo LK. Leveraging Reddit for suicidal ideation detection: a review of machine learning and natural language processing techniques. Int J Environ Res Public Health 2022; 19 (16) 10347
- 28 Krause KJ, Shelley J, Becker A, Walsh C. Exploring risk factors in suicidal ideation and attempt concept cooccurrence networks. AMIA Annu Symp Proc 2023; 2022: 644-652
- 29 Montesinos López OA, Montesinos López A, Crossa J. Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction. Springer International Publishing; 2022: 109-139
- 30 Zhao J, Henriksson A. Learning temporal weights of clinical events using variable importance. BMC Med Inform Decis Mak 2016; 16 (Suppl. 02) 71
- 31 Zhao J, Henriksson A, Kvist M, Asker L, Boström H. Handling temporality of clinical events for drug safety surveillance. AMIA Annu Symp Proc 2015; 2015: 1371-1380
- 32 Singh A, Nadkarni G, Gottesman O, Ellis SB, Bottinger EP, Guttag JV. Incorporating temporal EHR data in predictive models for risk stratification of renal function deterioration. J Biomed Inform 2015; 53: 220-228
- 33 Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J. Doctor AI: predicting clinical events via recurrent neural networks. JMLR Workshop Conf Proc 2016; 56: 301-318
- 34 Che Z, Purushotham S, Cho K, Sontag D, Liu Y. Recurrent neural networks for multivariate time series with missing values. Sci Rep 2018; 8 (01) 6085
- 35 Joiner TE. Why People Die by Suicide. Harvard University Press; 2005
- 36 Klonsky ED, May AM. The three-step theory (3ST): a new theory of suicide rooted in the “ideation-to-action” framework. Int J Cogn Ther 2015; 8 (02) 114-129
- 37 Klonsky ED, May AM, Saffer BY. Suicide, suicide attempts, and suicidal ideation. Annu Rev Clin Psychol 2016; 12 (01) 307-330
- 38 Klonsky ED, Saffer BY, Bryan CJ. Ideation-to-action theories of suicide: a conceptual and empirical update. Curr Opin Psychol 2018; 22: 38-43
- 39 Van Orden KA, Witte TK, Cukrowicz KC, Braithwaite SR, Selby EA, Joiner Jr TE. The interpersonal theory of suicide. Psychol Rev 2010; 117 (02) 575-600
- 40 Schafer KM, Kennedy G, Gallyer A, Resnik P. A direct comparison of theory-driven and machine learning prediction of suicide: a meta-analysis. PLoS One 2021; 16 (04) e0249833
- 41 Walker RL, Shortreed SM, Ziebell RA. et al. Evaluation of electronic health record-based suicide risk prediction models on contemporary data. Appl Clin Inform 2021; 12 (04) 778-787
- 42 Carter G, Milner A, McGill K, Pirkis J, Kapur N, Spittal MJ. Predicting suicidal behaviours using clinical instruments: systematic review and meta-analysis of positive predictive values for risk scales. Br J Psychiatry 2017; 210 (06) 387-395
- 43 Wilimitis D, Turer RW, Ripperger M. et al. Integration of face-to-face screening with real-time machine learning to predict risk of suicide among adults. JAMA Netw Open 2022; 5 (05) e2212095
- 44 McKernan LC, Lenert MC, Crofford LJ, Walsh CG. Outpatient engagement and predicted risk of suicide attempts in fibromyalgia. Arthritis Care Res (Hoboken) 2019; 71 (09) 1255-1263
- 45 Walsh CG, Ripperger MA, Novak L. et al. Randomized controlled comparative effectiveness trial of risk model-guided clinical decision support for suicide screening. medRxiv 2024
- 46 Shortreed SM, Walker RL, Johnson E. et al. Complex modeling with detailed temporal predictors does not improve health records-based suicide risk prediction. NPJ Digit Med 2023; 6 (01) 47
- 47 Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res 2004; 32 (Database issue): D267-D270
- 48 Madani S, Giuse D, McLemore M, Weitkamp A. Augmenting NLP Results by Leveraging SNOMED CT Relationships for Identification of Implantable Cardiac Devices from Patient Notes. Presented at: SNOMED CT Expo 2019; October 31, 2019; Kuala Lumpur, Malaysia. Accessed September 15, 2024 at: https://confluence.ihtsdotools.org/display/FT/201905+Augmenting+NLP+results+by+leveraging+SNOMED+CT+relationships+for+identification+of+implantable+cardiac+devices+from+patient+notes?preview=/87042613/87043024/201905%20SCT%20Expo%202019%20-%20Madani.pdf
- 49 Sparck Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc 1972; 28 (01) 11-21
- 50 Landauer TK, Dumais ST. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol Rev 1997; 104 (02) 211-240
- 51 Pedregosa F, Varoquaux G, Gramfort A. et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825-2830
- 52 Paszke A, Gross S, Massa F. et al. PyTorch: an imperative style, high-performance deep learning library. 2019. Accessed September 15, 2024 at:
- 53 Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 2000: 10
- 54 Ross EL, Zuromski KL, Reis BY, Nock MK, Kessler RC, Smoller JW. Accuracy requirements for cost-effective suicide risk prediction among primary care patients in the US. JAMA Psychiatry 2021; 78 (06) 642-650
- 55 Spiegelhalter DJ. Probabilistic prediction in patient management and clinical trials. Stat Med 1986; 5 (05) 421-433
- 56 Scornet E. Trees, forests, and impurity-based variable importance. 2021. Accessed May 16, 2022 at: http://arxiv.org/abs/2001.04295
- 57 Boggs JM, Beck A, Hubley S. et al. General medical, mental health, and demographic risk factors associated with suicide by firearm compared with other means. Psychiatr Serv 2018; 69 (06) 677-684
- 58 Pennington J, Socher R, Manning C. GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics; 2014: 1532-1543
- 59 Sarsam SM, Al-Samarraie H, Alzahrani AI, Alnumay W, Smith AP. A lexicon-based approach to detecting suicide-related messages on Twitter. Biomed Signal Process Control 2021; 65: 102355
- 60 Gaur M, Aribandi V, Alambo A. et al. Characterization of time-variant and time-invariant assessment of suicidality on Reddit using C-SSRS. PLoS One 2021; 16 (05) e0250448
- 61 Cambria E, Li Y, Xing FZ, Poria S, Kwok K. SenticNet 6: ensemble application of symbolic and subsymbolic AI for sentiment analysis. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM; 2020: 105-114
- 62 Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321-357
- 63 Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res 2017; 18 (17) 1-5