Developing an Analytical Pipeline to Classify Patient Safety Event Reports Using Optimized Predictive Algorithms

Asa Adadey; Robert Giannini; Lorraine B. Possanza

doi:10.1055/s-0041-1735620

Methods Inf Med 2021; 60(05/06): 147-161

Original Article

Affiliation (all authors): Partnership for Health IT Patient Safety, ECRI, Plymouth Meeting, Pennsylvania, United States

Abstract

Background Patient safety event reports provide valuable insight into systemic safety issues, but deriving insights from these reports requires computational tools that can efficiently parse large volumes of qualitative data. Natural language processing (NLP) combined with predictive learning provides an automated approach to evaluating these data and supporting the work of patient safety analysts.

Objectives The objective of this study was to use NLP and machine learning techniques to develop a generalizable, scalable, and reliable approach to classifying event reports in order to drive improvements in the safety and quality of patient care.

Methods Datasets for 14 different labels (themes) were vectorized using a bag-of-words, tf-idf, or document embeddings approach, and the resulting vectors were used to train a series of classification algorithms, tuned via a hyperparameter grid search, to derive an optimized model for each label. Reports were also analyzed for terms strongly associated with each theme using an adjusted F-score calculation.
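The Methods describe vectorizing reports and then selecting a classifier configuration by grid search. As a minimal, dependency-free sketch of that idea (not the authors' code), the example below builds bag-of-words and tf-idf vectors, trains a binary multinomial naive Bayes classifier for a single theme, and keeps the vectorizer/smoothing combination with the highest F₁ on a held-out split. The toy reports, the whitespace tokenizer, the smoothed idf formula, the alpha grid, and all function names are illustrative assumptions; document embeddings and the other algorithms (linear SVM, XGBoost) are omitted, and a production pipeline would more likely rely on an established machine learning library.

```python
import math
from collections import Counter

# Hypothetical toy data for one binary theme ("Fall" = 1, anything else = 0).
TRAIN = [
    ("patient found on floor after fall from bed", 1),
    ("patient slipped and fell in bathroom", 1),
    ("fall from chair no injury noted", 1),
    ("wrong dose of heparin administered", 0),
    ("medication given to wrong patient", 0),
    ("lab specimen mislabeled in clinic", 0),
]
VALID = [
    ("patient fell while walking to bathroom", 1),
    ("insulin dose omitted at breakfast", 0),
]

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop-word removal.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

def vectorize(text, scheme, idf):
    counts = Counter(tokenize(text))
    if scheme == "bow":
        return dict(counts)  # bag-of-words: raw term counts
    # tf-idf: term count weighted by idf; terms unseen in training are dropped.
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def train_nb(vectors, labels, alpha):
    # Multinomial naive Bayes with additive smoothing parameter alpha.
    vocab = {t for v in vectors for t in v}
    model = {}
    for c in (0, 1):
        totals = Counter()
        for v, y in zip(vectors, labels):
            if y == c:
                totals.update(v)
        denom = sum(totals.values()) + alpha * len(vocab)
        prior = math.log(labels.count(c) / len(labels))
        loglik = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
        model[c] = (prior, loglik)
    return model

def predict_nb(model, vector):
    def log_posterior(c):
        prior, loglik = model[c]
        # Terms outside the training vocabulary are ignored.
        return prior + sum(w * loglik[t] for t, w in vector.items() if t in loglik)
    return max((0, 1), key=log_posterior)

def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def grid_search(train, valid, schemes=("bow", "tfidf"), alphas=(0.1, 0.5, 1.0)):
    # Try every (vectorizer, alpha) pair and keep the one with the best validation F1.
    idf = fit_idf([x for x, _ in train])
    y_train = [y for _, y in train]
    y_valid = [y for _, y in valid]
    best = None
    for scheme in schemes:
        x_train = [vectorize(x, scheme, idf) for x, _ in train]
        x_valid = [vectorize(x, scheme, idf) for x, _ in valid]
        for alpha in alphas:
            model = train_nb(x_train, y_train, alpha)
            preds = [predict_nb(model, v) for v in x_valid]
            score = f1(y_valid, preds)
            if best is None or score > best[0]:
                best = (score, scheme, alpha)
    return best

best_f1, best_scheme, best_alpha = grid_search(TRAIN, VALID)
print(f"best F1 = {best_f1:.3f} ({best_scheme}, alpha = {best_alpha})")
# prints: best F1 = 1.000 (bow, alpha = 0.1)
```

With six training reports the score says nothing about real-world performance; the sketch only shows the structure of one binary model per theme, with the text representation and the hyperparameters chosen together by validation F₁.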

Results F₁ scores for the optimized models ranged from 0.951 (“Fall”) to 0.544 (“Environment”). The bag-of-words approach proved optimal for 12 of the 14 labels, and the naïve Bayes algorithm performed best for nine labels. A linear support vector machine was optimal for three labels and XGBoost for four of the 14 labels. Labels with more distinctly associated terms performed better than less distinct themes, as shown by a Pearson's correlation coefficient of 0.634.
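The term-association analysis and the reported correlation can be illustrated with a short sketch. The exact adjustment behind the study's adjusted F-score is not given here, so this example (an assumption, not the authors' formula) computes a plain, unadjusted F-score that combines a term's precision for a theme (the share of its occurrences that fall in that theme's reports) with its frequency within the theme, plus a Pearson correlation helper of the kind that could relate a per-label distinctiveness summary to per-label F₁ scores. The toy reports, labels, tokenizer, and function names are hypothetical.

```python
import math
from collections import Counter

def term_fscores(docs, labels, theme, beta=1.0):
    # Plain (unadjusted) F-score per term for one theme: the weighted harmonic mean
    # of precision = P(theme | term) and frequency = P(term | theme).
    in_theme, overall = Counter(), Counter()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        overall.update(tokens)
        if label == theme:
            in_theme.update(tokens)
    theme_total = sum(in_theme.values())
    b2 = beta ** 2
    scores = {}
    for term, n_theme in in_theme.items():
        precision = n_theme / overall[term]  # share of the term's uses inside the theme
        frequency = n_theme / theme_total    # share of the theme's tokens that are this term
        scores[term] = (1 + b2) * precision * frequency / (b2 * precision + frequency)
    return scores

def pearson(xs, ys):
    # Pearson's correlation coefficient between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy reports and theme labels.
docs = ["patient fell in hallway", "patient fell from bed", "wrong dose given to patient"]
labels = ["Fall", "Fall", "Medication"]
scores = term_fscores(docs, labels, "Fall")
print(sorted(scores, key=scores.get, reverse=True)[:3])  # prints: ['fell', 'patient', 'in']
```

Here "fell" outranks "patient" because every occurrence of "fell" is in a Fall report, whereas "patient" also appears in the medication report. To mirror the correlation analysis, one could summarize each label's distinctiveness (for example, as the mean of its top-k term scores, which is an assumption here) and pass those per-label summaries together with the per-label F₁ scores to `pearson`.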

Conclusions We demonstrated an analytical pipeline that broadly applies NLP and predictive modeling to categorize patient safety reports from multiple facilities. This pipeline allows analysts to identify and structure the information contained in patient safety data more rapidly, which can enhance the evaluation and use of this information over time.

Keywords

machine learning - natural language processing - patient safety - algorithms

Ethical Approval

No human and/or animal subjects were involved in this research.

Supplementary Material

Publication History

Received: 12 November 2020

Accepted: 05 August 2021

Article published online: 31 October 2021

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

