Methods Inf Med 2021; 60(05/06): 147-161
DOI: 10.1055/s-0041-1735620
Original Article

Developing an Analytical Pipeline to Classify Patient Safety Event Reports Using Optimized Predictive Algorithms

Asa Adadey (1), Robert Giannini (1), Lorraine B. Possanza (1)
(1) Partnership for Health IT Patient Safety, ECRI, Plymouth Meeting, Pennsylvania, United States

Abstract

Background Patient safety event reports provide valuable insight into systemic safety issues, but deriving insights from these reports requires computational tools that can efficiently parse large volumes of qualitative data. Natural language processing (NLP) combined with predictive learning provides an automated approach to evaluating these data and supporting the work of patient safety analysts.

Objectives The objective of this study was to use NLP and machine learning techniques to develop a generalizable, scalable, and reliable approach to classifying event reports for the purpose of driving improvements in the safety and quality of patient care.

Methods Datasets for 14 different labels (themes) were vectorized using a bag-of-words, tf-idf, or document embeddings approach. The resulting vectors were passed to a series of classification algorithms, and a hyperparameter grid search was used to derive an optimized model for each label. Reports were also analyzed for terms strongly associated with each theme using an adjusted F-score calculation.
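The vectorize-then-search step described in the Methods can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: it uses a hand-written bag-of-words vectorizer and a binary multinomial naive Bayes classifier, and it searches a single hyperparameter (the smoothing value `alpha`) by F1 score on a held-out set. The function names, the toy reports, and the grid values are all assumptions made for this example; the study itself also compared tf-idf, document embeddings, and other algorithms.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def bag_of_words(docs):
    """Return one term-count Counter per document."""
    return [Counter(tokenize(d)) for d in docs]

def train_nb(vectors, labels, alpha):
    """Fit a binary multinomial naive Bayes model with additive smoothing alpha > 0."""
    vocab = {term for vec in vectors for term in vec}
    model = {"vocab": vocab, "alpha": alpha, "prior": {}, "counts": {}, "total": {}}
    for c in (0, 1):
        class_vecs = [v for v, y in zip(vectors, labels) if y == c]
        model["prior"][c] = math.log(len(class_vecs) / len(vectors))
        counts = Counter()
        for v in class_vecs:
            counts.update(v)
        model["counts"][c] = counts
        model["total"][c] = sum(counts.values())
    return model

def predict_nb(model, vector):
    """Return the class (0 or 1) with the higher log-probability score."""
    vocab_size = len(model["vocab"])
    scores = {}
    for c in (0, 1):
        score = model["prior"][c]
        denom = model["total"][c] + model["alpha"] * vocab_size
        for term, n in vector.items():
            if term in model["vocab"]:  # terms unseen in training are ignored
                score += n * math.log((model["counts"][c][term] + model["alpha"]) / denom)
        scores[c] = score
    return max(scores, key=scores.get)

def f1_score(y_true, y_pred):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def grid_search(train_docs, train_y, val_docs, val_y, alphas):
    """Try each alpha and return (best_alpha, best_f1) on the validation set."""
    train_vecs = bag_of_words(train_docs)
    val_vecs = bag_of_words(val_docs)
    best = None
    for alpha in alphas:
        model = train_nb(train_vecs, train_y, alpha)
        preds = [predict_nb(model, v) for v in val_vecs]
        score = f1_score(val_y, preds)
        if best is None or score > best[1]:
            best = (alpha, score)
    return best

# Invented toy reports; label 1 stands in for a theme such as "Fall".
train_docs = [
    "patient fell from bed",
    "patient slipped and fell in hallway",
    "wrong medication dose given",
    "medication order delayed by pharmacy",
]
train_y = [1, 1, 0, 0]
val_docs = ["patient fell near bed", "pharmacy medication dose error"]
val_y = [1, 0]

alphas = [0.1, 0.5, 1.0]
best_alpha, best_f1 = grid_search(train_docs, train_y, val_docs, val_y, alphas)
print(best_alpha, best_f1)
```

In a fuller pipeline the grid would also range over the vectorization method and the classifier family, with one search run per label, and cross-validation would normally replace the single held-out split used here.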

Results The F1 score of the optimized models ranged from 0.951 (“Fall”) to 0.544 (“Environment”). The bag-of-words approach proved optimal for 12 of 14 labels, and the naïve Bayes algorithm performed best for nine labels. A linear support vector machine was optimal for three labels and XGBoost for four of the 14 labels. Labels with more distinctly associated terms performed better than labels with less distinct terms, as shown by a Pearson correlation coefficient of 0.634.
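For readers unfamiliar with the statistic, the Pearson correlation reported above measures how closely two paired series follow a straight-line relationship. The sketch below shows the standard calculation. The function name `pearson_r` and the input values are invented for illustration and are not the study's per-label data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: a term-distinctness score and an F1 score per label.
distinctness = [0.20, 0.35, 0.50, 0.65, 0.90]
f1_by_label = [0.60, 0.58, 0.75, 0.80, 0.93]

r = pearson_r(distinctness, f1_by_label)
print(round(r, 3))
```

A value near 1 indicates that labels with more distinct vocabulary tend to have higher F1 scores; a value near 0 indicates no linear relationship.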

Conclusions We demonstrated an analytical pipeline that broadly applies NLP and predictive modeling to categorize patient safety reports from multiple facilities. This pipeline allows analysts to identify and structure the information contained in patient safety data more rapidly, which can enhance the evaluation and use of this information over time.

Ethical Approval

No human and/or animal subjects were involved in this research.


Publication History

Received: 12 November 2020

Accepted: 05 August 2021

Publication Date: 31 October 2021 (online)

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany