Summary
Objective: Patient safety event data repositories have the potential to dramatically improve
safety if analyzed and leveraged appropriately. These safety event reports often consist
of both structured data, such as general event type categories, and unstructured data,
such as free text descriptions of the event. Analyzing these data, particularly the
rich free text narratives, can be challenging, especially with tens of thousands of
reports. To reduce the resource-intensive manual review of the free text
descriptions, we demonstrate the effectiveness of an unsupervised natural language
processing approach.
Methods: An unsupervised natural language processing technique, called topic modeling, was
applied to a large repository of patient safety event data to identify topics, or
themes, from the free text descriptions of the data. Entropy measures were used to
evaluate and compare these topics to the general event type categories that were originally
assigned by the event reporter.
Results: Measures of entropy demonstrated that some topics generated from the unsupervised
modeling approach aligned with the clinical general event type categories that were
originally selected by the individual entering the report. Importantly, several new
latent topics emerged that were not originally identified. The new topics provide
additional insights into the patient safety event data that would otherwise not be
easily detected.
Conclusion: The topic modeling approach provides a method to identify topics or themes that may
not be immediately apparent and has the potential to allow for automatic reclassification
of events that are ambiguously classified by the event reporter.
Keywords
Patient safety event reports - topic model - latent Dirichlet allocation - general
event type - natural language processing - unsupervised learning
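
The comparison described in the Methods, between model-generated topics and reporter-assigned general event type categories, can be illustrated with a minimal sketch. It assumes that each report has already been assigned one dominant topic and that the entropy measure is the Shannon entropy of the event type distribution within each topic. The summary does not specify the exact measure used, and the function names and toy data below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def topic_category_entropy(topic_labels, category_labels):
    """Entropy of reporter-assigned categories within each topic.

    Assumption: each report has exactly one dominant topic and one
    general event type category.
    """
    by_topic = defaultdict(Counter)
    for topic, category in zip(topic_labels, category_labels):
        by_topic[topic][category] += 1
    return {
        topic: shannon_entropy(list(counter.values()))
        for topic, counter in by_topic.items()
    }


# Hypothetical toy data: dominant topic and reporter category for 8 reports.
topics = [0, 0, 0, 0, 1, 1, 1, 1]
categories = [
    "Medication", "Medication", "Medication", "Medication",
    "Fall", "Lab", "Medication", "Other",
]

result = topic_category_entropy(topics, categories)
```

Under these assumptions, a topic with entropy near zero (topic 0 here) aligns with a single general event type, while a topic with high entropy (topic 1 here) spans several categories and may point to a latent theme that the original categories do not capture.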