Summary
Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Big Data and Analytics in Healthcare”.
Objectives: This paper describes the scale-up efforts at the VA Salt Lake City Health Care System
to address processing large corpora of clinical notes through a natural language processing
(NLP) pipeline. The use case described is a current project focused on detecting the
presence of an indwelling urinary catheter in hospitalized patients and subsequent
catheter-associated urinary tract infections.
Methods: An NLP algorithm using v3NLP was developed to detect the presence of an indwelling
urinary catheter in hospitalized patients. The algorithm was tested on a small corpus
of notes on patients for whom the presence or absence of a catheter was already known
(reference standard). In planning for a scale-up, we estimated that the original algorithm
would have taken 2.4 days to run on a larger corpus of notes for this project (550,000
notes), and 27 days for a corpus of 6 million records representative of a national
sample of notes. We approached scaling up NLP pipelines through three techniques:
pipeline replication via multi-threading, intra-annotator threading for tasks that
can be further decomposed, and remote annotator services which enable annotator scale-out.
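
As an illustration of the first technique, pipeline replication via multi-threading, the following is a minimal sketch and not the v3NLP implementation. The `ToyPipeline` class, its keyword list, and the `run_replicated` helper are hypothetical names introduced here; the keyword match is a placeholder for a real annotator chain and ignores negation and context.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class ToyPipeline:
    """Hypothetical stand-in for one NLP pipeline instance (not the v3NLP API)."""

    # Assumed, illustrative trigger terms; a real pipeline would use annotators.
    TERMS = ("foley", "indwelling catheter", "urinary catheter")

    def process(self, note: str) -> bool:
        """Return True if the note mentions a catheter term."""
        text = note.lower()
        return any(term in text for term in self.TERMS)


# Each worker thread lazily builds and keeps its own pipeline replica,
# so replicas never share mutable annotator state.
_local = threading.local()


def _pipeline() -> ToyPipeline:
    if not hasattr(_local, "pipeline"):
        _local.pipeline = ToyPipeline()
    return _local.pipeline


def process_note(note: str) -> bool:
    return _pipeline().process(note)


def run_replicated(notes, replicas=4):
    """Fan notes out across `replicas` threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        return list(pool.map(process_note, notes))


notes = [
    "Foley catheter placed on admission.",
    "Patient ambulating without assistance.",
    "Indwelling catheter removed on day 3.",
]
results = run_replicated(notes)
print(results)  # [True, False, True]
```

The sketch shows only the structure of replication: independent pipeline instances drawing from a shared stream of notes. In CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so a Python port of this idea would likely use processes instead; on the JVM, where UIMA-based pipelines run, threads can execute in parallel.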
Results: The scale-up reduced the average time to process a record from 206 milliseconds
to 17 milliseconds, a 12-fold increase in performance, when applied to a corpus of
550,000 notes.
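
The 12-fold figure follows directly from the two per-record averages. The short calculation below reproduces it and, as an assumption not stated in the source, converts the averages into total wall-clock time for a single sequential pass over 550,000 notes. These derived totals are illustrative only and are not the same quantity as the planning estimates given in the Methods.

```python
BASELINE_MS = 206   # average time per record before scale-up (from the Results)
SCALED_MS = 17      # average time per record after scale-up (from the Results)
NOTES = 550_000     # corpus size (from the Results)


def total_hours(ms_per_record: float, n_records: int) -> float:
    """Total time in hours, assuming every record costs the average time."""
    return ms_per_record * n_records / 1000 / 3600


speedup = BASELINE_MS / SCALED_MS
baseline_hours = total_hours(BASELINE_MS, NOTES)
scaled_hours = total_hours(SCALED_MS, NOTES)

print(f"speedup: {speedup:.1f}x")              # speedup: 12.1x
print(f"baseline: {baseline_hours:.1f} hours")  # baseline: 31.5 hours
print(f"scaled: {scaled_hours:.1f} hours")      # scaled: 2.6 hours
```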
Conclusions: Purposely simple in nature, these scale-up efforts are the straightforward evolution
from small-scale NLP processing to larger-scale extraction, without incurring the
complexities inherited through use of the underlying UIMA framework. These
efforts represent generalizable and widely applicable techniques that will aid other
computationally complex NLP pipelines that need to be scaled out for processing
and analyzing big data.
Keywords
Natural language processing - big data - scale-up