Keywords
Telephone - workflow - communication - organizational efficiency - VoIP
1. Background
Many researchers examine clinical workflows using one of two approaches [[1]]. One is direct visualization of clinical activities with human observers taking
notes as they watch clinicians go about their tasks [[2]–[9]]. The other is analysis of use patterns of pre-existing enterprise software such
as computerized physician order entry or nursing medication administration [[10]–[15]]. Both approaches have limitations. Manual observation is expensive, limiting data
collection to narrow pre-selected workflows. Analysis of a facility’s electronic medical
record environment necessarily reflects workflows from specific software and hardware
configurations rather than elective workflow choices made by clinicians.
Likewise, there are limited ways to identify clinical workflows at the scale of a
healthcare system with thousands of employees [[16]–[19]]. Answering the managerial question of how frontline nursing and patient care staff
spend their time is not easy [[20]–[25]]. This challenge becomes evident when hospitals embark on “Six Sigma” or “Lean”
re-engineering efforts and struggle to discover their current processes by asking
their employees to manually identify workflow inefficiencies [[26], [27]]. Instrumented workflow discovery tools such as RFID tagging and saccadic eye-movement
tracking have had limited success here [[28]–[30]].
2. Objectives
One promising workflow discovery assistance tool is computational analysis of electronic
log files recorded as a byproduct of using software for other tasks. Log files have
been used to analyze screen navigation, outlier events in laparoscopic surgery and
as targets for machine learning approaches to categorize clinical pathways [[12]–[14], [31]–[33]]. This paper focuses on the analysis of log files generated by making telephone
calls on a medical center’s communications server network.
Communication is intimately involved with clinical workflows [[34], [35]]. The adoption of “bring your own device” policies (allowing use of personal cell
phones and tablets) in many healthcare workplaces has increased awareness of communications
[[36]–[41]]. Identification of lapses in communication during “handoffs” has also generated
widespread interest [[42]–[49]]. Telephone calls are interesting to measure because they represent clinical tasks
in and of themselves and because these communications may serve as surrogate markers
for clinical activities which have no current electronic signature. Telephone calls
may also reflect rework when phone calls are placed to address failed processes. Of
note, if there are frequent very short phone calls, the limited information transfers
within these short calls may allow identification of simple tasks which offer opportunities
for automation or process redesign, for example by replacing some calls with a workflow
engine or other communications technology [[50]–[52]].
The transition from public switched telephone circuits to voice over IP (VoIP) servers
which route internal enterprise calls over the enterprises’ IP (Internet protocol)
networks offers new ways to analyze clinical communications at an enterprise scale
[[53], [54]]. VoIP servers direct voice calls internal to the enterprise, relying on the public switched
telephone network only when calls leave enterprise sites. Multiple vendors offer enterprise
VoIP systems, often as part of a unified communications system [[55]]. While actual conversations are not routinely recorded for reasons of privacy and
storage costs, each call generates a call detail record (CDR) in a log file [[56], [57]].
This paper describes a framework for mapping telephone calls to clinical areas as
identified by cost-centers using CDR logs and then graphically displaying the high-volume
call patterns. The goal is to build a tool which can help clinical re-engineering
teams identify additional workflow inefficiencies beyond those recalled from memory
by showing the repetitive telephone calls clinicians use in their work. This software
can also be used for future work to measure the impact of resulting workflow redesigns.
3. Methods
3.1 Data Requirements
The two main types of data needed to perform IP telephony-based analysis of clinical
workflows are a set of call detail records (CDR) and information to map phone numbers
in those CDRs to one or more identifiers of the specific clinical location or activity.
CDR log files are generated by the VoIP telephony or unified communications server.
Each record’s data fields include the phone numbers and IP addresses involved in the
calls, start and stop time stamps, call type classification such as a conference call
or a bridged call, and potential related information such as video conference data
[[56]].
Mapping phone numbers to the clinical environment requires local information. Directory
data should allow the phone numbers to be mapped to
clinical areas and tasks. The actual workflow analysis is performed on those phone
numbers which represent high call volume clinical areas with individual user data
computationally discarded. Phone numbers could, for example, represent nursing units,
labs, radiology, operating rooms or the emergency department. Enterprise phone numbers
may accumulate over the course of decades and hence span multiple technologies. Larger
institutions can reserve banks of phone numbers and map them to internal exchanges,
allowing shorter numbers to be dialed (e.g. xxx-xx5–1234 is mapped to 5–1234). Additional
identification options to map clinical numbers include use of cost center codes and
use of Emergency Location Identification Number data which allows emergency responders
to pinpoint 911 over VoIP call locations within buildings. Lightweight Directory Access
Protocol (LDAP) files may help in providing site mapping of phone numbers.
3.2 Log file analysis
A convenience sample of one million consecutive call detail records from one academic
medical center’s Cisco Unified Communications Manager™ version 7.x instance was obtained
as a log file (424 KB in size) [[56], [57]]. The medical center’s information technology group, which administers enterprise
telephony, also provided a manually-maintained Excel™ spreadsheet mapping of phone
numbers which contained active numbers and a mapping to cost center codes. A separate
Excel™ spreadsheet crosswalk of cost center numbers (1004 in total) to the cost center
names was also provided. Use of pagers, personal cell phones and text messaging was
not measured. All data was transmitted and maintained using encrypted storage and
public key cryptography. The research protocol was reviewed by the Ohio State University
IRB who determined that the research did not involve human subjects and did not fall
under requirements for IRB review or exemption.
Software analyzing the call detail records and the telephone number mappings was programmed
using Perl version 5.16 [[58]]. Modules to support in-memory debugging, persistence and calculations of memory
structure size were obtained from Perl’s CPAN.org (Data::Dumper, Storable and Devel::Size).
Perl hashes were used to represent the major data structures as unique key:value
pairs. Programming was performed in Windows 7 and CentOS Linux environments on a laptop
personal computer with an Intel Core i7 2.90 GHz CPU and 16 gigabytes of RAM.
Each phone call generated a 94 field comma separated value CDR. For workflow analysis,
each individual phone call’s data needed to be aggregated and indexed to the phone
number allowing collection of cumulative data about all calls between each realized
pair of internal phone numbers. These aggregated data structures also needed to support
the generation of a directed graph where phone numbers are nodes or vertices and phone
calls are arcs or edges.
To minimize memory use reading the log files, each CDR was read one at a time and
parsed into a hash structure temporarily holding the data fields. Selected fields
such as the number calling, the number called, and the duration of the call were then
saved into three core representations of the phone call traffic. These three hashes
had keys, respectively, of the number calling, the number called and a string combining
the number calling and the number called. These hashes stored the total duration of
calls, the number of successful and unsuccessful calls as defined by the metadata
definitions from the Cisco Unified Communications Manager technical manuals, information
about the calling device and, as described below, data about the clinical environment
[[56]]. The calling number hash also contained a nested hash with subkeys for each called
number and that subset’s call characteristics while correspondingly the called number
hash contained a nested hash with subkeys for each calling number. Having nested hashes
provides significant flexibility in structuring further analysis. Specifically, having
symmetrically nested hashes for both calling and called phone numbers allows
retracing all call patterns in either direction. Perl hashes store key:value pairs
in which each key is unique and maps to a single value.
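The three aggregate structures described above can be sketched as follows. This is a minimal Python illustration, not the author's Perl code: the three-column sample, the column names, and the "caller->callee" key format are assumptions made for the example, and the success rule follows the Table 1 footnote (duration over 1 second).

```python
import csv
import io

# Hypothetical three-column sample; real CDRs carry 94 fields and the
# column names used here are assumptions for illustration.
SAMPLE = """callingPartyNumber,finalCalledPartyNumber,duration
51234,56789,30
51234,56789,0
56789,51234,120
"""


def new_stats():
    return {"calls": 0, "successful": 0, "unsuccessful": 0, "duration": 0}


def update(stats, duration):
    """Add one call to a running total (success = duration over 1 second)."""
    stats["calls"] += 1
    stats["successful" if duration > 1 else "unsuccessful"] += 1
    stats["duration"] += duration


def aggregate(lines):
    """Read CDRs one at a time and fill the three dictionaries."""
    by_caller, by_callee, by_pair = {}, {}, {}
    for row in csv.DictReader(lines):
        caller = row["callingPartyNumber"]
        callee = row["finalCalledPartyNumber"]
        duration = int(row["duration"])
        out = by_caller.setdefault(caller, {"total": new_stats(), "called": {}})
        inc = by_callee.setdefault(callee, {"total": new_stats(), "callers": {}})
        targets = (
            out["total"],
            out["called"].setdefault(callee, new_stats()),   # nested by called number
            inc["total"],
            inc["callers"].setdefault(caller, new_stats()),  # nested by calling number
            by_pair.setdefault(f"{caller}->{callee}", new_stats()),
        )
        for stats in targets:
            update(stats, duration)
    return by_caller, by_callee, by_pair


by_caller, by_callee, by_pair = aggregate(io.StringIO(SAMPLE))
```

Because the nesting is symmetric, the same pair totals can be reached starting from either the calling or the called number.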
Functions to sort the calls by call counts were built to allow selection of high-volume
call patterns. Additional functions were built to capture aggregate statistics, time
stamps, and the volume of short duration phone calls. Major data structures were optimized
using Perl references which are pointers to locations in memory. Processing one million
calls with debugging code enabled required 6 minutes.
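As an illustration of the sorting and short-call functions, the following Python sketch selects the highest-volume call pairs and the pairs whose average call is under a threshold. The sample numbers, the tuple layout (call count, total seconds), and the function names are hypothetical; only the 60-second threshold comes from the paper.

```python
import heapq

# Hypothetical per-pair totals: key "caller->callee",
# value (call count, total duration in seconds).
pairs = {
    "51234->56789": (40, 800),
    "56789->51234": (10, 1500),
    "50001->50002": (25, 2000),
    "50003->50004": (5, 100),
}


def top_pairs(pairs, n):
    """Return the n call pairs with the highest call counts."""
    return heapq.nlargest(n, pairs.items(), key=lambda item: item[1][0])


def short_pairs(pairs, threshold=60):
    """Return average duration for pairs averaging under `threshold` seconds."""
    return {
        key: seconds / count
        for key, (count, seconds) in pairs.items()
        if count and seconds / count < threshold
    }


top = top_pairs(pairs, 2)
short = short_pairs(pairs)
```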
3.3 Building a graph representation
Intuitively, enterprise representations of phone traffic can be represented as network
graphs. With the growth of Facebook™ and LinkedIn™, social network analysis of graphical
data structures is increasing [[59]–[67]]. Network graphs can be transformed into matrices allowing for application of a
wide variety of computational descriptions of the information flows implicit in the
graph [[68]–[70]].
Graphs can be represented as matrices using an adjacency matrix. An adjacency matrix
representation of a graph of phone call patterns is a matrix A where calls from phone number i to phone number j can be summed as individual matrix element a_ij. The nested hashes described above can be seen as sparse adjacency matrices where
only positive matrix elements are represented. Of note, since the direction of the
call, i.e. which number is calling and which number is called, is important in this
analysis, the graph representation of the data is directed. Correspondingly, the phone
call adjacency matrix is not symmetrical as it would be with an undirected graph.
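The relationship between a nested hash and a directed adjacency matrix can be sketched in Python as below. The extensions and counts are made up for illustration; the dense matrix is built only to make the asymmetry visible, whereas the nested structure itself is the sparse form.

```python
# Hypothetical nested structure: caller -> callee -> call count.
calls = {
    "51234": {"56789": 3, "50001": 1},
    "56789": {"51234": 2},
}

# Every number that appears as caller or callee becomes a node.
nodes = sorted(set(calls) | {callee for row in calls.values() for callee in row})
index = {number: k for k, number in enumerate(nodes)}

# Dense adjacency matrix A, where A[i][j] counts calls from node i to node j.
A = [[0] * len(nodes) for _ in nodes]
for caller, row in calls.items():
    for callee, count in row.items():
        A[index[caller]][index[callee]] = count

# A directed graph generally gives A != transpose(A).
transpose = [list(column) for column in zip(*A)]
is_symmetric = A == transpose
```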
In this visualization of the graph representations of the data, selected phone numbers
are represented as nodes while cumulative phone traffic in one direction is represented
as an arc (or arrow) pointing in one direction. For many pairs of phone numbers (nodes),
there were high volumes of phone calls in both directions so the graph representation
included arcs for both call directions. Building a unified graph representation consolidating
information from both the calling and called phone number data structures requires
making choices about how to combine each data structure’s call duration information.
3.4 Mapping clinical workflows
Mapping of enterprise phone numbers to clinical areas and by implication clinical
activities occurring in those areas depends on how an institution documents its organizational
structure. For this analysis phone numbers were identified by cost center. The cost
center mapping was programmed by cross-matching and merging a cost center file holding
the names and descriptions of clinical cost centers with an enterprise list of phone
numbers with their cost center identification into a hash where the phone numbers
served as keys. On reading each CDR, number lookup from this “phone directory” hash
was performed and the phone number’s cost center data incorporated into the calling,
called and call-pair number hashes.
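A minimal Python sketch of the "phone directory" merge follows. The cost center codes, names, and extensions are invented for the example, and the "UNKNOWN" placeholder for unmatched codes is an assumption rather than something the paper specifies.

```python
# Hypothetical crosswalk of cost center code -> cost center name.
cost_center_names = {
    "1001": "Emergency Department",
    "1002": "Inpatient Nursing Unit",
}

# Hypothetical enterprise list of phone number -> cost center code.
phone_to_cost_center = {"51234": "1001", "56789": "1002", "50000": "9999"}

# Merge both sources into one dictionary keyed by phone number.
directory = {
    phone: {"cost_center": code, "name": cost_center_names.get(code, "UNKNOWN")}
    for phone, code in phone_to_cost_center.items()
}


def lookup(phone):
    """Return cost center data for a phone number, or None if unlisted."""
    return directory.get(phone)
```

Each CDR's calling and called numbers could then be passed through `lookup` before being added to the aggregate structures.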
Importantly, the granularity of the cost center data used allowed association of the
phone numbers with individual nursing units, operating room areas, and numerous other
specific departments and areas of the medical center. To focus on clinical workflows,
only data from internal 5-digit extensions was carried forward, eliminating outside
calls and internal server re-routed calls. Filtering and data cleansing of the historical
phone number assignment file were programmed using regular expressions.
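The 5-digit extension filter can be illustrated with a regular expression. This Python sketch assumes that an internal call is one where both numbers consist of exactly five ASCII digits; the sample numbers and function names are hypothetical, and the actual cleansing rules used in the paper are not described in detail.

```python
import re

# Exactly five ASCII digits, nothing else.
EXTENSION = re.compile(r"[0-9]{5}")


def is_internal(number):
    """True if the number is a bare 5-digit extension."""
    return EXTENSION.fullmatch(number.strip()) is not None


def internal_calls(calls):
    """Keep only calls where both parties are internal extensions."""
    return [(a, b) for a, b in calls if is_internal(a) and is_internal(b)]


calls = [
    ("51234", "56789"),       # internal to internal: kept
    ("51234", "6145550100"),  # outside number: dropped
    ("b0012345", "56789"),    # server-style identifier: dropped
]
kept = internal_calls(calls)
```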
3.5 Graphical display
For this academic medical center there were over 100,000 phone calls a day and approximately
30,000 unique phone numbers necessitating a strategy of examining only high volume
workflows. Gephi© was selected as the open-source visualization tool for the graph representation [[71]]. Traversing the hash structures described above, each phone number and its associated
clinical cost center was mapped into a node for Gephi while the call traffic between
the phone numbers and corresponding count data was mapped as a directed edge. Gephi
allows customizations to the display of nodes and edges to incorporate color mappings
and various weighting functions. Edge and node files written to Gephi were limited
to the 500 most frequent edges (sorted call pairs) for ease of visual inspection.
Filtering functions to select calls by cost center or call group were built.
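One way to hand the most frequent call pairs to Gephi is through its spreadsheet (CSV) import, which recognizes Id and Label columns for nodes and Source, Target, Type and Weight columns for edges. The Python sketch below is an assumption about how such files could be written, not the author's export code; the sample data and the `write_gephi` name are hypothetical.

```python
import csv
import io


def write_gephi(pair_counts, labels, node_file, edge_file, top_n=500):
    """Write node and edge tables for the top_n most frequent call pairs."""
    top = sorted(pair_counts.items(), key=lambda item: item[1], reverse=True)[:top_n]

    edges = csv.writer(edge_file, lineterminator="\n")
    edges.writerow(["Source", "Target", "Type", "Weight"])
    seen = {}  # insertion-ordered: only numbers on a selected edge become nodes
    for (caller, callee), count in top:
        edges.writerow([caller, callee, "Directed", count])
        for number in (caller, callee):
            seen.setdefault(number, labels.get(number, ""))

    nodes = csv.writer(node_file, lineterminator="\n")
    nodes.writerow(["Id", "Label"])
    for number, label in seen.items():
        nodes.writerow([number, label])


# Hypothetical call-pair counts and cost center labels.
pair_counts = {("51234", "56789"): 40, ("56789", "51234"): 10, ("50001", "50002"): 25}
labels = {
    "51234": "Emergency Department",
    "56789": "Nursing Unit",
    "50001": "Laboratory",
    "50002": "Radiology",
}
node_out, edge_out = io.StringIO(), io.StringIO()
write_gephi(pair_counts, labels, node_out, edge_out, top_n=2)
```

In practice the two file objects would be opened on disk (for example with `open("nodes.csv", "w", newline="")`) and imported into Gephi as a nodes table and an edges table.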
Because the goal of this work was to display nodes that are meaningfully labeled and
edges which can be directly inspected, indirect graph analysis using tools such as
clustering, graph partitioning, community detection or spectral analysis was not performed.
4. Results
Summary statistics for all calls are listed in Table 1. Two findings are of note. First, the large volume of phone calls in this single
academic medical center and its affiliated clinics (with approximately 850 inpatient
beds at the time of this analysis) reflects a measurable labor investment in telephony.
Filtering results of the log file to select call traffic in the first 168 hours to
correspond to exactly one calendar week yielded 23,186 hours of call duration as measured
by server time for that week. Dividing that week’s total of 912,386 calls by seven days yielded
an average call volume per day of 130,341 calls. Second, as noted in Table 1, roughly half of the call pairs were short calls averaging 27 seconds in length.
Table 1
Summary VoIP Call Statistics
Total VoIP calls | 1,000,000
Successful calls[*] | 898,316
Average duration of successful calls | 101.54 seconds
Unique call pairs[**] | 533,311
‘Short’ call pairs[**] with average call under 60 seconds | 267,158
Average duration of ‘short’ call pairs | 26.93 seconds
Total duration of call time for 1,000,000 calls | 25,337 hours
Total duration of clock time to obtain the 1,000,000 calls | 181 hours
Total call count in one week (7 calendar days, 168 consecutive hours) | 912,386
Total duration of call time in one week (168 consecutive hours) | 23,186 hours
* Successful calls are defined in CDR metadata as calls with duration over 1 second
** Call pair is a unique pair of number x calling number y
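The derived figures can be cross-checked from the Table 1 values with a few lines of arithmetic. The Python sketch below uses only numbers reported in the table; treating the total call time as the product of successful calls and their average duration is an assumption (it presumes unsuccessful calls contribute negligible duration).

```python
# Values taken from Table 1.
weekly_calls = 912_386
successful_calls = 898_316
avg_successful_seconds = 101.54
unique_pairs = 533_311
short_pairs = 267_158

# Average daily volume over the 7-day window (about 130,341).
calls_per_day = weekly_calls / 7

# Approximate total call time in hours (about 25,337.5),
# assuming unsuccessful calls add negligible duration.
total_hours = successful_calls * avg_successful_seconds / 3600

# Share of call pairs classed as 'short' (about 0.50).
short_fraction = short_pairs / unique_pairs
```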
This framework capturing CDR data allows network displays aggregated by sets of cost
centers and numbers. Figure 1 displays the overall high frequency call traffic for the top 500 call pairs. There
is one large internal node for voice mail as well as a number of other aggregations,
most of which represent inpatient nursing units. Figure 2 displays a drill-down into one representative inpatient nursing unit showing a high
volume of traffic within the unit between the shared phones of the unit clerks, nurses,
and patient care associates.
Fig. 1 Gephi© graphical network display of the top 500 enterprise call pairs out of the
one million calls, using the Fruchterman-Reingold force-directed layout algorithm.
The left column and margins of the Gephi screen show tools to select the layout algorithm
as well as to label the nodes (phone numbers) and edges (calls) with information such
as cost center name, call volume or duration. The large node in the right upper quadrant
represents voice mail and many of the smaller sub-networks correspond to large inpatient
nursing units. Here label display is turned off to show the overall structure of high-frequency
telephone traffic.
Fig. 2 Close-up of Gephi© screen capture of sorted high frequency calling patterns involving one representative
inpatient nursing unit cost center. Phone number nodes are labeled with their cost
center label and overall contribution to call volume while edges (calls between specific
phone numbers) are labeled with directional call counts. Other call detail record
data could be selected for display when writing the node and edge data files for import
into Gephi. This figure represents the type of display which would be generated and
shown to start manual discovery of clinical workflows in Lean or Six Sigma programs.
5. Discussion
5.1 Discussion
This paper describes design of a software stack to use internal VoIP telephone log
files to generate a graphical representation of enterprise medical center telephone
traffic labeled with clinical cost center information. The cost center label serves
as the proxy for who is making the call and the types of clinical activities which
may have been discussed during the telephone call. The actual topic of the conversation
is obviously not known since conversations were neither recorded nor analyzed.
The framework is designed to generate visual displays for exploratory data analysis
and to facilitate the manual recall of relevant common clinical workflows by medical
center staff engaged in workflow redesign. The author hypothesizes that graph display
can provide information not otherwise available to teams identifying clinical workflows
for redesign, for example as part of a “Six Sigma” or “Lean” process. While this framework
provides potentially new information to assist in analyzing clinical activities, the
mapping into actual clinical activities is entirely dependent on employee pattern
recognition and insight [[72]]. Given the re-engineering intent of the framework, only internal VoIP calls were
displayed graphically.
Summary statistics of the telephone traffic can help frame the clinical informatics
questions and opportunities. The amount of telephony work by employees is likely significantly
underestimated when measuring only the 23,186 hours of time per week spent on the
server. To the extent that server time reflects an actual two-way conversation
rather than time spent on hold or in voice mail, employee call time totals should
be doubled, since both parties are occupied. If one assumes that most time on the phone is spent
talking to another employee rather than leaving voicemail, these data would suggest
that up to approximately twice this time, or ∼40,000 employee hours per week, were spent
on the phone. However, the author was not able to parse time spent in actual conversation
from the log files. Importantly, we have no measures of additional time spent searching
for phone numbers or the time costs of related workflow interruptions for either party
over the average of the 130,341 calls per day [[48], [73]].
Short telephone calls (defined here as phone number pairs whose calls average under 60
seconds; 27 seconds on average in this sample) may represent a unique opportunity to redesign
workflows since the limited call span suggests that only one substantive fact is being
exchanged once one subtracts time required within the call for “overhead” such as
greetings, caller identification, and maybe patient identification. The author’s clinical
experience suggests many of these short calls are status requests or updates. The
class of short calls with limited data payloads may be particularly amenable to separate
automation or incorporation into EMR workflows.
The author and clinical colleagues are conducting further research to use the framework
operationally. Early analysis of the data suggests many of the nursing unit calls
are related to patient requests for food or bathroom assistance when handled by patient
care associates and for pain relief and issues with intravenous lines when handled
by nurses. While this work looks at VoIP logs, EMR generated log files involving communication
or coordination tasks such as orders or nursing activities may also be represented
as hashed data structures and mapped into graphs. Other communication formats such
as RTLS (real time location services) and nurse call systems may be similarly mapped
into graphs.
5.2 Limitations
This is a single academic medical center study representing one VoIP server and telephone
network installation. This medical center provided mobile VoIP phones to many employees
so more data could be obtained from VoIP logs than in environments more reliant on
pagers or personal cell phones. Communications using pagers, email or cell phones
were not captured [[74]–[76]]. Building and analyzing the framework’s data structures requires correct interpretation
of the meaning of what is written out in each of the CDR data fields by the telecommunications
server [[56]]. Only one million sequential calls were analyzed. Analysis of larger call volumes
or analysis of call traffic by time of day may reveal different patterns of calling.
This project uses a pre-existing independent mapping of cost-center data to identify
clinical units at a high level of precision. Use of this approach at other institutions
would require some similar type of site-specific mapping of phone numbers to support
inference of clinical activities. Such mappings might be programmed using geo-spatial
localization data, LDAP files, other directories, mandated emergency call telephone
location files and/or manual annotation as well as the cost-center information used
here. The granularity or number of clinical locations and implicitly potential clinical
activities can impact workflow insights.
The network graphs become too cluttered if low volume calls are represented. Significant
workflow insights may be lost when not examining this “long tail” of low volume information
[[77]].
6. Conclusions
Clinical institutions using VoIP communications can parse log files generated by internal
telephone calls to identify high-frequency telephone call patterns. Because the numerals
of the phone number alone do not typically carry information, this analysis requires
identification of a data source or process to allow electronic mapping of the telephone
number to a specific clinical area or activity within the enterprise. Assignment of
a phone number to a specific clinical area then facilitates manual inference about
common clinical activities occurring in that area and possible explanations for high
frequency telephone calls placed or received at that number.
By aggregating this individual call information into enterprise-wide data structures
indexed by calling number, called number and calling-called number pairs, cumulative
statistics can be generated. These hash-indexed data structures can also be used to
compute directed graphs where nodes or vertices represent phone numbers and edges
or arcs represent phone call traffic.
Analysis of one medical center’s VoIP log files suggests that medical care at large
clinical enterprises involves amounts of telephone time equivalent to hundreds of
FTEs (full-time equivalent employees). A large subset of the phone calls were short,
averaging 27 seconds, with the intriguing possibility that these small and possibly
definable data payloads may be shifted from manual telephone calls to some form of
optimization or automation, for example, by providing new tools for status requests
and updates.
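The scale of the FTE estimate can be illustrated with a short calculation. This Python sketch assumes a 40-hour work week per FTE, which is not stated in the paper, and counts only the measured server time for one party per call.

```python
# Measured server call time for one week (Table 1).
server_hours_per_week = 23_186

# Assumption: one FTE corresponds to 40 working hours per week.
hours_per_fte_week = 40

# Roughly 580 FTEs before counting the second party on each call.
fte_equivalent = server_hours_per_week / hours_per_fte_week
```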
Graphical mapping of high-volume call patterns can offer insights into common clinical
activities which are labor-intensive, highly interruptive, and have limited or no
electronic signature in current clinical software. Visual analysis of telephone traffic
may serve as a starting point for “Six Sigma” and “Lean” re-engineering.
Clinical Relevance Statement
Healthcare system telephone calls represented a major time cost to one large academic
medical center with 130,341 phone calls per day. Clinical workflow re-engineering
targets for Lean and Six Sigma efforts are difficult to identify and providing a graphical
display of high-volume telephone traffic can serve as a visualization tool to assist
in workflow analysis. A large subset of calling number – called number phone call
pairs have an average duration of 27 seconds and the small data transfers implicit
in these short calls may serve as a separate automation or process redesign target.
Questions
1). Converting each entry in a telephone call log file into which of the following
data structures can best assist in cross-correlating information such as the calling
and called phone numbers?
(A) an array
(B) a hash
(C) a union
(D) a linked list
Answer: (B) a hash.
Indexing log file data is needed to allow cross-correlations and insights beyond summary
statistics. A hash is the data structure which performs key value pair indexing. The
individual entries are naturally already in an array as one reads the file but that
is not enough to build cross-correlations such as a graph network. Unions and linked
lists are basic data structures which were not needed for this analysis.
2). What type of adjunct data structure should be built or used to perform telephone
log file analysis?
(A) a data dictionary
(B) a time-stamping tool
(C) a directory
(D) a communications server
Answer: (C) a directory.
Phone numbers in the log file need to be mapped to some representation of the clinical
enterprise via a directory. In this paper the directory maps to cost centers rather
than individuals as with a classic “reverse” telephone directory where the phone number
is indexed to its human (or corporate) owner. It is important to understand the metadata
represented in a log file but this can be done without building a separate data dictionary.
The log files provide their own time stamps so a separate time-stamping tool is not
required though there will likely be code needed to analyze the temporal implications
of the data. A communications server overseeing the Voice over IP communications traffic
is what generates the log file and does not need to be built separately to do this
analysis.