Appl Clin Inform 2017; 08(02): 381-395
DOI: 10.4338/ACI-2016-11-RA-0191
Research Article
Schattauer GmbH

Using telephony data to facilitate discovery of clinical workflows

Donald W. Rucker
1  Departments of Biomedical Informatics and Emergency Medicine, Ohio State University, Columbus, OH, USA

Correspondence to

Donald W. Rucker, MD
110 31st Avenue N, #406
Nashville, TN 37203

Publication History

Received: 08 November 2016

Accepted: 13 February 2017

Publication Date:
21 December 2017 (online)



Background: Discovery of clinical workflows to target for redesign using methods such as Lean and Six Sigma is difficult. VoIP telephone call pattern analysis may complement direct observation and EMR-based tools in understanding clinical workflows at the enterprise level by allowing visualization of institutional telecommunications activity.

Objective: To build an analytic framework mapping repetitive and high-volume telephone call patterns in a large medical center to their associated clinical units using an enterprise unified communications server log file and to support visualization of specific call patterns using graphical networks.

Methods: Consecutive call detail records from the medical center’s unified communications server were parsed to cross-correlate telephone call patterns and map associated phone numbers to a cost center dictionary. Hashed data structures were built to allow construction of edge and node files representing high volume call patterns for display with an open source graph network tool.

Results: Summary statistics for exactly one week of call detail records at a large academic medical center showed that 912,386 calls were placed, with a total duration of 23,186 hours. Approximately half of all unique calling-called number pairs had an average call duration under 60 seconds; across these pairs, the average call duration was 27 seconds.

Conclusions: Cross-correlation of phone calls identified by clinical cost center can be used to generate graphical displays of clinical enterprise communications. Many calls are short. The compact data transfers within short calls may serve as automation or re-design targets. The large absolute amount of time medical center employees were engaged in VoIP telecommunications suggests that analysis of telephone call patterns may offer additional insights into core clinical workflows.

Citation: Rucker DW. Using telephony data to facilitate discovery of clinical workflows. Appl Clin Inform 2017; 8: 381–395


1. Background

Many researchers examine clinical workflows using one of two approaches [[1]]. One is direct observation of clinical activities, with human observers taking notes as they watch clinicians go about their tasks [[2]–[9]]. The other is analysis of usage patterns of pre-existing enterprise software such as computerized physician order entry or nursing medication administration [[10]–[15]]. Both approaches have limitations. Manual observation is expensive, which limits data collection to narrow, pre-selected workflows. Analysis of a facility’s electronic medical record environment necessarily reflects workflows imposed by specific software and hardware configurations rather than elective workflow choices made by clinicians.

Likewise, there are limited ways to identify clinical workflows at the scale of a healthcare system with thousands of employees [[16]–[19]]. Answering the managerial question of how frontline nursing and patient care staff spend their time is not easy [[20]–[25]]. This challenge becomes evident when hospitals embark on “Six Sigma” or “Lean” re-engineering efforts and struggle to discover their current processes by asking their employees to manually identify workflow inefficiencies [[26], [27]]. Instrumented workflow discovery tools such as RFID tagging and saccadic eye-movement tracking have had limited success here [[28]–[30]].


2. Objectives

One promising workflow discovery assistance tool is computational analysis of electronic log files recorded as a byproduct of using software for other tasks. Log files have been used to analyze screen navigation and outlier events in laparoscopic surgery, and as targets for machine learning approaches that categorize clinical pathways [[12]–[14], [31]–[33]]. This paper focuses on the analysis of log files generated by telephone calls placed on a medical center’s communications server network.

Communication is intimately involved with clinical workflows [[34], [35]]. The adoption of “bring your own device” policies (allowing use of personal cell phones and tablets) in many healthcare workplaces has increased awareness of communications [[36]–[41]]. Identification of lapses in communication during “handoffs” has also generated widespread interest [[42]–[49]]. Telephone calls are interesting to measure because they represent clinical tasks in and of themselves, and because these communications may serve as surrogate markers for clinical activities that currently have no electronic signature. Telephone calls may also reflect rework when they are placed to address failed processes. Of note, if there are frequent, very short phone calls, the limited information transferred within these calls may identify simple tasks that offer opportunities for automation or process redesign, for example by replacing some calls with a workflow engine or other communications technology [[50]–[52]].

The transition from public switched telephone circuits to voice over IP (VoIP) servers, which route internal enterprise calls over the enterprise’s own IP (Internet Protocol) network, offers new ways to analyze clinical communications at enterprise scale [[53], [54]]. VoIP servers route voice calls within the enterprise, relying on the public switched telephone network only when calls leave enterprise sites. Multiple vendors offer enterprise VoIP systems, often as part of a unified communications system [[55]]. While actual conversations are not routinely recorded, for reasons of privacy and storage cost, each call generates a call detail record (CDR) in a log file [[56], [57]].

This paper describes a framework for mapping telephone calls to clinical areas, as identified by cost centers, using CDR logs, and then graphically displaying the high-volume call patterns. The goal is to build a tool that can help clinical re-engineering teams identify additional workflow inefficiencies, beyond those recalled from memory, by showing the repetitive telephone calls clinicians use in their work. The same software could also be used in future work to measure the impact of resulting workflow redesigns.


3. Methods

3.1 Data Requirements

The two main types of data needed to perform IP telephony-based analysis of clinical workflows are a set of call detail records (CDRs) and information to map the phone numbers in those CDRs to one or more identifiers of the specific clinical location or activity.

CDR log files are generated by the VoIP telephony or unified communications server. Each record’s data fields include the phone numbers and IP addresses involved in the calls, start and stop time stamps, call type classification such as a conference call or a bridged call, and potential related information such as video conference data [[56]].

Mapping phone numbers to the clinical environment requires local information. Directory information should allow phone numbers to be mapped to clinical areas and tasks. The actual workflow analysis is performed on phone numbers that represent high-call-volume clinical areas, with individual user data computationally discarded. Phone numbers could, for example, represent nursing units, labs, radiology, operating rooms or the emergency department. Enterprise phone numbers may accumulate over decades and hence span multiple technologies. Larger institutions can reserve banks of phone numbers and map them to internal exchanges, allowing shorter numbers to be dialed (e.g. xxx-xx5-1234 is mapped to 5-1234). Additional options for identifying clinical numbers include cost center codes and Emergency Location Identification Number (ELIN) data, which allow emergency responders to pinpoint the location of 911 calls placed over VoIP within buildings. Lightweight Directory Access Protocol (LDAP) files may also help map phone numbers to sites.
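The short-dialing convention can be made concrete as a normalization step applied before any directory lookup. The following is a minimal Python sketch (the analysis itself was written in Perl), assuming a hypothetical reserved five-digit prefix `55501` standing in for the site's `xxx-xx` bank; real prefixes and number formats are site-specific.

```python
import re
from typing import Optional

# Hypothetical reserved 5-digit prefixes (the "xxx-xx" part); site-specific in practice.
RESERVED_PREFIXES = {"55501"}

def to_extension(number: str) -> Optional[str]:
    """Return the 5-digit internal extension for a dialed or logged number.

    Returns None for numbers outside the reserved bank (outside calls).
    """
    digits = re.sub(r"[^0-9]", "", number)     # drop dashes, spaces, parentheses
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                    # drop North American country code
    if len(digits) == 5:
        return digits                          # already a short internal extension
    if len(digits) == 10 and digits[:5] in RESERVED_PREFIXES:
        return digits[5:]                      # last five digits form the extension
    return None

print(to_extension("555-015-1234"))   # 51234
print(to_extension("5-1234"))         # 51234
print(to_extension("212-555-0100"))   # None
```

Normalizing first means the same extension is keyed identically whether it appears in its long or short form.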


3.2 Log file analysis

A convenience sample of one million consecutive call detail records from one academic medical center’s Cisco Unified Communications Manager™ version 7.x instance was obtained as a log file (424 MB in size) [[56], [57]]. The medical center’s information technology group, which administers enterprise telephony, also provided a manually maintained Excel™ spreadsheet listing active phone numbers and their cost center codes. A separate Excel™ spreadsheet crosswalk from cost center numbers (1004 in total) to cost center names was also provided. Use of pagers, personal cell phones and text messaging was not measured. All data were transmitted and stored using public key cryptography and encrypted storage. The research protocol was reviewed by the Ohio State University IRB, which determined that the research did not involve human subjects and did not fall under requirements for IRB review or exemption.

Software analyzing the call detail records and the telephone number mappings was programmed in Perl version 5.16 [[58]]. The Perl modules Data::Dumper, Storable and Devel::Size were used to support in-memory debugging, persistence and calculation of memory structure size, respectively. Perl hashes were used to represent the major data structures as unique key:value pairs. Programming was performed in Windows 7 and CentOS Linux environments on a laptop personal computer with an Intel Core i7 2.90 GHz CPU and 16 gigabytes of RAM.

Each phone call generated a 94-field comma-separated value (CSV) CDR. For workflow analysis, each call’s data needed to be aggregated and indexed by phone number, allowing collection of cumulative data about all calls between each observed pair of internal phone numbers. These aggregated data structures also needed to support generation of a directed graph in which phone numbers are nodes (vertices) and phone calls are arcs (edges).
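Reading one record at a time keeps memory bounded regardless of log size. This is a minimal Python sketch, assuming the CDR file has a header row and using the column names `callingPartyNumber`, `finalCalledPartyNumber` and `duration`; these are assumed to follow Cisco CDR naming and should be checked against the server's CDR field reference. All other fields are ignored.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO

class Call(NamedTuple):
    calling: str
    called: str
    duration: int  # seconds

# Assumed column names; check them against the CDR field reference for your server.
CALLING, CALLED, DURATION = "callingPartyNumber", "finalCalledPartyNumber", "duration"

def read_calls(stream: TextIO) -> Iterator[Call]:
    """Yield one Call per CDR row; only the current row is held in memory."""
    for row in csv.DictReader(stream):         # row: dict of all fields in one record
        try:
            call = Call(row[CALLING].strip(), row[CALLED].strip(), int(row[DURATION]))
        except (KeyError, ValueError, TypeError, AttributeError):
            continue                           # skip malformed or non-data rows
        yield call

sample = io.StringIO(
    "callingPartyNumber,finalCalledPartyNumber,duration,otherField\n"
    "51234,52000,27,x\n"
    "52000,51234,95,y\n"
    "51234,52000,not-a-number,z\n"
)
calls = list(read_calls(sample))
print(calls[0])    # Call(calling='51234', called='52000', duration=27)
print(len(calls))  # 2
```

With a real log, pass `open(path, newline="")` in place of the `StringIO` sample.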

To minimize memory use when reading the log files, each CDR was read one at a time and parsed into a hash temporarily holding its data fields. Selected fields, such as the calling number, the called number and the call duration, were then saved into three core representations of the phone call traffic. These three hashes were keyed, respectively, by the calling number, the called number, and a string combining the calling and called numbers. The hashes stored the total duration of calls, the number of successful and unsuccessful calls as defined in the Cisco Unified Communications Manager technical manuals, information about the calling device and, as described below, data about the clinical environment [[56]]. The calling-number hash also contained a nested hash with a subkey for each called number holding that subset’s call characteristics; correspondingly, the called-number hash contained a nested hash with a subkey for each calling number. Nested hashes provide significant flexibility in structuring further analysis. Specifically, symmetrically nested hashes for both calling and called numbers allow all call patterns to be retraced in either direction. Because Perl hashes map each unique key to a single value, each number or number pair accumulates its data under exactly one entry.
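A minimal Python sketch of the three core structures, using dictionaries in place of Perl hashes. It is illustrative rather than the author's code: only a few aggregates are kept, and "successful" follows the over-1-second definition in the footnote to Table 1 rather than the full Cisco metadata rules.

```python
from collections import defaultdict

def new_stats():
    """Aggregate for one key: call count, successful calls, summed seconds."""
    return {"calls": 0, "successful": 0, "duration": 0}

def add(stats, duration):
    stats["calls"] += 1
    stats["successful"] += int(duration > 1)   # assumed rule: successful = over 1 second
    stats["duration"] += duration

# Calling-number hash, with a nested hash keyed by each called number.
by_calling = defaultdict(lambda: {"totals": new_stats(), "to": defaultdict(new_stats)})
# Called-number hash, with a nested hash keyed by each calling number.
by_called = defaultdict(lambda: {"totals": new_stats(), "from": defaultdict(new_stats)})
# Call-pair hash keyed by a combined "calling->called" string.
by_pair = defaultdict(new_stats)

def record_call(calling, called, duration):
    """Fold one parsed CDR into all three structures."""
    add(by_calling[calling]["totals"], duration)
    add(by_calling[calling]["to"][called], duration)
    add(by_called[called]["totals"], duration)
    add(by_called[called]["from"][calling], duration)
    add(by_pair[f"{calling}->{called}"], duration)

for calling, called, secs in [("51234", "52000", 27), ("51234", "52000", 0), ("52000", "51234", 95)]:
    record_call(calling, called, secs)

print(by_calling["51234"]["to"]["52000"])   # {'calls': 2, 'successful': 1, 'duration': 27}
print(by_called["51234"]["from"]["52000"])  # {'calls': 1, 'successful': 1, 'duration': 95}
```

Retracing in the other direction is a single lookup: `by_called["52000"]["from"]` lists every number that called 52000 together with its aggregates.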

Functions to sort the calls by call count were built to allow selection of high-volume call patterns. Additional functions were built to capture aggregate statistics, time stamps, and the volume of short-duration phone calls. Major data structures were optimized using Perl references (pointers to locations in memory). Processing one million calls with debugging code enabled required 6 minutes.
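Sorting and short-call selection can be sketched as follows. The pair data are hypothetical, and the 60-second threshold is the paper's definition of a "short" pair. Averaging the per-pair means (rather than weighting by number of calls) is an assumption of this sketch.

```python
# Hypothetical aggregated call-pair data: (calling, called) -> [call count, total seconds]
pair_stats = {
    ("51234", "52000"): [120, 3000],
    ("52000", "51234"): [80, 9600],
    ("53111", "59999"): [300, 6600],
    ("54000", "51234"): [5, 900],
}

def top_pairs(stats, n):
    """Return the n call pairs with the most calls, highest first."""
    return sorted(stats.items(), key=lambda item: item[1][0], reverse=True)[:n]

def short_pair_summary(stats, threshold=60.0):
    """Count pairs whose average call is under `threshold` seconds and
    return (count, mean of those pairs' average durations)."""
    averages = [secs / count for count, secs in stats.values() if count]
    short = [avg for avg in averages if avg < threshold]
    return len(short), (sum(short) / len(short) if short else 0.0)

for (calling, called), (count, secs) in top_pairs(pair_stats, 2):
    print(calling, "->", called, count)
print(short_pair_summary(pair_stats))  # (2, 23.5)
```

Here the pairs averaging 25.0 and 22.0 seconds qualify as short, while the 120- and 180-second pairs do not.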


3.3 Building a graph representation

Enterprise phone traffic can intuitively be represented as a network graph. With the growth of Facebook™ and LinkedIn™, social network analysis of graph data structures has become increasingly common [[59]–[67]]. Network graphs can be transformed into matrices, allowing a wide variety of computational methods to describe the information flows implicit in the graph [[68]–[70]].

Graphs can be represented as matrices using an adjacency matrix. In an adjacency matrix $A$ of phone call patterns, the calls from phone number $i$ to phone number $j$ are summed into the matrix element $a_{ij}$. The nested hashes described above can be seen as sparse adjacency matrices in which only the nonzero (positive) elements are stored. Because the direction of the call, i.e. which number is calling and which number is called, matters in this analysis, the graph representation of the data is directed. Correspondingly, the phone call adjacency matrix is in general not symmetric, unlike the adjacency matrix of an undirected graph.
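The definitions can be written out explicitly. In the notation below, introduced here for illustration, $C$ is the set of logged calls, $s(c)$ and $t(c)$ are the calling and called numbers of call $c$, $d(c)$ is its duration in seconds, and $\mathbf{1}[\cdot]$ is the indicator function:

```latex
\begin{aligned}
a_{ij} &= \sum_{c \in C} \mathbf{1}\left[s(c) = i \wedge t(c) = j\right], &
w_{ij} &= \sum_{c \in C} d(c)\,\mathbf{1}\left[s(c) = i \wedge t(c) = j\right], \\
\mathrm{out}(i) &= \sum_{j} a_{ij}, &
\mathrm{in}(j) &= \sum_{i} a_{ij}.
\end{aligned}
```

Row $i$ of $A$ corresponds to the nested subkeys of number $i$ in the calling-number hash, and column $j$ to the nested subkeys of number $j$ in the called-number hash; $W = (w_{ij})$ is the duration-weighted analogue. Because $a_{ij}$ and $a_{ji}$ count calls in opposite directions, $A \neq A^{\mathsf{T}}$ in general.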

In the visualization of the graph, selected phone numbers are represented as nodes, and cumulative phone traffic in one direction is represented as an arc (arrow) pointing in that direction. Many pairs of phone numbers (nodes) had high call volumes in both directions, so the graph included arcs for both directions. Building a unified graph representation that consolidates information from both the calling-number and called-number data structures requires choosing how to combine each structure’s call duration information.
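One possible way to make that choice is sketched below, not necessarily the author's. It takes per-direction call counts and mean durations from the calling-number view, cross-checks them against the called-number view (each call is stored under both its caller and its callee), and flags pairs that have traffic in both directions. The sample data are hypothetical.

```python
# Hypothetical nested views of the same calls: (call count, total seconds) per direction.
calling_view = {"51234": {"52000": (120, 3000)},
                "52000": {"51234": (80, 9600)},
                "53111": {"51234": (4, 200)}}
called_view = {"52000": {"51234": (120, 3000)},
               "51234": {"52000": (80, 9600), "53111": (4, 200)}}

def build_edges(calling_view, called_view):
    """Return directed edges (source, target, calls, mean_seconds).

    The calling view is used as the source of truth; the called view must agree.
    """
    edges = []
    for src, targets in calling_view.items():
        for dst, (count, secs) in targets.items():
            if called_view.get(dst, {}).get(src) != (count, secs):
                raise ValueError(f"views disagree for {src}->{dst}")
            edges.append((src, dst, count, secs / count))
    return edges

edges = build_edges(calling_view, called_view)
pairs = {(s, t) for s, t, _, _ in edges}
# Reciprocal pairs become two opposite arcs; list each such pair once.
reciprocal = sorted((s, t) for s, t in pairs if (t, s) in pairs and s < t)
print(edges)
print(reciprocal)  # [('51234', '52000')]
```

Keeping the two directions as separate arcs, rather than summing them into one undirected edge, preserves the asymmetry of the adjacency matrix.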


3.4 Mapping clinical workflows

Mapping enterprise phone numbers to clinical areas, and by implication to the clinical activities occurring in those areas, depends on how an institution documents its organizational structure. For this analysis, phone numbers were identified by cost center. The cost center mapping was built by cross-matching and merging a file of clinical cost center names and descriptions with an enterprise list of phone numbers and their cost center codes, producing a hash keyed by phone number. On reading each CDR, the numbers were looked up in this “phone directory” hash and the cost center data were incorporated into the calling-number, called-number and call-pair hashes.
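The merge and the per-record lookup can be sketched as follows. The cost center codes, names and numbers are hypothetical. Labeling unmatched entries `UNKNOWN` (number listed, code missing from the crosswalk) or `UNMAPPED` (number not listed) is one assumed way to keep gaps visible.

```python
# Hypothetical extracts of the two spreadsheets, e.g. after export to CSV.
cost_centers = {"1001": "Nursing Unit 5 East", "2002": "Clinical Laboratory"}  # code -> name
phone_list = [("51234", "1001"), ("52000", "1001"), ("53111", "2002"), ("54444", "9999")]

def build_directory(phone_list, cost_centers):
    """Merge phone->cost center code with code->name into one phone-keyed dict."""
    directory = {}
    for number, code in phone_list:
        directory[number] = {"cost_center": code, "name": cost_centers.get(code, "UNKNOWN")}
    return directory

directory = build_directory(phone_list, cost_centers)

def label(number):
    """Lookup performed for each CDR's calling and called numbers."""
    entry = directory.get(number)
    return entry["name"] if entry else "UNMAPPED"

print(label("51234"), "|", label("54444"), "|", label("59999"))
# Nursing Unit 5 East | UNKNOWN | UNMAPPED
```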

Importantly, the granularity of the cost center data allowed phone numbers to be associated with individual nursing units, operating room areas, and numerous other specific departments and areas of the medical center. To focus on clinical workflows, only data from internal 5-digit extensions were carried forward, eliminating outside calls and internal server re-routed calls. Filtering and data cleansing of the historical phone number assignment file were programmed using regular expressions.
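The extension filter and a typical cleansing step can be sketched with Python's `re` module. The sample numbers are hypothetical. Treating "internal" as "exactly five ASCII digits at both ends of the call" is an assumption that mirrors the 5-digit rule above.

```python
import re

EXTENSION = re.compile(r"[0-9]{5}")   # internal extension: exactly five ASCII digits

def clean(number: str) -> str:
    """Normalize a hand-maintained directory entry by keeping digits only."""
    return re.sub(r"[^0-9]", "", number)

def is_internal(calling: str, called: str) -> bool:
    """Keep a call only if both parties are 5-digit internal extensions."""
    return bool(EXTENSION.fullmatch(calling)) and bool(EXTENSION.fullmatch(called))

calls = [("51234", "52000"), ("51234", "2125550100"), ("", "52000"), ("53111", "5123")]
internal = [c for c in calls if is_internal(*c)]
print(internal)             # [('51234', '52000')]
print(clean(' "5-1234" '))  # 51234
```

`fullmatch` rejects longer numbers that merely contain five digits, which a bare `search` would accept.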


3.5 Graphical display

This academic medical center had over 100,000 phone calls a day and approximately 30,000 unique phone numbers, necessitating a strategy of examining only high-volume workflows. Gephi© was selected as the open-source visualization tool for the graph representation [[71]]. Traversing the hash structures described above, each phone number and its associated clinical cost center were mapped to a Gephi node, while the call traffic between phone numbers and the corresponding counts were mapped to directed edges. Gephi allows the display of nodes and edges to be customized with color mappings and various weighting functions. The edge and node files written for Gephi were limited to the 500 most frequent edges (sorted call pairs) for ease of visual inspection. Filtering functions to select calls by cost center or call group were also built.
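Writing the node and edge files can be sketched as follows. The column headers `Id`/`Label` for nodes and `Source`/`Target`/`Type`/`Weight` for edges are believed to match Gephi's spreadsheet (CSV) import conventions, but they should be checked against the Gephi version in use. The data and the `UNMAPPED` label are hypothetical.

```python
import csv
import io

# Hypothetical inputs: directed call-pair counts and a phone -> cost center name directory.
pair_counts = {("51234", "52000"): 120, ("52000", "51234"): 80, ("53111", "51234"): 4}
directory = {"51234": "Nursing Unit 5 East", "52000": "Nursing Unit 5 East",
             "53111": "Clinical Laboratory"}

def write_gephi_tables(pair_counts, directory, top_n, nodes_out, edges_out):
    """Write node and edge CSV tables for the top_n most frequent call pairs."""
    top = sorted(pair_counts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    edge_writer = csv.writer(edges_out)
    edge_writer.writerow(["Source", "Target", "Type", "Weight"])
    kept = set()
    for (src, dst), count in top:
        edge_writer.writerow([src, dst, "Directed", count])
        kept.update((src, dst))
    node_writer = csv.writer(nodes_out)
    node_writer.writerow(["Id", "Label"])
    for number in sorted(kept):               # only numbers touched by a kept edge
        node_writer.writerow([number, directory.get(number, "UNMAPPED")])

nodes_buf, edges_buf = io.StringIO(), io.StringIO()
write_gephi_tables(pair_counts, directory, 2, nodes_buf, edges_buf)
print(edges_buf.getvalue())
```

For real files, open each output with `open(path, "w", newline="")`, as the `csv` module documentation recommends, and set `top_n` to 500 to match the limit used here.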

Because the goal of this work was to display nodes that are meaningfully labeled and edges which can be directly inspected, indirect graph analysis using tools such as clustering, graph partitioning, community detection or spectral analysis was not performed.


4. Results

Summary statistics for all calls are listed in Table 1. Two findings are of note. First, the large volume of phone calls in this single academic medical center and its affiliated clinics (approximately 850 inpatient beds at the time of this analysis) reflects a measurable labor investment in telephony. Filtering the log file to select call traffic in the first 168 hours, corresponding to exactly one calendar week, yielded 23,186 hours of call duration for that week as measured by server time. That week’s 912,386 calls correspond to an average of 130,341 calls per day. Second, as shown in Table 1, roughly half of the unique call pairs were ‘short’ pairs, whose calls averaged 27 seconds.
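The derived figures can be cross-checked with simple arithmetic on the reported totals. This sketch uses only numbers stated in the Results and Table 1. The mean per call divides by all calls, successful or not, which is presumably why it falls below the 101.54-second average for successful calls.

```python
# Figures reported in the Results and Table 1.
week_calls = 912_386
week_call_hours = 23_186
sample_calls = 1_000_000
sample_clock_hours = 181

calls_per_day = week_calls / 7
mean_seconds_per_call = week_call_hours * 3600 / week_calls    # all calls in the week
sample_calls_per_day = sample_calls / sample_clock_hours * 24  # rate over the full sample

print(round(calls_per_day))              # 130341
print(round(mean_seconds_per_call, 1))   # 91.5
print(round(sample_calls_per_day))       # 132597
```

The call rate over the full one-million-call sample (about 132,600 per day) is consistent with the one-week figure.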

Table 1 Summary VoIP Call Statistics

| Statistic | Value |
|---|---|
| Total VoIP calls | 1,000,000 |
| Successful calls* | |
| Average duration of successful calls | 101.54 seconds |
| Unique call pairs** | |
| ‘Short’ call pairs** (average call under 60 seconds) | |
| Average duration of ‘short’ call pairs | 26.93 seconds |
| Total duration of call time for 1,000,000 calls | 25,337 hours |
| Total clock time to obtain the 1,000,000 calls | 181 hours |
| Total call count in one week (7 calendar days, 168 consecutive hours) | 912,386 |
| Total duration of call time in one week (168 consecutive hours) | 23,186 hours |

* Successful calls are defined in CDR metadata as calls with duration over 1 second.
** A call pair is a unique ordered pair of numbers in which number x calls number y.

This framework for capturing CDR data allows network displays aggregated by sets of cost centers and numbers. Figure 1 displays the overall high-frequency call traffic for the top 500 call pairs. There is one large internal node for voice mail as well as a number of other aggregations, most of which represent inpatient nursing units. Figure 2 drills down into one representative inpatient nursing unit, showing a high volume of traffic on shared phones within the unit among unit clerks, nurses, and patient care associates.

Fig. 1 Gephi© graphical network display of the top 500 enterprise call pairs among the one million calls, using the Fruchterman-Reingold force-directed layout algorithm. The left column and margins of the Gephi screen show tools to select the layout algorithm and to label the nodes (phone numbers) and edges (calls) with information such as cost center name, call volume or duration. The large node in the upper right quadrant represents voice mail, and many of the smaller sub-networks correspond to large inpatient nursing units. Label display is turned off here to show the overall structure of high-frequency telephone traffic.
Fig. 2 Close-up Gephi© screen capture of sorted high-frequency calling patterns involving one representative inpatient nursing unit cost center. Phone number nodes are labeled with their cost center and overall contribution to call volume, while edges (calls between specific phone numbers) are labeled with directional call counts. Other call detail record data could be selected for display when writing the node and edge data files for import into Gephi. This figure illustrates the type of display that would be generated to start manual discovery of clinical workflows in Lean or Six Sigma programs.


5. Discussion

5.1 Discussion

This paper describes the design of a software stack that uses internal VoIP telephone log files to generate a graphical representation of enterprise medical center telephone traffic labeled with clinical cost center information. The cost center label serves as a proxy for who is making the call and for the types of clinical activities that may have been discussed during the call. The actual topic of the conversation is not known, since conversations were neither recorded nor analyzed.

The framework is designed to generate visual displays for exploratory data analysis and to facilitate manual recall of relevant common clinical workflows by medical center staff engaged in workflow redesign. The author hypothesizes that a graph display can provide information not otherwise available to teams identifying clinical workflows for redesign, for example as part of a “Six Sigma” or “Lean” process. While this framework provides potentially new information for analyzing clinical activities, mapping the displays to actual clinical activities depends entirely on employee pattern recognition and insight [[72]]. Given the re-engineering intent of the framework, only internal VoIP calls were displayed graphically.

Summary statistics of the telephone traffic can help frame clinical informatics questions and opportunities. The amount of telephony work by employees is likely significantly underestimated by the 23,186 hours per week of server time alone. To the extent that server time reflects an actual two-way conversation rather than time spent on hold or in voice mail, it represents two employees’ time and should be doubled. If most time on the phone is spent talking to another employee rather than leaving voicemail, these data suggest that employee phone time could approach twice the server total, on the order of 40,000 to 46,000 employee hours per week. However, the author was not able to separate time spent in actual conversation from other call time in the log files. Importantly, there are no measures of additional time spent searching for phone numbers, or of the time costs of related workflow interruptions for either party, across an average of 130,341 calls per day [[48], [73]].
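To relate these weekly hours to the “hundreds of FTEs” cited in the Conclusions, here is a back-of-the-envelope conversion. It assumes a 40-hour full-time week, which is an assumption and not a figure from the study. The lower figure counts server time once; the upper figure assumes every call minute is a two-way conversation between employees.

```python
server_hours_per_week = 23_186
fte_hours_per_week = 40   # assumption: one FTE = 40 paid hours per week

one_party = server_hours_per_week / fte_hours_per_week       # server time counted once
two_party = 2 * server_hours_per_week / fte_hours_per_week   # every call a two-way conversation

print(round(one_party), round(two_party))  # 580 1159
```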

Short telephone calls (defined here as phone number pairs whose calls average under 60 seconds; these pairs averaged 27 seconds) may represent a unique opportunity to redesign workflows. The limited call span suggests that only one substantive fact is exchanged once time for “overhead”, such as greetings, caller identification and possibly patient identification, is subtracted. The author’s clinical experience suggests that many of these short calls are status requests or updates. This class of short calls with limited data payloads may be particularly amenable to automation or incorporation into EMR workflows.

The author and clinical colleagues are conducting further research to use the framework operationally. Early analysis of the data suggests that many nursing unit calls relate to patient requests for food or bathroom assistance when handled by patient care associates, and to pain relief and intravenous line issues when handled by nurses. While this work examines VoIP logs, EMR-generated log files involving communication or coordination tasks, such as orders or nursing activities, may also be represented as hashed data structures and mapped into graphs. Data from other communication-related systems, such as RTLS (real-time location services) and nurse call systems, may be mapped into graphs similarly.


5.2 Limitations

This is a single academic medical center study representing one VoIP server and telephone network installation. This medical center provided mobile VoIP phones to many employees, so more data could be obtained from VoIP logs than in environments more reliant on pagers or personal cell phones. Communications using pagers, email or cell phones were not captured [[74]–[76]]. Building and analyzing the framework’s data structures requires correct interpretation of what the telecommunications server writes in each CDR data field [[56]]. Only one million sequential calls were analyzed. Analysis of larger call volumes, or of call traffic by time of day, may reveal different calling patterns.

This project used a pre-existing, independent mapping of cost center data to identify clinical units at a high level of precision. Use of this approach at other institutions would require a similar site-specific mapping of phone numbers to support inference of clinical activities. Such mappings might be programmed using geospatial localization data, LDAP files, other directories, mandated emergency call location files and/or manual annotation, as well as the cost center information used here. The granularity of the mapping (the number of distinct clinical locations and, implicitly, of potential clinical activities) can affect the workflow insights obtained.

The network graphs become too cluttered if low volume calls are represented. Significant workflow insights may be lost when not examining this “long tail” of low volume information [[77]].


6. Conclusions

Clinical institutions using VoIP communications can parse the log files generated by internal telephone calls to identify high-frequency telephone call patterns. Because the digits of a phone number alone do not typically carry information about its clinical use, this analysis requires a data source or process that allows electronic mapping of each telephone number to a specific clinical area or activity within the enterprise. Assignment of a phone number to a specific clinical area then facilitates manual inference about common clinical activities occurring in that area and possible explanations for high-frequency telephone calls placed or received at that number.

By aggregating this individual call information into enterprise-wide data structures indexed by calling number, called number and calling-called number pairs, cumulative statistics can be generated. These hash-indexed data structures can also be used to compute directed graphs where nodes or vertices represent phone numbers and edges or arcs represent phone call traffic.

Analysis of one medical center’s VoIP log files suggests that medical care at large clinical enterprises involves amounts of telephone time equivalent to hundreds of FTEs (full-time equivalent employees). A large subset of the phone calls were short, averaging 27 seconds, raising the intriguing possibility that these small and possibly definable data payloads could be shifted from manual telephone calls to some form of optimization or automation, for example by providing new tools for status requests and updates.

Graphical mapping of high-volume call patterns can offer insights into common clinical activities that are labor-intensive, highly interruptive, and have limited or no electronic signature in current clinical software. Visual analysis of telephone traffic may serve as a starting point for “Six Sigma” and “Lean” re-engineering.


Clinical Relevance Statement

Healthcare system telephone calls represented a major time cost to one large academic medical center, which handled 130,341 phone calls per day. Clinical workflow re-engineering targets for Lean and Six Sigma efforts are difficult to identify, and a graphical display of high-volume telephone traffic can serve as a visualization tool to assist in workflow analysis. A large subset of calling-called number pairs had an average call duration of 27 seconds, and the small data transfers implicit in these short calls may serve as separate automation or process redesign targets.



1. Converting each entry in a telephone call log file into which of the following data structures can best assist in cross-correlating information such as the calling and called phone numbers?

  (A) an array

  (B) a hash

  (C) a union

  (D) a linked list

Answer: (B) a hash.

Indexing log file data is needed to allow cross-correlations and insights beyond summary statistics. A hash is the data structure that indexes key-value pairs. The individual entries are naturally already in an array as the file is read, but that is not enough to build cross-correlations such as a graph network. Unions and linked lists are basic data structures that were not needed for this analysis.

2. What type of adjunct data structure should be built or used to perform telephone log file analysis?

  (A) a data dictionary

  (B) a time-stamping tool

  (C) a directory

  (D) a communications server

Answer: (C) a directory.

Phone numbers in the log file need to be mapped to some representation of the clinical enterprise via a directory. In this paper the directory maps numbers to cost centers rather than to individuals, as a classic “reverse” telephone directory does by indexing each phone number to its human (or corporate) owner. It is important to understand the metadata represented in a log file, but this can be done without building a separate data dictionary. The log files provide their own time stamps, so a separate time-stamping tool is not required, though code will likely be needed to analyze the temporal implications of the data. The communications server that oversees the Voice over IP traffic is what generates the log file, so it does not need to be built separately for this analysis.


Conflict of Interest

The author declares that he has no conflict of interest.


Dawn McDonald, Mark Carey, Ben Sartor, Steve Holtzapfel and Phyllis Teater of Ohio State University Wexner Medical Center’s IT group provided invaluable assistance obtaining and interpreting the CDR files used.

Clay Marsh and Philip Payne of the Ohio State University Wexner Medical Center provided multiple ideas on structuring research into enterprise communications.

Jacalyn Buck provided valuable information about high call volume clinical domains and nursing unit workflows.

Peter Greco and Jeff Forbes, formerly of Siemens Communications, provided numerous insights into VoIP telephony.

Ethical approval

“All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.”
