Appl Clin Inform 2016; 07(03): 745-764
DOI: 10.4338/ACI-2016-04-RA-0063
Research Article
Schattauer GmbH

A New Paradigm to Analyze Data Completeness of Patient Data

Ayan Nasir
1  Department of Health Management and Informatics, University of Central Florida
Varadraj Gurupur
1  Department of Health Management and Informatics, University of Central Florida
Xinliang Liu
1  Department of Health Management and Informatics, University of Central Florida
We would like to thank Dr. Thomas Wan for his valuable guidance on this project. This article is an extended version of a paper by the authors published in the Proceedings of the SDPS 2015 Annual Conference; the society has granted permission to publish it in any journal.

Publication History

received: 26 April 2016

accepted: 04 July 2016

Publication Date:
19 December 2017 (online)



Background: There is a need for a tool that measures the data completeness of patient records using sophisticated statistical metrics. Patient data integrity is important in providing timely and appropriate care. Completeness is an important step toward that integrity, and assessing it requires understanding the complex relationships between data fields and their relative importance in delivering care. Such a tool will not only help identify where data problems occur but also help uncover the underlying issues behind them.


Objectives: To develop a tool that can be used alongside a variety of health care database software packages to determine the completeness of individual patient records as well as aggregate patient records across health care centers and subpopulations.

Methods: The methodology of this project is encapsulated within the Data Completeness Analysis Package (DCAP) tool, with the major components including concept mapping, CSV parsing, and statistical analysis.
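The CSV parsing and statistical analysis components can be illustrated with a minimal sketch. It is not the DCAP implementation: the field names, the weights (standing in for the relative importance a concept map might assign to each field), the missing-value markers, and the weighted-fraction score are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical field weights: a stand-in for the relative importance
# that a concept map might assign to each data field (assumed values).
WEIGHTS = {"age": 1.0, "sex": 1.0, "race": 0.5, "primary_dx": 2.0}

# Assumed markers for a missing value; real exports may use others.
MISSING = {"", "NA", "."}


def record_completeness(row, weights=WEIGHTS):
    """Weighted fraction of the listed fields that are populated in one record."""
    total = sum(weights.values())
    present = sum(
        w for field, w in weights.items()
        if (row.get(field) or "").strip() not in MISSING
    )
    return present / total


def summarize(csv_text, group_field):
    """Return per-record scores, mean score per group, and the overall mean."""
    reader = csv.DictReader(io.StringIO(csv_text))
    scores = []
    by_group = {}
    for row in reader:
        score = record_completeness(row)
        scores.append(score)
        by_group.setdefault(row.get(group_field) or "unknown", []).append(score)
    overall = sum(scores) / len(scores) if scores else 0.0
    group_means = {g: sum(v) / len(v) for g, v in by_group.items()}
    return scores, group_means, overall


# Hypothetical three-record extract (not HCUP SID data).
SAMPLE = (
    "patient_id,hospital,age,sex,race,primary_dx\n"
    "1,A,54,F,2,4280\n"
    "2,A,,M,,4019\n"
    "3,B,71,,1,\n"
)

if __name__ == "__main__":
    scores, group_means, overall = summarize(SAMPLE, "hospital")
    print("per record:", [round(s, 3) for s in scores])
    print("per hospital:", {g: round(m, 3) for g, m in group_means.items()})
    print("database:", round(overall, 3))
```

Weighting the fields, rather than counting blanks equally, reflects the idea that some fields matter more than others in delivering care; grouping on a different column would give completeness for other subpopulations.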


Results: Testing DCAP with Healthcare Cost and Utilization Project (HCUP) State Inpatient Database (SID) data shows that the tool successfully identifies relative data completeness at the patient, subpopulation, and database levels. These results also reinforce the need for further analysis and call for hypothesis-driven research to find the underlying causes of data incompleteness.


Conclusion: DCAP examines patient records and generates statistics that can be used to determine the completeness of individual patient data as well as the general thoroughness of record keeping in a medical database. DCAP uses a component customized to the settings of the software package that stores the patient data, together with a comma-separated values (CSV) file parser, to determine the appropriate measurements. DCAP itself is assessed through a proof-of-concept exercise using hypothetical data as well as available HCUP SID patient data.

Citation: Nasir A, Gurupur V, Liu X. A new paradigm to analyze data completeness of patient data.