Appl Clin Inform 2016; 07(01): 69-88
DOI: 10.4338/ACI-2015-08-RA-0107
Research Article
Schattauer GmbH

Application of an Ontology for Characterizing Data Quality for a Secondary Use of EHR Data

Steven G. Johnson
1  University of Minnesota, Institute for Health Informatics
Stuart Speedie
1  University of Minnesota, Institute for Health Informatics
Gyorgy Simon
1  University of Minnesota, Institute for Health Informatics
Vipin Kumar
2  University of Minnesota, Department of Computer Science
Bonnie L. Westra
1  University of Minnesota, Institute for Health Informatics
3  University of Minnesota, School of Nursing
This research was supported by Grant Number 1UL1RR033183 from the National Center for Research Resources (NCRR) of the National Institutes of Health (NIH) to the University of Minnesota Clinical and Translational Science Institute (CTSI). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the CTSI or the NIH. The University of Minnesota CTSI is part of a national Clinical and Translational Science Award (CTSA) consortium created to accelerate laboratory discoveries into treatments for patients.

Publication History

received: 28 August 2015

accepted: 29 February 2015

Publication Date: 16 December 2017 (online)



The goal of this study is to apply an ontology-based assessment process to electronic health record (EHR) data and determine its usefulness in characterizing data quality for calculating an example eMeasure (CMS178).


The process uses a data quality ontology that references separate data quality, domain and task ontologies to compute measures based on proportions of constraints that are satisfied. These quantities indicate how well the data conforms to the domain and how well it fits the task.
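As a rough illustration of a measure defined as a proportion of satisfied constraints, the following Python sketch scores a handful of made-up encounter records. It is not the authors' implementation: the record fields, the two constraint functions, and the helper `proportion_satisfied` are all hypothetical, and the ontology lookup that would supply the real constraints is omitted.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Constraint = Callable[[Record], bool]


def proportion_satisfied(records: List[Record], constraints: List[Constraint]) -> float:
    """Return the fraction of (record, constraint) checks that pass.

    Returns 0.0 when there is nothing to check, to avoid dividing by zero.
    """
    total = len(records) * len(constraints)
    if total == 0:
        return 0.0
    passed = sum(1 for r in records for c in constraints if c(r))
    return passed / total


# Hypothetical constraint: the diagnosis uses a standard code system.
def has_standard_code(record: Record) -> bool:
    return record.get("code_system") in {"ICD-9-CM", "SNOMED CT"}


# Hypothetical constraint: discharge does not precede admission.
def discharge_not_before_admit(record: Record) -> bool:
    return record["admit_day"] <= record["discharge_day"]


# Invented sample data, for illustration only.
encounters: List[Record] = [
    {"code_system": "ICD-9-CM", "admit_day": 1, "discharge_day": 3},
    {"code_system": "local", "admit_day": 5, "discharge_day": 4},
    {"code_system": "SNOMED CT", "admit_day": 2, "discharge_day": 2},
    {"code_system": "local", "admit_day": 7, "discharge_day": 9},
]

coding_score = proportion_satisfied(encounters, [has_standard_code])
domain_score = proportion_satisfied(encounters, [discharge_not_before_admit])

print(f"coding-style measure: {coding_score:.2f}")
print(f"domain-style measure: {domain_score:.2f}")
```

In this sketch each measure is simply the count of passed checks divided by the count of checks attempted, so a score near 1 means the data largely satisfy that group of constraints.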


The process was performed on a de-identified sample of 200,000 encounters from a hospital EHR. CodingConsistency was poor (44%), but DomainConsistency (97%) and TaskRelevance (95%) were very good. Improvements in the data quality Measures correlated with improvements in the eMeasure.


This approach can encourage the development of new, detailed Domain ontologies that can be reused for data quality purposes across different organizations’ EHR data. Automating the data quality assessment process with this method can enable sharing of data quality metrics, which may help make research that uses EHR data more transparent and reproducible.