Incrementally Transforming Electronic Medical Records into the Observational Medical Outcomes Partnership Common Data Model: A Multidimensional Quality Assurance Approach
Funding This work was supported using resources and facilities at the VA Salt Lake City Health Care System and the VA Informatics and Computing Infrastructure (VINCI), VA HSR RES 13–457.
26 April 2019
08 August 2019
23 October 2019 (online)
Background The development and adoption of health care common data models (CDMs) have addressed some of the logistical challenges of performing research on data generated from disparate health care systems by standardizing data representations and leveraging standardized terminology to express clinical information consistently. However, transforming a data system into a CDM is not a trivial task, and maintaining an operational, enterprise-capable CDM that is incrementally updated within a data warehouse is challenging.
Objectives To develop a quality assurance (QA) process and code base to accompany our incremental transformation of the Department of Veterans Affairs Corporate Data Warehouse (CDW) health care database into the Observational Medical Outcomes Partnership (OMOP) CDM to prevent incremental load errors.
Methods We designed and implemented a multistage QA approach centered on the completeness, value conformance, and relational conformance data-quality elements. For each element, we describe the key incremental load challenges, our extract, transform, and load (ETL) solution to those challenges, and the potential impacts of incremental load failure.
Results Completeness and value conformance data-quality elements are most affected by incremental changes to the CDW, while updates to source identifiers impact relational conformance. ETL failures surrounding these elements lead to incomplete and inaccurate capture of clinical concepts as well as data fragmentation across patients, providers, and locations.
Conclusion Development of robust QA processes supporting accurate transformation of source data into OMOP and other CDMs is still evolving, and opportunities exist to extend the existing QA framework and tools used for incremental ETL QA processes.
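As a rough illustration of the three data-quality elements named in the Methods, the sketch below shows what a check for each might look like after one incremental load. It is not the authors' code base. The function names, the toy rows, and the concept identifiers are hypothetical, and the column names are only loosely modeled on OMOP-style tables.

```python
# Minimal sketch of three post-load QA checks; all names and data are hypothetical.

def completeness_gaps(source_ids, loaded_ids):
    """Completeness: source record ids in the batch that never reached the target."""
    return set(source_ids) - set(loaded_ids)


def value_conformance_errors(rows, field, allowed_values):
    """Value conformance: rows whose coded value is outside the allowed set."""
    return [row for row in rows if row[field] not in allowed_values]


def relational_orphans(child_rows, foreign_key, parent_keys):
    """Relational conformance: child rows whose foreign key has no parent row."""
    return [row for row in child_rows if row[foreign_key] not in parent_keys]


# Invented example batch: three source records were extracted, two were loaded.
source_batch_ids = [101, 102, 103]
person_ids = {1, 2}                 # keys present in the (toy) person table
allowed_concepts = {1001, 1002}     # made-up concept ids standing in for a vocabulary
condition_rows = [
    {"source_id": 101, "person_id": 1, "condition_concept_id": 1001},
    {"source_id": 102, "person_id": 3, "condition_concept_id": 9999},
]

missing = completeness_gaps(source_batch_ids, [r["source_id"] for r in condition_rows])
bad_values = value_conformance_errors(condition_rows, "condition_concept_id", allowed_concepts)
orphans = relational_orphans(condition_rows, "person_id", person_ids)

print("missing source ids:", sorted(missing))
print("rows with unmapped concepts:", [r["source_id"] for r in bad_values])
print("rows with orphaned person_id:", [r["source_id"] for r in orphans])
```

In this toy batch, record 103 was never loaded, and record 102 carries both a concept outside the allowed set and a person identifier with no parent row. A production process would presumably run such checks as set-based queries inside the warehouse rather than in application memory.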
Conception and design: K.E.L., S.A.D., M.E.M.; acquisition of data: K.E.L., B.V., A.C., D.P., K.H.; analysis: K.E.L., B.V., A.C., D.P., E.H., K.H.; interpretation: K.E.L., S.A.D., S.L.D., M.E.M.; drafting of manuscript: K.E.L., S.A.D., M.E.M.; and critical revision of the manuscript for important intellectual content: all authors.
Protection of Human and Animal Subjects
The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects. All research was conducted with the approval of the University of Utah Institutional Review Board and the VA Salt Lake City Health Care System Research and Development Committee.