CC BY 4.0 · ACI open 2021; 05(02): e94-e103
DOI: 10.1055/s-0041-1735975
Special Section on Informatics Governance

A Systematic Approach to Reconciling Data Quality Failures: Investigation Using Spinal Cord Injury Data

Nandini Anantharama
1  Faculty of IT, Monash University, Clayton, Victoria, Australia
Wray Buntine
1  Faculty of IT, Monash University, Clayton, Victoria, Australia
Andrew Nunn
2  Victorian Spinal Cord Service, Austin Health, Heidelberg, Victoria, Australia
Funding None.


Background Secondary use of electronic health record (EHR) data requires evaluation of data quality (DQ) to establish fitness for use. While multiple frameworks exist for quantifying DQ, there are no guidelines for evaluating the DQ failures identified through such frameworks.

Objectives This study proposes a systematic approach to evaluating DQ failures through an understanding of data provenance, to support exploratory modeling in machine learning.

Methods Our study is based on the EHRs of spinal cord injury inpatients aged over 17 years who were admitted to a state spinal care center in Australia between 2011 and 2018 (inclusive). As a prerequisite step, we applied a DQ framework to the EHR data, using rules that quantified DQ dimensions. DQ was measured either as the percentage of values per field that met the rule criteria or, for agreement between variables, as Krippendorff's α. The resulting DQ failures were then assessed through semistructured interviews with purposively sampled domain experts.
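The two DQ measures named in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule function and the example values are hypothetical, two related fields are treated as two "coders" rating the same record, and the α computation assumes nominal (categorical) values and skips records with fewer than two non-missing values.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Hashable, Iterable, Optional, Sequence


def percent_adherent(values: Iterable[object], rule: Callable[[object], bool]) -> float:
    """Percentage of values in one field that satisfy a DQ rule (hypothetical rule)."""
    values = list(values)
    if not values:
        raise ValueError("field has no values")
    return 100.0 * sum(1 for v in values if rule(v)) / len(values)


def krippendorff_alpha_nominal(units: Sequence[Sequence[Optional[Hashable]]]) -> float:
    """Krippendorff's alpha for nominal data.

    Each unit holds the values that several variables record for one patient
    record (assumption: the variables play the role of coders). None marks a
    missing value.
    """
    # Coincidence matrix: every ordered pair of values from different
    # variables within a unit contributes 1 / (m_u - 1).
    coincidence: Counter = Counter()
    for unit in units:
        vals = [v for v in unit if v is not None]
        m = len(vals)
        if m < 2:
            continue  # unpairable unit, ignored
        for a, b in permutations(vals, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and grand total n of pairable values.
    n_c: Counter = Counter()
    for (a, _b), weight in coincidence.items():
        n_c[a] += weight
    n = sum(n_c.values())

    # Nominal metric: any two different categories disagree equally.
    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Hypothetical rule: age must be present and over 17.
    ages = [18, 25, 17, None]
    print(percent_adherent(ages, lambda v: v is not None and v > 17))  # 50.0

    # Hypothetical pair of fields coding the same concept for four records.
    pairs = [("a", "a"), ("a", "a"), ("b", "b"), ("a", "b")]
    print(round(krippendorff_alpha_nominal(pairs), 4))  # 0.5333
```

Computing α from the coincidence matrix handles missing values naturally, because a record with only one recorded value simply contributes no pairs. An interval or ordinal metric would need a different distance function in place of the `a != b` test.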

Results DQ adherence across the fields in our dataset ranged from 0% to 100%. Understanding the data provenance of fields with DQ failures enabled us to determine whether each failure was fatal, recoverable, or not relevant to the field's inclusion in our study. We also identified systems, processes, and actors as the themes of data provenance from a DQ perspective.

Conclusion A systematic approach to understanding data provenance through the context of data generation helps in the reconciliation or repair of DQ failures and is a necessary step in the preparation of data for secondary use.

Protection of Human and Animal Subjects

The study was performed in compliance with the requirements of the Austin Health Human Research Ethics Committee (HREC). Ethics approval was obtained from Austin Health on August 18, 2017 (reference no. LNR/17/Austin/408).

Authors' Contributions

N.A.: Study design, data collection and formatting, analyses and evaluation, interviews, manuscript preparation, tables, and figures. W.B.: Study design, intellectual input on statistical analyses and modeling, manuscript preparation, and review. A.N.: Study design, interviews, clinical interpretation and validation of results, manuscript preparation, and review.


Publication History

Received: 28 January 2021

Accepted: 13 August 2021

Publication Date: 18 October 2021 (online)

© 2021. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany