Abstract
Background With the increasing personalization of clinical therapies, translational research
is ever more dependent on multisite research collaborations to obtain sufficient data
and biomaterial. Distributed research networks rely on the availability of high-quality
data stored in local databases operated by their member institutions. However, when data
documented by independent health care providers for the purpose of care rather than
research are reused (“secondary use”), they reveal a high variability in data formats,
as well as poor data quality, across network sites.
Objectives The aim of this work is to provide a process for assessing data quality
with regard to completeness and syntactic accuracy across independently operated data
warehouses, using common definitions stored in a central (network-wide) metadata repository
(MDR).
Methods To assess data quality across multiple sites, we employ a framework of so-called
bridgeheads. These are federated data warehouses that allow the sites to participate
in a research network. A central MDR stores the definitions of the commonly
agreed data elements and their permissible values.
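Checking a locally stored value against the permissible values defined in the MDR (syntactic accuracy) can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a data element definition carries either an enumerated list of permissible values, an inclusive integer range, or a regular expression; the class `DataElement`, the function `is_syntactically_valid`, and the example elements are illustrative names and not the API of the MDR or bridgehead software.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical representation of a data element definition held in a central MDR.
@dataclass(frozen=True)
class DataElement:
    name: str
    permitted: Optional[frozenset] = None  # enumerated permissible values
    min_value: Optional[int] = None        # inclusive lower bound for integers
    max_value: Optional[int] = None        # inclusive upper bound for integers
    pattern: Optional[str] = None          # regular expression for free strings


def is_syntactically_valid(element: DataElement, value: str) -> bool:
    """Check one local value against its MDR definition (assumed semantics)."""
    if element.permitted is not None:
        return value in element.permitted
    if element.min_value is not None or element.max_value is not None:
        try:
            number = int(value)
        except ValueError:
            return False  # not an integer at all
        if element.min_value is not None and number < element.min_value:
            return False
        if element.max_value is not None and number > element.max_value:
            return False
        return True
    if element.pattern is not None:
        return re.fullmatch(element.pattern, value) is not None
    return True  # no constraint defined: any value is accepted


# Hypothetical definitions; the ICD-10 pattern is deliberately simplified.
sex = DataElement("sex", permitted=frozenset({"M", "F", "O", "U"}))
age = DataElement("age_at_diagnosis", min_value=0, max_value=120)
icd10 = DataElement("diagnosis_icd10", pattern=r"[A-Z][0-9]{2}(\.[0-9]{1,2})?")

print(is_syntactically_valid(sex, "F"))        # True
print(is_syntactically_valid(age, "130"))      # False
print(is_syntactically_valid(icd10, "C50.9"))  # True
```

Keeping the definitions in one central place means every site applies exactly the same rules, which is what makes the resulting checks comparable between sites.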
Results We present the design of a generator of quality reports within a bridgehead, which
validates the data in the local data warehouse against a research network's central
MDR. A standardized quality report can be produced at each network site, providing
a means to compare data quality across sites and to channel feedback to the
local data source systems and local documentation personnel. A reference implementation
of this concept has been used successfully at 10 sites of the German Cancer
Consortium.
Conclusions We have shown that comparable data quality assessment across different partners of
a distributed research network is feasible when a central metadata repository is combined
with locally installed assessment processes. To achieve this, we designed a quality
report and the process for generating such a report, and finally implemented both
in a German research network.
Keywords
medical informatics - metadata - data accuracy - translational medical research -
health information interoperability