CC BY-NC-ND 4.0 · Methods Inf Med 2019; 58(02/03): 086-093
DOI: 10.1055/s-0039-1693685
Original Article
Georg Thieme Verlag KG Stuttgart · New York

A Generic Method and Implementation to Evaluate and Improve Data Quality in Distributed Research Networks

D. Juárez (1, 2), E.E. Schmidt (1, 2), S. Stahl-Toyota (3), F. Ückert (3), M. Lablans (1, 2)

1   Federated Information Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
2   German Cancer Consortium (DKTK), Heidelberg, Germany
3   Medical Informatics in Translational Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany

Publication History

15 February 2019

07 June 2019

Publication Date:
12 September 2019 (online)

Abstract

Background With the increasing personalization of clinical therapies, translational research is ever more dependent on multisite research cooperations to obtain sufficient data and biomaterial. Distributed research networks rely on the availability of high-quality data stored in local databases operated by their member institutions. However, reusing data documented by independent health providers for the purpose of care, rather than research (“secondary use”), reveals high variability in data formats, as well as poor data quality, across network sites.

Objectives This work aims to provide a process for assessing data quality, with regard to completeness and syntactic accuracy, across independently operated data warehouses using common definitions stored in a central (network-wide) metadata repository (MDR).

Methods For assessment of data quality across multiple sites, we employ a framework of so-called bridgeheads. These are federated data warehouses, which allow the sites to participate in a research network. A central MDR is used to store the definitions of the commonly agreed data elements and their permissible values.

Results We present the design for a generator of quality reports within a bridgehead, allowing the validation of data in the local data warehouse against a research network's central MDR. A standardized quality report can be produced at each network site, providing a means to compare data quality across sites, as well as to channel feedback to the local data source systems and local documentation personnel. A reference implementation of this concept has been used successfully at 10 sites across the German Cancer Consortium.
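To make the idea of validating local data against centrally defined data elements more concrete, the following is a minimal sketch in Python. It is not the reference implementation described above: the `DataElement` class, the `quality_report` function, the sample data element names, and the two metrics as computed here are assumptions chosen for illustration. Completeness is taken as the share of records with a non-empty value, and syntactic accuracy as the share of non-empty values that match the permissible values or pattern of the data element.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """Hypothetical stand-in for a data element definition held in a central MDR."""
    name: str
    permitted_values: Optional[frozenset] = None  # enumerated value domain, if any
    pattern: Optional[str] = None                 # regular expression, if any

    def is_valid(self, value: str) -> bool:
        # A value is syntactically valid if it is in the enumerated domain
        # or, failing that, matches the pattern in full.
        if self.permitted_values is not None:
            return value in self.permitted_values
        if self.pattern is not None:
            return re.fullmatch(self.pattern, value) is not None
        return True


def quality_report(records, elements):
    """Compute completeness and syntactic accuracy per data element (assumed metrics)."""
    report = {}
    total = len(records)
    for element in elements:
        values = [record.get(element.name) for record in records]
        present = [v for v in values if v not in (None, "")]
        valid = [v for v in present if element.is_valid(v)]
        report[element.name] = {
            "completeness": len(present) / total if total else 0.0,
            "syntactic_accuracy": len(valid) / len(present) if present else 0.0,
            # Distinct offending values, useful as feedback to documentation staff.
            "mismatches": sorted(set(present) - set(valid)),
        }
    return report


# Illustrative definitions and local records (all values are made up).
elements = [
    DataElement("sex", permitted_values=frozenset({"M", "F"})),
    DataElement("diagnosis_date", pattern=r"\d{4}-\d{2}-\d{2}"),
]
records = [
    {"sex": "M", "diagnosis_date": "2018-03-01"},
    {"sex": "x", "diagnosis_date": ""},
    {"sex": "F"},
    {"sex": None, "diagnosis_date": "01.03.2018"},
]
report = quality_report(records, elements)
for name, metrics in report.items():
    print(name, metrics)
```

Because every site would evaluate its records against the same centrally stored definitions, the resulting figures are comparable across sites, while the list of mismatching values stays local and can be passed back to the source systems.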

Conclusions We have shown that comparable data quality assessment across different partners of a distributed research network is feasible when a central metadata repository is combined with locally installed assessment processes. To achieve this, we designed a quality report and the process for generating such a report. The final step was the implementation in a German research network.

 