DOI: 10.3414/ME16-01-0123
mosaicQA - A General Approach to Facilitate Basic Data Quality Assurance for Epidemiological Research
Funding: This research is funded by the German Research Foundation (DFG) as part of the research grant programme “Information infrastructure for research data” (grant number HO 1937/2-1).
Publication History
17 October 2016
06 April 2017
Publication Date: 31 January 2018 (online)

Background: Epidemiological studies are based on a considerable amount of personal, medical and socio-economic data. To answer research questions with reliable results, epidemiological research projects face the challenge of providing high quality data. Consequently, gathered data has to be reviewed continuously during the data collection period.
Objectives: This article describes the development of the mosaicQA-library, a set of reusable R functions that supports non-statistical experts in basic data quality assurance across a wide range of application scenarios in epidemiological research.
Methods: To generate valid quality reports for various scenarios and data sets, a general and flexible development approach was needed. As a first step, a set of quality-related questions targeting quality aspects on a more general level was identified. The next step was the design of specific R-scripts that produce suitable reports for metric and categorical data. For more flexibility, the third development step focussed on generalizing the developed R-scripts, e.g. by extracting characteristics and parameters. As a last step, the generic characteristics of the developed R functionalities and the generated reports were evaluated using different metric and categorical datasets.
Results: The developed mosaicQA-library generates basic data quality reports for multivariate input data. If needed, more detailed results for single-variable data, including definition of units, variables, descriptions, code lists and categories of qualified missings, can easily be produced.
Conclusions: The mosaicQA-library enables researchers to generate reports for various kinds of metric and categorical data without the need for computational or scripting knowledge. Currently, the library focusses on data structure quality and supports the assessment of several quality indicators, including the frequency, distribution and plausibility of research variables as well as the occurrence of missing and extreme values. To simplify the installation process, mosaicQA has been released as an official R-package.
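The quality indicators named above can be illustrated with a minimal base-R sketch. It does not use or reproduce the mosaicQA API: the function names `summarise_metric` and `summarise_categorical`, the missing codes 998 and 999, the plausibility limits and the sample values are all hypothetical and chosen only for illustration. Extreme values are flagged here with the common 1.5 × IQR rule, which is an assumption and not necessarily the rule mosaicQA applies.

```r
# Hypothetical helper (not part of mosaicQA): basic indicators for a metric variable.
summarise_metric <- function(x, missing_codes = numeric(0), limits = c(-Inf, Inf)) {
  is_na <- is.na(x)
  is_qualified_missing <- x %in% missing_codes   # coded reasons for missingness
  valid <- x[!is_na & !is_qualified_missing]

  # Assumed rule for extreme values: outside 1.5 * IQR from the quartiles.
  q <- quantile(valid, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])

  list(
    n                   = length(x),
    n_missing           = sum(is_na),
    n_qualified_missing = sum(is_qualified_missing),
    n_implausible       = sum(valid < limits[1] | valid > limits[2]),
    n_extreme           = sum(valid < q[1] - fence | valid > q[2] + fence),
    mean                = mean(valid),
    median              = median(valid)
  )
}

# Hypothetical helper (not part of mosaicQA): frequency table for a categorical
# variable, labelled through a code list given as a named character vector.
summarise_categorical <- function(x, code_list) {
  labelled <- factor(as.character(x),
                     levels = names(code_list),
                     labels = unname(code_list))
  table(labelled, useNA = "ifany")
}

# Invented sample data: systolic blood pressure with two assumed missing codes.
sbp <- c(118, 125, 132, NA, 998, 140, 121, 400, 129, 999)
report <- summarise_metric(sbp, missing_codes = c(998, 999), limits = c(50, 300))

# Invented sample data: smoking status with an assumed code list.
smoking <- c(1, 2, 2, 1, 3, NA, 2)
freq <- summarise_categorical(smoking,
                              c("1" = "never", "2" = "former", "3" = "current"))

print(report)
print(freq)
```

Separating qualified missings (coded reasons such as a refusal) from plain `NA` values before computing distribution statistics keeps the codes from distorting the mean and the extreme-value counts.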
* These authors contributed equally to this work