Methods Inf Med 2023; 62(01/02): 001-004
DOI: 10.1055/a-2045-8287
Editorial for Focus Theme

High-Quality Data for Health Care and Health Research

Jürgen Stausberg 1
Sonja Harkener 1
1   Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), Faculty of Medicine, University Duisburg-Essen, Duisburg, Germany
Funding: The authors were supported by the German Federal Ministry of Education and Research under contract 01GY1917B.


In the 19th century, Florence Nightingale pointed to the importance of nursing documentation for the care of patients[1] and the necessity of data-based statistics for quality improvement.[2] In the same century, John Snow projected his observations about patients with cholera onto a street map, laying the ground for modern epidemiological science.[3] These historical examples demonstrate that proper data are the foundation of relevant information about individuals and of new scientific evidence. In the ideal case of Ackoff's pyramid, information, knowledge, understanding, and wisdom arise from data.[4]

“Data quality,” the heading of this focus theme, first appeared in Medline in 1961 as part of a title or an abstract.[5] Naroll et al critically assessed how data quality issues could bias the conclusions of observational research. Their research confirmed that, outside Europe, women normally give birth in an upright position. It took 14 years until the next paper mentioned “data quality” in its title or abstract.[6] To rate the reliability of his data, Simonton compared his aggregates with those of other researchers, thus arriving at some kind of “representativeness.” It took another 16 years until a publication combined the term “data quality” in its title or abstract with the MeSH Major Topic “medical records systems, computerized,”[7] which is frequently used as a surrogate for electronic medical records. This combination might be regarded as a proxy for data quality studies in health care practice. In the validation study of Payne et al, the immunization tracking system missed 10% of the immunizations recorded in parallel on paper.[7] The type of immunization differed in an additional 4% of immunizations. Completeness and accuracy were later described as important dimensions of data quality from a conceptual point of view.[8] Moreover, Payne et al addressed the multiple use of data by offering their system for surveillance purposes, for research on adverse outcomes after immunization, and for reminders in daily health care.[7]
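
The two dimensions touched on in the validation study, completeness and accuracy, can be illustrated with a small sketch. It is not the method of Payne et al: the record layout, the matching on patient and date, the choice of denominators, and the toy data are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Immunization:
    """Hypothetical record layout, assumed for this illustration."""
    patient_id: str
    date: str     # ISO date of administration
    vaccine: str  # type of immunization


def completeness_and_accuracy(reference, system):
    """Compare system records against a reference (e.g., paper) record set.

    Completeness: share of reference events found in the system,
    matched on (patient_id, date).
    Accuracy: share of the matched events whose vaccine type agrees.
    Both denominators are assumptions of this sketch.
    """
    system_by_key = {(r.patient_id, r.date): r for r in system}
    matched = [
        (ref, system_by_key[(ref.patient_id, ref.date)])
        for ref in reference
        if (ref.patient_id, ref.date) in system_by_key
    ]
    completeness = len(matched) / len(reference) if reference else float("nan")
    agreeing = sum(1 for ref, rec in matched if ref.vaccine == rec.vaccine)
    accuracy = agreeing / len(matched) if matched else float("nan")
    return completeness, accuracy


# Invented toy data, not taken from any cited study.
paper = [
    Immunization("p1", "1990-01-10", "DTP"),
    Immunization("p1", "1990-03-12", "OPV"),
    Immunization("p2", "1990-02-01", "MMR"),
    Immunization("p3", "1990-02-15", "DTP"),
    Immunization("p4", "1990-04-20", "Hib"),
]
system = [
    Immunization("p1", "1990-01-10", "DTP"),
    Immunization("p1", "1990-03-12", "OPV"),
    Immunization("p2", "1990-02-01", "DTP"),  # type differs from paper
    Immunization("p3", "1990-02-15", "DTP"),
    # the p4 event is missing from the system
]

completeness, accuracy = completeness_and_accuracy(paper, system)
print(f"completeness={completeness:.2f} accuracy={accuracy:.2f}")
# prints: completeness=0.80 accuracy=0.75
```

Taking the paper record as the reference is itself a design choice; a study could equally treat neither source as a gold standard and report agreement only.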

Since those early years, data quality has increasingly become a focus of research (cf. [Fig. 1]). Medline indexed 385 papers with the term “data quality” in the title or abstract in 2012; this number increased to 903 papers in 2021. The fraction of scientific papers dealing with this topic rose from 1 out of 100,000 papers (1972–1981) to 1 out of 1,000 papers (2012–2021). In addition, the share of these papers also tagged with “medical records systems, computerized” rose from 1% (1982–1991) to 4% (1992–2001 and 2002–2011) and 7% (2012–2021). This might indicate a growing dependence of data-driven health care on its data sources.

Fig. 1 Rate of Medline citations with the term “data quality” (left axis) and the linear trend in the share of citations related to electronic medical records (EMRs; right axis).

Publication History

Received: 02 February 2023

Accepted: 23 February 2023

Accepted Manuscript online: 02 March 2023

Article published online: 27 March 2023

© 2023. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

References

  • 1 Chelagat D, Sum T, Obel M, Chebor A, Kiptoo R, Bundotich-Mosol P. Documentation: Historical perspectives, purposes, benefits and challenges as faced by nurses. Int J Humanit Soc Sci 2013; 3: 236-240
  • 2 Nightingale F. Notes on Hospitals, 3rd ed. London: Longman, Green, Longman, Roberts, and Green; 1863: 176
  • 3 Buechner JS, Constantine H, Gjelsvik A. John Snow and the Broad Street pump: 150 years of epidemiology. Med Health R I 2004; 87 (10) 314-315
  • 4 Ackoff RL. From data to wisdom. Presidential address to ISGSR, June 1988. J Appl Syst Anal 1989; 16: 3-9
  • 5 Naroll F, Naroll R, Howard FH. Position of women in childbirth. A study in data quality control. Am J Obstet Gynecol 1961; 82: 943-954
  • 6 Simonton DK. Sociocultural context of individual creativity: a transhistorical time-series analysis. J Pers Soc Psychol 1975; 32 (06) 1119-1133
  • 7 Payne T, Kanvik S, Seward R. et al. Development of an immunization tracking system in a large health maintenance organization. Group Health Cooperative of Puget Sound. Proc Annu Symp Comput Appl Med Care 1991; 131-135
  • 8 Wang RY, Strong DM. Beyond accuracy: what data quality means to data consumers. J Manage Inf Syst 1996; 12: 5-33
  • 9 Mashoufi M, Ayatollahi H, Khorasani-Zavareh D, Talebi Azad Boni T. Data quality in health care: main concepts and assessment methodologies. Methods Inf Med 2023; 62 (1-2): 5-18
  • 10 Quindroit P, Fruchart M, Degoul S. et al. Definition of a practical taxonomy for referencing data quality problems in health care databases. Methods Inf Med 2023; 62 (1-2): 19-30
  • 11 Yusuf KO, Miljukov O, Schoneberg A. et al. Consistency as a data quality measure for German Corona Consensus items mapped from National Pandemic Cohort Network data collections. Methods Inf Med 2023; (e-pub ahead of print) DOI: 10.1055/a-2006-1086.
  • 12 Rau H, Stahl D, Reichel AJ, Bialke M, Bahls T, Hoffmann W. We know what you agreed to, don't we?—Evaluating the quality of paper-based consents forms and their digitalized equivalent using the example of the Baltic Fracture Competence Centre Project. Methods Inf Med 2023; (e-pub ahead of print) DOI: 10.1055/s-0042-1760249.
  • 13 Tute E, Mast M, Wulff A. Targeted data quality analysis for a clinical decision support system for SIRS detection in critically ill pediatric patients. Methods Inf Med 2023; (e-pub ahead of print) DOI: 10.1055/s-0042-1760238.
  • 14 Tahar K, Martin T, Mou Y, Verbuecheln R, Graessner H, Krefting D. Rare diseases in hospital information systems—an interoperable methodology for distributed data quality assessments. Methods Inf Med 2023; (e-pub ahead of print) DOI: 10.1055/s-0042-1760238.
  • 15 Hernadez M, Epelde G, Alberdi A, Cilla R, Rankin D. Synthetic tabular data evaluation in the health domain covering resemblance, utility, and privacy dimensions. Methods Inf Med 2023; (e-pub ahead of print) DOI: 10.1055/s-0042-1760247.
  • 16 Smith B, Van Steelandt S, Khojandi A. Evaluating the impact of healthcare data completeness for deep generative models. Methods Inf Med 2023; 62 (1-2): 31-39