Editorial for Focus Theme

High-Quality Data for Health Care and Health Research

Jürgen Stausberg
1   Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), Faculty of Medicine, University Duisburg-Essen, Duisburg, Germany
Sonja Harkener
1   Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), Faculty of Medicine, University Duisburg-Essen, Duisburg, Germany
Funding: The authors were supported by the German Federal Ministry of Education and Research under contract 01GY1917B.


In the 19th century, Florence Nightingale pointed to the importance of nursing documentation for the care of patients[1] and the necessity of data-based statistics for quality improvement.[2] In the same century, John Snow plotted his observations of patients with cholera on a street map, laying the groundwork for modern epidemiological science.[3] These historical examples demonstrate that proper data are the foundation of relevant information about individuals and of new scientific evidence. In the ideal case of Ackoff's pyramid, information, knowledge, understanding, and wisdom arise from data.[4]

“Data quality,” the heading of this focus theme, first appeared in Medline in 1961 as part of a title or an abstract.[5] Naroll et al critically assessed how data quality issues could bias the conclusions of observational research. Their research confirmed that, outside Europe, women normally give birth in an upright position. It took 14 years until the next paper mentioned “data quality” in its title or abstract.[6] To rate the reliability of his data, Simonton compared his aggregates with those of other researchers, thus arriving at a kind of “representativeness.” It took another 16 years until a publication combined the term “data quality” in its title or abstract with the MeSH Major Topic “medical records systems, computerized,”[7] which is frequently used as a surrogate for electronic medical records. This combination might be regarded as a proxy for data quality studies in health care practice. In the validation study of Payne et al, the immunization tracking system missed 10% of the immunizations recorded in parallel on paper.[7] The type of immunization differed in an additional 4% of immunizations. Completeness and accuracy were later named as important dimensions of data quality from a conceptual point of view.[8] Moreover, Payne et al addressed the multiple use of data by offering their system for surveillance purposes, for research on adverse outcomes after immunization, and for reminders in daily health care.[7]

Since those early years, data quality has increasingly become a focus of research (cf. [Fig. 1]). Medline indexed 385 papers with the term “data quality” in the abstract or title in 2012; this number increased to 903 papers in 2021. The fraction of scientific papers dealing with this topic rose from 1 in 100,000 papers (1972–1981) to 1 in 1,000 papers (2012–2021). Also, the share of these papers additionally tagged with “medical records systems, computerized” changed from 1% (1982–1991) to 4% (1992–2001 and 2002–2011) and 7% (2012–2021). This might indicate an increasing dependency of data-driven health care on its sources.

Fig. 1 Rate of Medline citations with the term data quality (left axis) and the linear trend in the share of citations related to EMRs (right axis).

