CC BY 4.0 · Methods Inf Med 2023; 62(03/04): 071-089
DOI: 10.1055/a-2006-1018
Original Article for a Focus Theme

Rare Diseases in Hospital Information Systems—An Interoperable Methodology for Distributed Data Quality Assessments

Kais Tahar (1), Tamara Martin (2), Yongli Mou (3), Raphael Verbuecheln (4), Holm Graessner (2), Dagmar Krefting (1)

1 Department of Medical Informatics, University Medical Center Göttingen, Georg-August-University, Göttingen, Germany
2 Centre for Rare Diseases, University Hospital Tübingen, Tübingen, Germany
3 Chair of Computer Science 5, RWTH Aachen University, Aachen, Germany
4 Medical Data Integration Center, University Hospital Tübingen, Tübingen, Germany
Funding This study was conducted within the “Collaboration on Rare Diseases” project of the Medical Informatics Initiative (CORD-MI), funded by the German Federal Ministry of Education and Research (BMBF) under grant numbers 01ZZ1911R and 01ZZ1911O.

Abstract

Background Multisite research networks such as the project “Collaboration on Rare Diseases” connect various hospitals to obtain sufficient data for clinical research. However, data quality (DQ) remains a challenge for the secondary use of data recorded in different health information systems. High levels of DQ as well as appropriate quality assessment methods are needed to support the reuse of such distributed data.

Objectives This work aims to develop an interoperable methodology for assessing the quality of data recorded in heterogeneous sources, in order to improve the quality of rare disease (RD) documentation and support clinical research.

Methods We first developed a conceptual framework for DQ assessment. Guided by this framework, we implemented a software framework that provides tools for calculating DQ metrics and for generating local as well as cross-institutional reports. We then applied our methodology to synthetic data distributed across multiple hospitals using the Personal Health Train. Finally, we used precision and recall as metrics to validate our implementation.

Results Four DQ dimensions were defined and represented as disjoint ontological categories. Based on these top-level dimensions, 9 DQ concepts, 10 DQ indicators, and 25 DQ parameters were developed and applied to different data sets. All randomly introduced DQ issues were identified and reported automatically. The generated reports show the resulting DQ indicators and the detected DQ issues.

Conclusion We have shown that our approach yields promising results that can be used for local and cross-institutional DQ assessments. The developed frameworks provide useful methods for interoperable and privacy-preserving DQ assessments that meet the specified requirements. This study has demonstrated that our methodology can detect DQ issues such as ambiguity or implausibility of coded diagnoses. It can be used for DQ benchmarking to improve the quality of RD documentation and to support clinical research on distributed data.
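The Methods paragraph names precision and recall as the validation metrics. The sketch below shows how these two metrics can be computed when DQ issues are deliberately introduced into a data set and then compared with the issues a tool reports. It is an illustration under stated assumptions, not the authors' implementation: the function name `precision_recall`, the representation of an issue as a (record ID, issue type) tuple, the example identifiers, and the convention of returning 1.0 when a denominator is zero are all hypothetical.

```python
from typing import Hashable, Set, Tuple


def precision_recall(detected: Set[Hashable], introduced: Set[Hashable]) -> Tuple[float, float]:
    """Compare reported DQ issues with the issues that were deliberately introduced.

    true positives  (tp): introduced and detected
    false positives (fp): detected but never introduced
    false negatives (fn): introduced but missed
    """
    tp = len(detected & introduced)
    fp = len(detected - introduced)
    fn = len(introduced - detected)
    # Assumed convention: a ratio with a zero denominator is reported as 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical issues, each identified by (record ID, issue type).
introduced = {
    ("case-01", "missing_value"),
    ("case-07", "implausible_code"),
    ("case-12", "ambiguous_code"),
    ("case-15", "missing_value"),
}
detected = {
    ("case-01", "missing_value"),
    ("case-07", "implausible_code"),
    ("case-12", "ambiguous_code"),
    ("case-20", "implausible_code"),  # reported, but never introduced
}

print(precision_recall(detected, introduced))  # (0.75, 0.75)
```

In this made-up example, one introduced issue is missed and one spurious issue is reported, so both metrics equal 3/4. A result in which every introduced issue is found, as the Results paragraph reports, corresponds to a recall of 1.0.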

Author Contributions

K.T. and D.K. were responsible for conceptualization, methodology, and writing (original draft). K.T., D.K., T.M., and H.G. were responsible for requirement analysis and definition of indicators. K.T. was responsible for visualization and the data quality software. K.T., Y.M., and R.V. were responsible for use case execution and tools for data curation. K.T., D.K., Y.M., T.M., H.G., and R.V. were responsible for writing (review and editing). All authors have read and agreed to the published version of the manuscript.


on behalf of the Collaboration on Rare Diseases of the Medical Informatics Initiative (CORD-MI)




Publication History

Received: 15 July 2022

Accepted: 10 November 2022

Accepted Manuscript online: 03 January 2023

Article published online: 16 May 2023

© 2023. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany