DOI: 10.1055/a-2006-1018
Rare Diseases in Hospital Information Systems—An Interoperable Methodology for Distributed Data Quality Assessments
Funding This study was done within the “Collaboration on Rare Diseases” of the Medical Informatics Initiative (CORD-MI) funded by the German Federal Ministry of Education and Research (BMBF), funding numbers 01ZZ1911R and 01ZZ1911O.
Abstract
Background Multisite research networks such as the project “Collaboration on Rare Diseases” connect various hospitals to obtain sufficient data for clinical research. However, data quality (DQ) remains a challenge for the secondary use of data recorded in different health information systems. High levels of DQ as well as appropriate quality assessment methods are needed to support the reuse of such distributed data.
Objectives The aim of this work is the development of an interoperable methodology for assessing the quality of data recorded in heterogeneous sources to improve the quality of rare disease (RD) documentation and support clinical research.
Methods We first developed a conceptual framework for DQ assessment. Using this theoretical guidance, we implemented a software framework that provides appropriate tools for calculating DQ metrics and for generating local as well as cross-institutional reports. We then applied our methodology to synthetic data distributed across multiple hospitals using the Personal Health Train. Finally, we used precision and recall as metrics to validate our implementation.
Results Four DQ dimensions were defined and represented as disjoint ontological categories. Based on these top dimensions, 9 DQ concepts, 10 DQ indicators, and 25 DQ parameters were developed and applied to different data sets. Randomly introduced DQ issues were all identified and reported automatically. The generated reports show the resulting DQ indicators and detected DQ issues.
Conclusion We have shown that our approach yields promising results, which can be used for local and cross-institutional DQ assessments. The developed frameworks provide useful methods for interoperable and privacy-preserving assessments of DQ that meet the specified requirements. This study has demonstrated that our methodology is capable of detecting DQ issues such as ambiguity or implausibility of coded diagnoses. It can be used for DQ benchmarking to improve the quality of RD documentation and to support clinical research on distributed data.
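The validation step named in the Methods (precision and recall over deliberately introduced DQ issues) can be illustrated with a minimal sketch. This is not the authors' implementation; the function name and the record identifiers below are hypothetical, and issues are assumed to be comparable by a simple identifier.

```python
def precision_recall(injected, detected):
    """Compare injected DQ issues with the issues a tool reported.

    Both arguments are iterables of issue identifiers (hypothetical format).
    Returns a (precision, recall) tuple.
    """
    injected, detected = set(injected), set(detected)
    true_positives = len(injected & detected)
    # Convention assumed here: an empty denominator yields 1.0.
    precision = true_positives / len(detected) if detected else 1.0
    recall = true_positives / len(injected) if injected else 1.0
    return precision, recall


# Hypothetical example: three issues were introduced and all three were reported.
injected_issues = {"rec-03", "rec-07", "rec-11"}
detected_issues = {"rec-03", "rec-07", "rec-11"}
precision, recall = precision_recall(injected_issues, detected_issues)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A recall of 1.0 corresponds to the case described in the Results, in which every randomly introduced issue is identified; precision additionally penalizes reported issues that were never introduced.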
Keywords
Data quality - rare disease - distributed analysis - ontology - semantic interoperability - healthcare standards
Author Contributions
K.T. and D.K. were responsible for conceptualization, methodology, and writing the original draft. K.T., D.K., T.M., and H.G. were responsible for requirement analysis and definition of indicators. K.T. was responsible for visualization and software for data quality. K.T., Y.M., and R.V. were responsible for use case execution and tools for data curation. K.T., D.K., Y.M., T.M., H.G., and R.V. were responsible for review and editing. All authors have read and agreed to the published version of the manuscript.
on behalf of the Collaboration on Rare Diseases of the Medical Informatics Initiative (CORD-MI)
Publication History
Received: 15 July 2022
Accepted: 10 November 2022
Accepted Manuscript online: 03 January 2023
Article published online: 16 May 2023
© 2023. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany