CC BY-NC-ND 4.0 · Methods Inf Med
DOI: 10.1055/s-0041-1740564
Original Article

A Privacy-Preserving Distributed Analytics Platform for Health Care Data

Sascha Welten (1), Yongli Mou (1), Laurenz Neumann (1), Mehrshad Jaberansary (1), Yeliz Yediel Ucer (2), Toralf Kirsten (3), Stefan Decker (1, 2), Oya Beyan (2, 4)

1   Chair of Computer Science 5, RWTH Aachen University, Aachen, Germany
2   Department of Data Science and Artificial Intelligence, Fraunhofer FIT, Sankt Augustin, Germany
3   Department of Medical Data Science, University Medical Center Leipzig, Leipzig, Germany
4   Institute for Medical Informatics, Faculty of Medicine, University Hospital Cologne, University of Cologne, Cologne, Germany
Funding This work was supported by the German Ministry for Research and Education (BMBF) as part of the SMITH consortium (SW, LN, MJ, YUY, TK, SD, and OB, grant no. 01ZZ1803K). This work was conducted jointly by RWTH Aachen University and Fraunhofer FIT as part of the PHT and Go FAIR implementation network, which aims to develop a proof-of-concept information system to address current data reusability challenges occurring in the context of so-called data integration centres that are being established as part of ongoing German Medical Informatics BMBF projects.

Abstract

Background In recent years, data-driven medicine has gained increasing importance in terms of diagnosis, treatment, and research due to the exponential growth of health care data. However, data protection regulations prohibit data centralisation for analysis purposes because of potential privacy risks like the accidental disclosure of data to third parties. Therefore, alternative data usage policies, which comply with present privacy guidelines, are of particular interest.

Objective We aim to enable analyses of sensitive patient data while complying with local data protection regulations. To this end, we use the Personal Health Train (PHT), a paradigm that utilises distributed analytics (DA) methods. The main principle of the PHT is that the analytical task is brought to the data provider, while the data instances remain in their original location.

Methods In this work, we present our implementation of the PHT paradigm, which preserves the sovereignty and autonomy of the data providers and operates with a limited number of communication channels. We further conduct a DA use case on data held by three distinct, distributed data providers.

Results We show that our infrastructure enables the training of data models based on distributed data sources.

Conclusion Our work presents the capabilities of DA infrastructures in the health care sector, which lower the regulatory obstacles of sharing patient data. We further demonstrate the ability of such infrastructures to fuel medical science by making distributed data sets available for scientists or health care practitioners.
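
The principle stated in the Objective, that the analytical task travels to each data provider while the records stay in place, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Station`, `TrainState`, and `run_train`, the sequential visiting order, and the toy numeric records are all assumptions made for illustration. The example computes a mean over three providers by passing only aggregate values between them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TrainState:
    """Aggregates carried from station to station (hypothetical structure)."""
    count: int = 0
    total: float = 0.0


class Station:
    """A data provider (hypothetical); its records are never returned."""

    def __init__(self, name: str, records: Sequence[float]) -> None:
        self.name = name
        self._records = list(records)  # stays local to the station

    def execute(self, state: TrainState) -> TrainState:
        # Run the analysis locally and hand back only updated aggregates.
        return TrainState(
            count=state.count + len(self._records),
            total=state.total + sum(self._records),
        )


def run_train(stations: Sequence[Station]) -> TrainState:
    """Send the analytical task to each station in turn (assumed order)."""
    state = TrainState()
    for station in stations:
        state = station.execute(state)
    return state


# Three providers with toy data, mirroring the three-provider use case.
stations = [
    Station("station_a", [1.0, 2.0, 3.0]),
    Station("station_b", [4.0, 5.0]),
    Station("station_c", [6.0]),
]
result = run_train(stations)
mean = result.total / result.count
```

Only `count` and `total` cross station boundaries here. Whether such aggregates are themselves safe to release depends on the data and on local policy, which this sketch does not address.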

Publication History

Received: 30 March 2021

Accepted: 22 September 2021

Publication Date: 17 January 2022 (online)

© 2022. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany