Methods Inf Med 2021; 60(01/02): 021-031
DOI: 10.1055/s-0041-1731387
Original Article

MAGICPL: A Generic Process Description Language for Distributed Pseudonymization Scenarios

Galina Tremper (1, 2), Torben Brenner (1, 2), Florian Stampe (1), Andreas Borg (3), Martin Bialke (4), David Croft (1, 2), Esther Schmidt (1, 2), Martin Lablans (1, 2)

1   Federated Information Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
2   Complex Data Processing in Medical Informatics, University Medical Center Mannheim, Mannheim, Germany
3   Institute of Medical Biostatistics, Epidemiology and Informatics, Johannes Gutenberg-Universität Mainz, Universitätsmedizin, Mainz, Germany
4   Department Epidemiology of Health Care and Community Health, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
Funding The MAGIC consortium was supported by the Deutsche Forschungsgemeinschaft (DFG) under grant number LA 3859/1-1.

Abstract

Objectives Pseudonymization is an important aspect of projects dealing with sensitive patient data. Most projects build their own specialized, hard-coded solutions, although these overlap in many aspects of their functionality. As every re-implementation ties up resources, we propose a solution that facilitates and encourages the reuse of existing components.

Methods We analyzed already-established data protection concepts to gain an insight into their common features and the ways in which their components were linked together. We found that we could represent these pseudonymization processes with a simple descriptive language, which we have called MAGICPL, plus a relatively small set of components. We designed MAGICPL as an XML-based language, to make it human-readable and accessible to nonprogrammers. Additionally, a prototype implementation of the components was written in Java. MAGICPL makes it possible to reference the components using their class names, making it easy to extend or exchange the component set. Furthermore, there is a simple HTTP application programming interface (API) that runs the tasks and allows other systems to communicate with the pseudonymization process.
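To illustrate the idea of referencing components by class name from an XML process description, the following is a minimal sketch in Java. It is not the MAGICPL syntax or its implementation: the element name `step`, the attribute `class`, the `Step` interface, and the two toy components are hypothetical and chosen only for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ProcessSketch {

    /** Hypothetical component contract: each step transforms a value. */
    public interface Step {
        String apply(String input);
    }

    /** Toy component standing in for an input normalization step. */
    public static class TrimStep implements Step {
        public String apply(String input) { return input.trim(); }
    }

    /** Toy component standing in for a pseudonym generator (not a real one). */
    public static class PrefixStep implements Step {
        public String apply(String input) { return "PSN-" + input; }
    }

    /** Reads the step elements in document order and instantiates each class by name. */
    public static List<Step> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("step");
        List<Step> steps = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            String className = ((Element) nodes.item(i)).getAttribute("class");
            // asSubclass rejects any class that does not implement Step.
            Class<? extends Step> type = Class.forName(className).asSubclass(Step.class);
            steps.add(type.getDeclaredConstructor().newInstance());
        }
        return steps;
    }

    /** Runs the steps in order, feeding each result into the next step. */
    public static String run(List<Step> steps, String input) {
        String value = input;
        for (Step step : steps) {
            value = step.apply(value);
        }
        return value;
    }

    /** Builds a hypothetical process description that names the two toy components. */
    public static String exampleXml() {
        return "<process>"
             + "<step class=\"" + TrimStep.class.getName() + "\"/>"
             + "<step class=\"" + PrefixStep.class.getName() + "\"/>"
             + "</process>";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(load(exampleXml()), "  12345 ")); // prints PSN-12345
    }
}
```

Because the description names classes rather than hard-coding them, exchanging a component in this sketch only requires changing the `class` attribute, which mirrors the extensibility argument made above.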

Results MAGICPL has been used in at least three projects: the re-implementation of the pseudonymization process of the German Cancer Consortium, clinical data flows in a large-scale translational research network (National Network Genomic Medicine), and our own institute's pseudonymization service.

Conclusions Putting our solution into productive use at both our own institute and our partner sites helped reduce the time and effort required to build pseudonymization pipelines in medical research.

Note

The research reported in this article is of a purely technical nature. Neither human nor animal subjects were involved.




Publication History

Received: 22 September 2020

Accepted: 04 May 2021

Article published online: 05 July 2021

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 