Appl Clin Inform 2019; 10(04): 679-692
DOI: 10.1055/s-0039-1695793
Research Article
Georg Thieme Verlag KG Stuttgart · New York

Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC

Sebastian Mate (1), Marvin Kampf (1), Wolfgang Rödle (2), Stefan Kraus (2), Rumyana Proynova (3), Kaisa Silander (4), Lars Ebert (5), Martin Lablans (5), Christina Schüttler (2), Christian Knell (1), Niina Eklund (4), Michael Hummel (6, 7), Petr Holub (7), Hans-Ulrich Prokosch (1, 2)

1  Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
2  Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
3  Medical Informatics in Translational Oncology, German Cancer Research Center, Heidelberg, Germany
4  Genomics and Biobank Unit, Finnish National Institute for Health and Welfare, Helsinki, Finland
5  Federated Information Systems, German Cancer Research Center, Heidelberg, Germany
6  Institute of Pathology, Charité-Universitätsmedizin Berlin, Berlin, Germany
7  Biobanking and BioMolecular Resources Research Infrastructure (BBMRI-ERIC), Graz, Austria
Funding The present work has been co-funded by ADOPT BBMRI-ERIC supported by EU Horizon 2020, grant agreement no. 676550. It was performed in (partial) fulfillment of the requirements for obtaining the degree “Dr. rer. biol. hum.” from the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) (SM).

Publication History

Received: 13 February 2019
Accepted: 12 July 2019
Publication Date: 11 September 2019 (online)

Abstract

Background High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks.

Objectives To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task.

Methods Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application.

Results The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48% of the 1,492 local biobank terms, and human experts were able to complete the remaining mappings. We used the expert-curated mappings to successfully process 147,608 data records from 3,415 patients.

Conclusion A generic harmonization approach was created and successfully used for cross-institutional data harmonization across 10 European biobanks. The software tools were made available as open source.
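
To make the Methods summary more concrete, the following is a minimal sketch of a lexical bag-of-words matcher with fuzzy word comparison and synonym normalization. It is not the ADOPT BBMRI-ERIC implementation: the synonym table, the function names, the scoring formula, and both thresholds are assumptions chosen for illustration, and sentiment tagging is left out because the abstract does not describe how it works.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table (assumption). The abstract only states that
# synonyms are used; it does not list them.
SYNONYMS = {"sex": "gender", "tumour": "tumor", "operation": "surgery"}


def tokenize(term):
    """Lowercase a term, split it on non-alphanumerics, and map synonyms."""
    cleaned = "".join(c if c.isalnum() else " " for c in term.lower())
    return [SYNONYMS.get(word, word) for word in cleaned.split()]


def word_similarity(a, b):
    """Fuzzy similarity of two words in [0, 1] (tolerates small typos)."""
    return SequenceMatcher(None, a, b).ratio()


def bag_similarity(local_term, central_term, word_threshold=0.8):
    """Score two terms by comparing their bags of words, ignoring word order.

    Each local word contributes its best fuzzy match among the central
    words, but only if that match reaches word_threshold (an assumed value).
    """
    local_bag = tokenize(local_term)
    central_bag = tokenize(central_term)
    if not local_bag or not central_bag:
        return 0.0
    hits = 0.0
    for word in local_bag:
        best = max(word_similarity(word, other) for other in central_bag)
        if best >= word_threshold:
            hits += best
    # Dice-style normalization: identical bags score 1.0.
    return 2 * hits / (len(local_bag) + len(central_bag))


def suggest_mapping(local_term, central_terms, min_score=0.5):
    """Return the best-scoring central term, or None to defer to an expert."""
    best_term = max(central_terms, key=lambda c: bag_similarity(local_term, c))
    if bag_similarity(local_term, best_term) >= min_score:
        return best_term
    return None


if __name__ == "__main__":
    # Made-up central terminology, for demonstration only.
    central = ["Gender", "Tumor location", "Date of surgery"]
    for local in ["Tumour locaton", "Surgery date", "Blood pressure"]:
        print(local, "->", suggest_mapping(local, central))
```

Returning `None` for low-scoring terms mirrors the semiautomatic workflow reported in the Results, in which human experts completed the mappings that the matcher could not resolve.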

Protection of Human and Animal Subjects

The experiments were performed using anonymized patient data. The authors therefore declare that this study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects.