Summary
Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”.
Background: The need for complementary access to multiple RDF databases has fostered new lines
of research, but also entailed new challenges due to data representation disparities.
While several approaches for RDF-based database integration have been proposed, those
focused on schema alignment have become the most widely adopted. All state-of-the-art
solutions for aligning RDF-based sources resort to a simple technique inherited from
legacy relational database integration methods. This technique – known as element-to-element
(e2e) mappings – is based on establishing 1:1 mappings between single primitive elements
– e.g. concepts, attributes, relationships – belonging to the source and target
schemas. However, RDF is a representation language based on <subject, predicate, object>
triples, and the semantics of an RDF element can vary dramatically when it is combined
with other RDF elements into a view – i.e. its meaning depends on its context. Such
context-dependent elements cannot be adequately represented in the target schema with
the traditional e2e approach: without explicitly modifying the target ontology, e2e
mappings lack the expressiveness required to reflect the intended semantics in the
alignment information.
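The limitation of e2e mappings described above can be illustrated with a minimal Python sketch. All identifiers here (`src:value`, `tgt:hasMeasurement`, the observation types) are hypothetical and are not taken from the databases or ontology used in this work; triples are modelled as plain tuples rather than with an RDF library.

```python
# Hypothetical source triples: the predicate "src:value" means something
# different depending on the type of the observation it is attached to.
SOURCE = [
    ("obs1", "rdf:type", "src:TumorSize"),
    ("obs1", "src:value", "23"),
    ("obs2", "rdf:type", "src:BodyWeight"),
    ("obs2", "src:value", "71"),
]

# An e2e alignment: every source element maps to exactly one target element.
E2E = {
    "src:TumorSize": "tgt:Tumor",
    "src:BodyWeight": "tgt:Patient",
    "src:value": "tgt:hasMeasurement",  # a single choice is forced here
}


def translate_e2e(triples, mapping):
    """Rewrite each term independently, ignoring the triples around it."""
    return [tuple(mapping.get(term, term) for term in triple) for triple in triples]


translated = translate_e2e(SOURCE, E2E)
```

Because each term is translated in isolation, both occurrences of `src:value` end up as the same target property, and the distinction between a tumor size and a body weight can only be preserved if the target schema happens to offer an equally generic element.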
Objectives: To enhance existing RDF schema alignment techniques by providing a mechanism to properly
represent elements with context-dependent semantics, thus enabling users to perform
more expressive alignments, including scenarios that cannot be adequately addressed
by the existing approaches.
Methods: Instead of establishing 1:1 correspondences between single primitive elements of
the schemas, we propose adopting a view-based approach that establishes mapping
relationships between RDF subgraphs – which can be regarded as the equivalent of views
in traditional databases – rather than between single schema elements. This enables
users to represent scenarios involving context-dependent RDF elements that cannot be
properly represented with existing approaches.
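As a rough illustration of mapping between subgraphs rather than single elements, the following Python sketch matches a source view (a list of triple patterns with `?variables`) against a set of triples and instantiates a target view for every match. It is a simplified sketch under stated assumptions, not the tool developed in this work: the function names, the `src:`/`tgt:` identifiers, and the tuple-based triple representation are all hypothetical.

```python
def is_var(term):
    """Terms starting with '?' are treated as variables."""
    return term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding under which all patterns occur in graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        compatible = True
        for pattern_term, value in zip(first, triple):
            if is_var(pattern_term):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(pattern_term, value) != value:
                    compatible = False
                    break
            elif pattern_term != value:
                compatible = False
                break
        if compatible:
            yield from match(rest, graph, candidate)


def apply_view_mapping(source_view, target_view, graph):
    """Instantiate the target view once per match of the source view."""
    result = []
    for binding in match(source_view, graph):
        for pattern in target_view:
            result.append(tuple(binding.get(term, term) for term in pattern))
    return result


# Hypothetical data: "src:value" is context-dependent.
GRAPH = [
    ("obs1", "rdf:type", "src:TumorSize"),
    ("obs1", "src:value", "23"),
    ("obs2", "rdf:type", "src:BodyWeight"),
    ("obs2", "src:value", "71"),
]

# One view-to-view mapping per context in which "src:value" appears.
TUMOR_SRC = [("?o", "rdf:type", "src:TumorSize"), ("?o", "src:value", "?v")]
TUMOR_TGT = [("?o", "tgt:tumorDiameterMm", "?v")]
WEIGHT_SRC = [("?o", "rdf:type", "src:BodyWeight"), ("?o", "src:value", "?v")]
WEIGHT_TGT = [("?o", "tgt:bodyWeightKg", "?v")]

tumor_triples = apply_view_mapping(TUMOR_SRC, TUMOR_TGT, GRAPH)
weight_triples = apply_view_mapping(WEIGHT_SRC, WEIGHT_TGT, GRAPH)
```

Here the same source predicate is translated to two different target properties depending on the subgraph it appears in, which a 1:1 mapping between single elements cannot express without altering the target schema.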
Results: We developed a software tool implementing our view-based strategy. The tool is currently
used in the European Commission-funded p-medicine project, which aims to create
a technological framework that integrates clinical and genomic data to facilitate
the development of personalized drugs and therapies for cancer, based on the genetic
profile of the patient. We used our tool to integrate different RDF-based databases
– including different repositories of clinical trials and DICOM images – using the
Health Data Ontology Trunk (HDOT) ontology as the target schema.
Conclusions: The importance of database integration methods and tools in the context of biomedical
research has been widely recognized. Modern research in this area – e.g. identification
of disease biomarkers, or design of personalized therapies – heavily relies on the
availability of a technical framework to enable researchers to uniformly access disparate
repositories. We present a novel alignment method, and a tool that implements it,
specifically designed to support and enhance the integration of RDF-based data sources
at schema (metadata) level. This approach provides an increased level of expressiveness
compared to other existing solutions, and makes it possible to resolve heterogeneity
scenarios that cannot be properly represented using other state-of-the-art techniques.
Keywords
Ontology alignment - database integration - RDF