Summary
Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”.
Background: Data heterogeneity is one of the critical problems in analysing, reusing, sharing
or linking datasets. Metadata add semantic description to data, but they also introduce
a further layer of complexity, because the metadata descriptors are themselves heterogeneous.
This can be managed by using a predefined model to extract the metadata, but doing so
can reduce the richness of the data extracted.
Objectives: To link the South London Stroke Register (SLSR), the London Air Pollution toolkit
(LAP) and the Clinical Practice Research Datalink (CPRD) while transforming the data into
the Web Ontology Language (OWL) format.
Methods: We used a four-step transformation approach: prepare meta-descriptions, convert
data, generate and update meta-classes, and generate OWL files. We validated the correctness
of the transformed OWL files by issuing queries and comparing the results against the
original source data.
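The four steps and the query-based validation can be illustrated with a minimal sketch. It is not the authors' implementation: the namespace `BASE`, the meta-description `META`, the column names, the helper functions and the three toy rows are all hypothetical and invented for illustration, and the sketch uses only the Python standard library to emit a small RDF/XML document with OWL vocabulary.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/demo#"  # hypothetical namespace, not from the article

ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)
ET.register_namespace("demo", BASE)

# Step 1: meta-description of one source table (hypothetical column names).
META = {"class": "Patient", "id": "patient_id",
        "properties": ["sex", "outward_postcode"]}

# Toy rows invented for this sketch; they are not SLSR, LAP or CPRD data.
ROWS = [
    {"patient_id": 1, "sex": " male", "outward_postcode": "SE1"},
    {"patient_id": 2, "sex": "female", "outward_postcode": "SE5 "},
    {"patient_id": 3, "sex": "male", "outward_postcode": "SW9"},
]

def convert(rows):
    """Step 2: normalise every value to a trimmed string."""
    return [{k: str(v).strip() for k, v in row.items()} for row in rows]

def meta_classes(meta):
    """Step 3: derive the classes and properties to declare."""
    return {"classes": [meta["class"]], "properties": list(meta["properties"])}

def to_owl(records, meta):
    """Step 4: serialise declarations and individuals as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    schema = meta_classes(meta)
    for cls in schema["classes"]:
        ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + cls})
    for prop in schema["properties"]:
        ET.SubElement(root, f"{{{OWL}}}DatatypeProperty",
                      {f"{{{RDF}}}about": BASE + prop})
    for rec in records:
        iri = f"{BASE}{meta['class']}_{rec[meta['id']]}"
        ind = ET.SubElement(root, f"{{{OWL}}}NamedIndividual",
                            {f"{{{RDF}}}about": iri})
        ET.SubElement(ind, f"{{{RDF}}}type",
                      {f"{{{RDF}}}resource": BASE + meta["class"]})
        for prop in meta["properties"]:
            ET.SubElement(ind, f"{{{BASE}}}{prop}").text = rec[prop]
    return ET.tostring(root, encoding="unicode")

def count_by(owl_xml, prop, value):
    """Validation query: count individuals whose property equals value."""
    root = ET.fromstring(owl_xml)
    return sum(1 for ind in root.iter(f"{{{OWL}}}NamedIndividual")
               if ind.findtext(f"{{{BASE}}}{prop}") == value)

owl_xml = to_owl(convert(ROWS), META)
source_males = sum(1 for r in ROWS if str(r["sex"]).strip() == "male")
owl_males = count_by(owl_xml, "sex", "male")
```

Comparing `owl_males` with `source_males` mirrors, in miniature, the idea of checking query results on the transformed file against the original source data.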
Results: We transformed SLSR, LAP and CPRD into OWL format. The linked SLSR and CPRD OWL
file contains 3644 male and 3551 female patients. The linked SLSR and LAP OWL file
shows that, for 17 of the 35 outward postcode areas, there are no overlapping data
to support further analysis between SLSR and LAP.
Conclusions: Our approach generated a set of OWL-formatted files that can be queried
directly or easily converted into other formats more suitable for further analysis.
The transformation was faithful, with no loss or anomalies. Our results show that
the proposed method is a promising general approach to addressing data heterogeneity.
Keywords
Informatics - knowledge - semantics - data linkage - OWL ontology