Methods Inf Med 2015; 54(01): 32-40
DOI: 10.3414/ME13-02-0029
Focus Theme – Original Articles
Schattauer GmbH

Semi Automated Transformation to OWL Formatted Files as an Approach to Data Integration

A Feasibility Study Using Environmental, Disease Register and Primary Care Clinical Data
S. F. Liang (1), A. Taweel (2), S. Miles (2), Y. Kovalchuk (1), A. Spiridou (1), B. Barratt (3), U. Hoang (4), S. Crichton (4), B. C. Delaney (1), C. Wolfe (4)

1   NIHR Biomedical Research Centre at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London, London, UK
2   Department of Informatics, King’s College London, London, UK
3   Environmental Research Group, MRC-PHE Centre for Environment and Health, King’s College London, London, UK
4   South London Stroke Register, Division of Health and Social Care Research, King’s College London, London, UK

Publication History

received: 21 June 2013

accepted: 23 April 2014

Publication Date: 22 January 2018 (online)

Summary

Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”.

Background: Data heterogeneity is one of the critical problems in analysing, reusing, sharing or linking datasets. Metadata adds semantic description to data, but the metadata descriptors are themselves heterogeneous, which adds a further layer of complexity. This can be managed by using a predefined model to extract the metadata, but doing so can reduce the richness of the data extracted.

Objectives: To link the South London Stroke Register (SLSR), the London Air Pollution toolkit (LAP) and the Clinical Practice Research Datalink (CPRD) while transforming the data into the Web Ontology Language (OWL) format.

Methods: We used a four-step transformation approach: preparing meta-descriptions, converting data, generating and updating meta-classes, and generating OWL files. We validated the correctness of the transformed OWL files by issuing queries and comparing the results against the original source data.
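
The four steps and the query-based validation can be illustrated with a minimal sketch. This is not the authors' tooling, which the summary does not describe; it only assumes a tabular source and uses the Python standard library. The namespace `BASE`, the mapping `META`, the property names `hasSex` and `hasAge`, and the toy rows are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/demo#"  # hypothetical namespace

# Step 1: meta-descriptions (hypothetical): source column -> (property, XSD type)
META = {"sex": ("hasSex", "string"), "age": ("hasAge", "integer")}

# Toy source data, not taken from SLSR, LAP or CPRD.
SOURCE_CSV = "id,sex,age\np1,M,71\np2,F,64\np3,F,80\n"


def convert(csv_text):
    """Step 2: read source rows, keeping only columns that have a meta-description."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{"id": r["id"], **{c: r[c] for c in META}} for r in rows]


def build_owl(rows, class_name="Patient"):
    """Steps 3 and 4: declare the class and properties, then emit individuals as RDF/XML."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("owl", OWL)
    ET.register_namespace("demo", BASE)
    about, resource, datatype = (f"{{{RDF}}}{n}" for n in ("about", "resource", "datatype"))
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Class", {about: BASE + class_name})
    for prop, _ in META.values():
        ET.SubElement(root, f"{{{OWL}}}DatatypeProperty", {about: BASE + prop})
    for row in rows:
        ind = ET.SubElement(root, f"{{{OWL}}}NamedIndividual", {about: BASE + row["id"]})
        ET.SubElement(ind, f"{{{RDF}}}type", {resource: BASE + class_name})
        for col, (prop, dtype) in META.items():
            value = ET.SubElement(ind, f"{{{BASE}}}{prop}", {datatype: XSD + dtype})
            value.text = row[col]
    return ET.tostring(root, encoding="unicode")


def count_by_sex(owl_text):
    """Validation query: count individuals per hasSex value in the generated file."""
    root = ET.fromstring(owl_text)
    return Counter(el.text for el in root.iter(f"{{{BASE}}}hasSex"))


rows = convert(SOURCE_CSV)
owl_text = build_owl(rows)

# Validation: the same count computed directly from the source must match.
source_counts = Counter(r["sex"] for r in csv.DictReader(io.StringIO(SOURCE_CSV)))
print(count_by_sex(owl_text) == source_counts)
```

In this sketch the check compares one aggregate query against the source; a fuller validation would presumably repeat the comparison for each mapped column.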

Results: We transformed SLSR, LAP and CPRD into OWL format. The linked SLSR and CPRD OWL file contains 3644 male and 3551 female patients. The linked SLSR and LAP OWL file shows that for 17 of 35 outward postcode areas there are no overlapping data to support further analysis between SLSR and LAP.
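
The kind of overlap check behind the outward postcode result can be sketched as follows. This is an illustration only, assuming that both datasets carry a UK postcode and are linked on its outward part (the portion before the space); the function `outward_code` and the postcode lists are hypothetical and do not come from SLSR or LAP.

```python
def outward_code(postcode):
    """Return the outward part of a UK postcode, e.g. 'SE1' from 'SE1 7EH'."""
    cleaned = postcode.strip().upper()
    if " " in cleaned:
        return cleaned.split()[0]
    # Without a space, the inward part is the last three characters;
    # strings of four characters or fewer are treated as outward codes already.
    return cleaned[:-3] if len(cleaned) > 4 else cleaned


# Illustrative inputs only.
register_postcodes = ["SE1 7EH", "SE5 9RS", "SW9 8BB", "SE11 4TH"]
monitor_postcodes = ["SE1 9RT", "SW9 0AA"]

register_areas = {outward_code(p) for p in register_postcodes}
monitor_areas = {outward_code(p) for p in monitor_postcodes}

# Areas present in the register that have no matching air-quality data.
no_overlap = sorted(register_areas - monitor_areas)
print(f"{len(no_overlap)} of {len(register_areas)} areas lack overlapping data: {no_overlap}")
```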

Conclusions: Our approach generated a set of OWL-formatted files that can be queried directly or easily converted into other formats more suitable for further analysis. The transformation was faithful, with no data loss or anomalies. Our results show that the proposed method is a promising general approach to addressing data heterogeneity.

 