Methods Inf Med 2010; 49(04): 337-348
DOI: 10.3414/ME0614
Original Articles
Schattauer GmbH

Integration of Relational and Textual Biomedical Sources

A Pilot Experiment Using a Semi-automated Method for Logical Schema Acquisition
M. García-Remesal (1), V. Maojo (1), H. Billhardt (2), J. Crespo (1)
1   Biomedical Informatics Group, Dep. Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain
2   Artificial Intelligence Group, Universidad Rey Juan Carlos, Madrid, Spain

Publication History

received: 11 November 2008

accepted: 11 August 2009

Publication Date:
17 January 2018 (online)

Summary

Objectives: Bringing together structured and text-based sources is an exciting challenge for biomedical informaticians, since most relevant biomedical sources belong to one of these categories. In this paper we evaluate the feasibility of integrating relational and text-based biomedical sources using: i) an original logical schema acquisition method for textual databases developed by the authors, and ii) OntoFusion, a system originally designed by the authors for the integration of relational sources.

Methods: We conducted an integration experiment involving a test set of seven differently structured sources covering the domain of genetic diseases. We used our logical schema acquisition method to generate schemas for all textual sources. The sources were integrated using the methods and tools provided by OntoFusion. The integration was validated using a test set of 500 queries.

Results: A panel of experts answered a questionnaire to evaluate i) the quality of the extracted schemas, ii) the query processing performance of the integrated set of sources, and iii) the relevance of the retrieved results. The results of the survey show that our method extracts coherent and representative logical schemas. Experts’ feedback on the performance of the integrated system and the relevance of the retrieved results was also positive. Regarding the validation of the integration, the system successfully provided correct results for all queries in the test set.

Conclusions: The results of the experiment suggest that text-based sources including a logical schema can be regarded as equivalent to structured databases. Using our method, previous research and existing tools designed for the integration of structured databases can be reused – possibly subject to minor modifications – to integrate differently structured sources.

 