Developing an NLP and IR-based Algorithm for Analyzing Gene-disease Relationships

Y. T. Yen; B. Chen; H. W. Chiu; Y. C. Lee; Y. C. Li; C. Y. Hsu

doi:10.1055/s-0038-1634069

Methods Inf Med 2006; 45(03): 321-329
DOI: 10.1055/s-0038-1634069

Original Article

Schattauer GmbH

Authors

Y. T. Yen¹, B. Chen², H. W. Chiu¹, Y. C. Lee¹, Y. C. Li¹, C. Y. Hsu¹

¹ Graduate Institute of Medical Informatics, Taipei Medical University, Taipei, Taiwan
² Graduate Institute of Computer Science and Information Engineering, National Taiwan Normal University, Taipei, Taiwan

Further Information

Publication History

Publication Date:
06 February 2018 (online)

Summary

Objectives: High-throughput techniques such as cDNA microarrays, oligonucleotide arrays, and serial analysis of gene expression (SAGE) have been developed to screen large amounts of gene expression data automatically. Even so, researchers still spend considerable time and money discovering gene-disease relationships with these techniques. We implemented a prototype algorithm that gives biological researchers predicted results before they proceed with experiments, helping them discover gene-disease relationships more efficiently.

Methods: With the rapid development of computer technology, many information retrieval techniques have been applied to analyze the large digital biomedical databases available worldwide. We therefore expected that information retrieval (IR) techniques could be applied to extract useful information about the relationships between specific diseases and genes from MEDLINE articles. In addition, we applied natural language processing (NLP) methods to perform semantic analysis of the relevant articles and discover the relationships between genes and diseases.
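
To make the IR step concrete, the following is a minimal sketch of ranking abstracts against a query with TF-IDF weights and cosine similarity. The summary does not state which retrieval model is used, so TF-IDF is an assumption chosen for illustration, and the names `tokenize`, `rank`, and `DOCS`, as well as the three toy sentences, are hypothetical stand-ins for MEDLINE abstracts.

```python
import math
import re
from collections import Counter

# Toy stand-ins for MEDLINE abstracts (illustrative only, not data from the paper).
DOCS = [
    "BRCA1 mutations are associated with hereditary breast cancer.",
    "Insulin resistance is a hallmark of type 2 diabetes.",
    "TP53 is frequently mutated in breast cancer and other tumors.",
]


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return (indices sorted by descending score, list of scores).

    Scores are cosine similarities between TF-IDF vectors. Query terms
    that never occur in the collection are ignored.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vectorize(tokenize(query))
    scores = [cosine(q, vectorize(toks)) for toks in tokenized]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    order, scores = rank("breast cancer gene", DOCS)
    for i in order:
        print(f"{scores[i]:.3f}  {DOCS[i]}")
```

Cosine similarity normalizes for document length, so a short abstract that mentions the query terms is not penalized relative to a long one.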

Results: We extracted gene symbols from our literature collection according to disease MeSH classifications. We also built an IR-based retrieval system, the “Biomedical Literature Retrieval System (BLRS)”, and applied the N-gram model to extract relationship features that reveal the relationship between genes and diseases. Finally, we built a relationship network for a specific disease to represent the gene-disease relationships.
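
As an illustration of how word N-grams can surface candidate relationship features, the sketch below counts the unigrams and bigrams that occur between a gene symbol and a disease term in the same sentence. This is a simplified assumption about the procedure, not the authors' implementation; the functions `find_span`, `ngrams`, and `relation_ngrams` and the example sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_span(tokens, phrase):
    """Return (start, end) of the first occurrence of phrase in tokens, or None."""
    m = len(phrase)
    for i in range(len(tokens) - m + 1):
        if tokens[i:i + m] == phrase:
            return i, i + m
    return None


def ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relation_ngrams(sentences, gene, disease, max_n=2):
    """Count n-grams found between a gene mention and a disease mention."""
    gene_toks, disease_toks = tokenize(gene), tokenize(disease)
    counts = Counter()
    for sentence in sentences:
        toks = tokenize(sentence)
        g = find_span(toks, gene_toks)
        d = find_span(toks, disease_toks)
        if g is None or d is None:
            continue  # the sentence must mention both entities
        first, second = sorted([g, d])
        between = toks[first[1]:second[0]]
        for n in range(1, max_n + 1):
            counts.update(ngrams(between, n))
    return counts


if __name__ == "__main__":
    sentences = [
        "BRCA1 is associated with breast cancer.",
        "Mutations in BRCA1 are associated with breast cancer.",
        "BRCA1 was not studied in diabetes.",
    ]
    for feature, count in relation_ngrams(sentences, "BRCA1", "breast cancer").most_common(3):
        print(count, feature)
```

N-grams that recur across many co-occurrence sentences, such as "associated with" in this toy input, are the kind of candidates a relationship feature list could be drawn from.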

Conclusions: A relationship feature is a functional word that reveals the relationship between a single gene and a disease. BLRS, which incorporates many modern IR techniques, proved to be a powerful information discovery tool for literature searching. A relationship network that combines gene symbols, relationship features, and disease MeSH terms provides an integrated view for discovering gene-disease relationships.
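
One simple way to hold such a relationship network in memory is as a mapping from disease MeSH term to gene symbol to the set of relationship features linking them. The sketch below assumes the network is given as (gene, feature, disease) triples; the data structure, the function names `build_network` and `genes_for`, and the example triples are illustrative placeholders, not results or code from the paper.

```python
from collections import defaultdict


def build_network(triples):
    """Group (gene, feature, disease) triples as disease -> gene -> {features}."""
    network = defaultdict(lambda: defaultdict(set))
    for gene, feature, disease in triples:
        network[disease][gene].add(feature)
    return network


def genes_for(network, disease):
    """Return the sorted gene symbols linked to a disease (empty if unknown)."""
    return sorted(network.get(disease, {}))


if __name__ == "__main__":
    # Placeholder triples for illustration only.
    triples = [
        ("BRCA1", "associated with", "Breast Neoplasms"),
        ("TP53", "mutated in", "Breast Neoplasms"),
        ("BRCA1", "risk factor for", "Breast Neoplasms"),
        ("INS", "associated with", "Diabetes Mellitus"),
    ]
    network = build_network(triples)
    for gene in genes_for(network, "Breast Neoplasms"):
        features = sorted(network["Breast Neoplasms"][gene])
        print(gene, "->", ", ".join(features))
```

Keying the outer mapping by disease matches the idea of building one network per specific disease, while the inner sets keep every distinct feature observed for a gene.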

Keywords

Natural language processing - information retrieval - gene - disease - relationship - MeSH

