DOI: 10.3414/ME15-01-0108
Link Prediction on a Network of Co-occurring MeSH Terms: Towards Literature-based Discovery
Funding: This work was supported in part by the Slovenian Research Agency and by the Intramural Research Program of the U.S. National Institutes of Health, National Library of Medicine.

Publication History
received:
17 August 2015
accepted in revised form:
19 May 2016
Publication Date:
08 January 2018 (online)

Summary
Objectives: Literature-based discovery (LBD) is a text mining methodology for automatically generating research hypotheses from existing knowledge. We model the LBD process as a classification problem on a graph of MeSH terms and employ unsupervised and supervised link prediction methods to predict previously unknown connections between biomedical concepts.
Methods: We evaluated the effectiveness of link prediction through a series of experiments on a MeSH network that records the history of link formation between biomedical concepts. We performed link prediction using the proximity measures common neighbors (CN), Jaccard coefficient (JC), Adamic/Adar index (AA), and preferential attachment (PA). Our approach relies on the assumption that similar nodes are more likely to establish a link in the future.
Results: In the unsupervised approach, the AA measure achieved the best performance in terms of area under the ROC curve (AUC = 0.76), followed by CN, JC, and PA. In the supervised approach, we evaluated whether the proximity measures can be combined to define a model of link formation across all four predictors. We applied various classifiers, including decision trees, k-nearest neighbors, logistic regression, multilayer perceptron, naïve Bayes, and random forests. The random forest classifier achieved the best performance (AUC = 0.87).
Conclusions: The link prediction approach proved effective for LBD. Supervised statistical learning approaches clearly outperformed the unsupervised approach to link prediction.
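
The four proximity measures named in the Methods, and the AUC used to compare them, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the node labels, and all function names are assumptions made for illustration, and the formulas are the standard definitions of CN, JC, AA, and PA. The `auc` helper uses the rank-based (Mann-Whitney) formulation of the area under the ROC curve, with ties counted as one half.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """CN: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """JC: shared neighbors divided by the size of the neighbor union."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """AA: shared neighbors weighted by 1 / log(degree).

    A shared neighbor of two distinct nodes has degree >= 2,
    so the logarithm is always positive here.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def preferential_attachment(adj, u, v):
    """PA: product of the two node degrees."""
    return len(adj[u]) * len(adj[v])


def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical toy network; node labels stand in for MeSH terms.
    edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("B", "D"), ("C", "D"), ("D", "E")]
    adj = build_adjacency(edges)

    # Score every pair of nodes that is not yet linked.
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            print(u, v,
                  common_neighbors(adj, u, v),
                  round(jaccard(adj, u, v), 3),
                  round(adamic_adar(adj, u, v), 3),
                  preferential_attachment(adj, u, v))
```

In an unsupervised setting, each measure ranks the unlinked pairs on its own and `auc` compares the scores of pairs that later became linked against those that did not. In a supervised setting, the four scores of a pair would instead form its feature vector for a classifier.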