Web-Based Bioinformatics Predictors: Recommendations to Assess Lysosomal Cholesterol Trafficking Diseases-Related Genes
Funding None.
20 September 2018
07 May 2019
05 July 2019 (online)
Introduction The growing number of genetic variants of unknown significance (VUS) and the availability of several in silico prediction tools make it challenging to evaluate potentially deleterious gene variants.
Materials and Methods We evaluated several programs to determine which one best predicts the impact of genetic variants found in lysosomal storage disorders (LSDs) caused by defects in cholesterol trafficking. For the most commonly used tools, we calculated sensitivity, specificity, accuracy, precision, and the Matthews correlation coefficient.
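The five performance measures named above can all be derived from a 2 × 2 confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' own analysis pipeline. It assumes that "positive" means a variant known to be pathogenic and that each predictor gives a binary call; the function name `classification_metrics` and the example counts are hypothetical.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: pathogenic variants predicted as pathogenic
    fp: neutral variants predicted as pathogenic
    tn: neutral variants predicted as neutral
    fn: pathogenic variants predicted as neutral
    """

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for illustration only; these are not data from the study.
metrics = classification_metrics(tp=40, fp=5, tn=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when pathogenic and neutral variants are unevenly represented in the test set.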
Results For exonic variants, only MutPred1 reached 100% accuracy and generated the best predictions (sensitivity = 1.00 and accuracy = 1.00), whereas for intronic variants, SROOGLE and Human Splicing Finder (HSF) generated the best predictions (sensitivity = 1.00 and accuracy = 1.00).
Discussion Next-generation sequencing has substantially increased the number of detected genetic variants, most of which are classified as VUS, creating a need for accurate pathogenicity prediction. The present study focuses on accurate pathogenicity prediction in LSDs, in which the majority of mutations are specific and previously unreported.
Conclusion We found that the best prediction tools for the NPC1, NPC2, and LIPA variants were MutPred1 for exonic regions, and HSF and SROOGLE for intronic regions and splice sites.
Keywords NPC1 - NPC2 - LIPA - bioinformatics prediction tools - lysosomal storage disease - cholesterol trafficking
Laura López de Frutos designed and performed the research, analyzed the data and drafted the manuscript. Jorge J. Cebolla and Pilar Irún performed the research and analyzed the data. Ralf Köhler reviewed and revised the manuscript, and Pilar Giraldo designed the research. All authors read and approved the manuscript before submission.
∗ Authors J.J. Cebolla and P. Irún should be regarded as joint second authors.