Thromb Haemost 2015; 114(05): 920-932
DOI: 10.1160/TH15-05-0411
Theme Issue Article
Schattauer GmbH

Introduction to the analysis of next generation sequencing data and its application to venous thromboembolism

Marisa L. R. Cunha (1, 2), Joost C. M. Meijers (1, 3), Saskia Middeldorp (2)

1   Department of Experimental Vascular Medicine, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands
2   Department of Vascular Medicine, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands
3   Department of Plasma Proteins, Sanquin Research, Amsterdam, the Netherlands

Publication History

Received: 15 May 2015

Accepted after major revision: 26 August 2015

Publication Date: 06 December 2017 (online)

Summary

Despite knowledge of various inherited risk factors associated with venous thromboembolism (VTE), no definite cause can be found in about 50% of patients. Data-driven searches such as genome-wide association studies (GWAS) have not identified genetic variants with implications for clinical care, and unexplained heritability remains. In recent years, the development of several so-called next-generation sequencing (NGS) platforms has made it possible to generate genomic information quickly, inexpensively and accurately. So far, however, their application to VTE has been very limited. Here we review basic concepts of NGS data analysis and explore the application of NGS technology to VTE. We provide both computational and biological viewpoints to discuss the potential and challenges of NGS-based studies.

 