Statistical, Computational and Visualization Methodologies to Unveil Gene Primary Structure Features

M. Pinheiro; V. Afreixo; G. Moura; A. Freitas; M. A. S. Santos; J. L. Oliveira

doi:10.1055/s-0038-1634061


Methods Inf Med 2006; 45(02): 163-168

Original Article

Schattauer GmbH


Authors

M. Pinheiro

¹IEETA/DET, University of Aveiro, Aveiro, Portugal
V. Afreixo

¹IEETA/DET, University of Aveiro, Aveiro, Portugal
G. Moura

²Department of Biology, University of Aveiro, Aveiro, Portugal
A. Freitas

³Department of Mathematics, University of Aveiro, Aveiro, Portugal
M. A. S. Santos

²Department of Biology, University of Aveiro, Aveiro, Portugal
J. L. Oliveira

¹IEETA/DET, University of Aveiro, Aveiro, Portugal


Publication History

Publication Date:
06 February 2018 (online)


Summary

Objectives: Gene sequence features such as codon bias, codon context, and codon expansion (e.g. trinucleotide repeats) can be better understood on a genome-wide scale by combining statistical methods with advanced computer algorithms and data visualization through sophisticated graphical interfaces. This paper presents ANACONDA, a bioinformatics application for the analysis of gene primary structure.
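Codon bias and codon context both start from in-frame codon counts. The sketch below is a minimal illustration, not ANACONDA's implementation: it assumes a clean coding sequence read from its first nucleotide, and it treats the "context" of a codon as the codon immediately on its 3' side, which is an assumption made here. The function names are hypothetical.

```python
from collections import Counter


def codons(seq: str) -> list[str]:
    """Split a coding sequence into in-frame codons (frame 1), dropping a trailing partial codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_usage(seq: str) -> Counter:
    """Absolute codon counts, the raw material for codon-bias measures."""
    return Counter(codons(seq))


def codon_context(seq: str) -> Counter:
    """Counts of adjacent (codon, next codon) pairs.

    Assumption: 'context' is taken to mean the 3' neighbouring codon.
    """
    cs = codons(seq)
    return Counter(zip(cs, cs[1:]))


if __name__ == "__main__":
    cds = "ATGGCTGCTAAATAA"  # toy sequence, not real data
    print(codon_usage(cds))
    print(codon_context(cds))
```

Summing these counters over every gene of a genome gives the genome-scale tables that the statistical methods operate on.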

Methods: Codon usage tables based on absolute metrics, and software for multivariate analysis of codon and amino acid usage, are publicly available. However, these resources do not provide easy-to-use computational and statistical tools for detailed, genome-scale analysis of gene primary structure. We propose using several statistical methods (contingency table analysis, residual analysis, and multivariate analysis, namely cluster analysis) to study codon bias from several perspectives: degree of association, codon context, and clustering.
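For the contingency table and residual analysis, one common formulation (assumed here, not taken from the ANACONDA code) cross-classifies each codon (rows) by its 3' neighbour (columns), computes Pearson's chi-square statistic for independence, and then Haberman's adjusted residuals d_ij = (o_ij - e_ij) / sqrt(e_ij (1 - r_i/n)(1 - c_j/n)), where o_ij is the observed count, r_i and c_j are the row and column totals, n is the grand total, and e_ij = r_i c_j / n. Under independence these residuals are approximately standard normal, so cells with |d_ij| above roughly 1.96 flag codon pairs that are over- or under-represented. The helper names below are hypothetical.

```python
import math
from collections import Counter


def contingency(pairs: Counter) -> tuple[list[str], list[str], list[list[int]]]:
    """Build a table with rows = 5' codon and columns = 3' codon from pair counts."""
    observed = {pair: k for pair, k in pairs.items() if k > 0}  # every row/column total > 0
    rows = sorted({a for a, _ in observed})
    cols = sorted({b for _, b in observed})
    table = [[observed.get((a, b), 0) for b in cols] for a in rows]
    return rows, cols, table


def chi2_and_adjusted_residuals(table: list[list[int]]) -> tuple[float, list[list[float]]]:
    """Pearson chi-square statistic and Haberman adjusted residual for each cell."""
    n = sum(map(sum, table))
    r = [sum(row) for row in table]        # row totals
    c = [sum(col) for col in zip(*table)]  # column totals
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, o in enumerate(row):
            e = r[i] * c[j] / n                        # expected count under independence
            chi2 += (o - e) ** 2 / e
            var = e * (1 - r[i] / n) * (1 - c[j] / n)  # estimated variance of o - e
            out.append((o - e) / math.sqrt(var) if var > 0 else 0.0)
        residuals.append(out)
    return chi2, residuals


if __name__ == "__main__":
    chi2, d = chi2_and_adjusted_residuals([[10, 0], [0, 10]])  # toy table
    print(chi2, d)
```

For a 2 x 2 table every squared adjusted residual equals the chi-square statistic, which is a convenient sanity check; with 61 or 64 codons per axis the residuals are what localize the association to specific codon pairs.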

Results: The developed solution is a software application that provides user-guided analysis of codon sequences, taking into account several codon contexts and codon usage on a genomic scale. In our molecular biology laboratory, the tool is used mainly to study particular genomes, especially those of Saccharomyces cerevisiae, Candida albicans and Escherichia coli. These species are used here as examples to illustrate the applicability of the software and its output layouts.
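A genome-scale view can be summarized by cluster analysis, for example grouping codons whose context profiles (such as each codon's row of adjusted residuals from a codon-context contingency table) are similar. The sketch below is a naive agglomerative procedure; the choice of average linkage, Euclidean distance, and clustering residual profiles are assumptions for illustration, not details stated in this summary, and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles: dict[str, list[float]], k: int) -> list[set[str]]:
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Euclidean distance until k clusters remain."""
    k = max(k, 1)
    clusters = [{name} for name in profiles]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
            dist = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # i < j, so deleting j leaves index i valid
        del clusters[j]
    return clusters


if __name__ == "__main__":
    toy = {"AAA": [0.0, 0.0], "AAG": [0.1, 0.0], "GCT": [5.0, 5.0], "GCC": [5.1, 5.0]}
    print(average_linkage(toy, 2))
```

The quadratic pair scan is acceptable for at most 64 codon profiles; clustering thousands of genes would call for a library implementation of hierarchical clustering instead.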

Conclusions: The statistical tools incorporated in the system make it possible to obtain global views of important sequence features. The results are expected to allow the identification of general rules that govern codon context and codon usage in any genome. Additionally, identifying genes that contain expanded codons arising from erroneous DNA replication events should help uncover new genes associated with human disease.
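Expanded codons show up as long in-frame runs of a single codon, such as the CAG repeats associated with Huntington disease. A minimal scan is sketched below, assuming an in-frame coding sequence; the minimum run length of five codons is a hypothetical threshold, and a real screen would need one justified for the genome under study. The function name is also hypothetical.

```python
def codon_runs(seq: str, min_len: int = 5) -> list[tuple[str, int, int]]:
    """Return (codon, start codon index, run length) for every in-frame run of
    identical codons at least min_len codons long (min_len is hypothetical)."""
    seq = seq.upper().replace("U", "T")
    cs = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    runs = []
    i = 0
    while i < len(cs):
        j = i
        while j + 1 < len(cs) and cs[j + 1] == cs[i]:
            j += 1                      # extend the run while the codon repeats
        length = j - i + 1
        if length >= min_len:
            runs.append((cs[i], i, length))
        i = j + 1                       # continue after the current run
    return runs


if __name__ == "__main__":
    print(codon_runs("ATG" + "CAG" * 6 + "TAA"))  # toy sequence, not real data
```

Scanning only in frame keeps the result tied to the encoded protein (a CAG run encodes a polyglutamine tract); out-of-frame trinucleotide repeats would need a separate search.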

Keywords

Bioinformatics software - codon context - codon bias - contingency tables - residual analysis - cluster analysis

