Methods Inf Med 2006; 45(02): 163-168
DOI: 10.1055/s-0038-1634061
Original Article
Schattauer GmbH

Statistical, Computational and Visualization Methodologies to Unveil Gene Primary Structure Features

M. Pinheiro (1), V. Afreixo (1), G. Moura (2), A. Freitas (3), M. A. S. Santos (2), J. L. Oliveira (1)

1   IEETA/DET, University of Aveiro, Aveiro, Portugal
2   Department of Biology, University of Aveiro, Aveiro, Portugal
3   Department of Mathematics, University of Aveiro, Aveiro, Portugal

Publication History

Publication Date: 06 February 2018 (online)

Summary

Objectives: Gene sequence features such as codon bias, codon context, and codon expansion (e.g. trinucleotide repeats) can be better understood at the genomic scale by combining statistical methodologies with advanced computer algorithms and data visualization through sophisticated graphical interfaces. This paper presents the ANACONDA system, a bioinformatics application for gene primary structure analysis.

Methods: Codon usage tables using absolute metrics and software for multivariate analysis of codon and amino acid usage are available in public databases. However, they do not provide easy-to-use computational and statistical tools for detailed gene primary structure analysis on a genomic scale. We propose using several statistical methods (contingency table analysis, residual analysis, and multivariate analysis, specifically cluster analysis) to analyze codon bias from several perspectives: degree of association, context, and clustering.
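As an illustration of the contingency table and residual analysis mentioned above, the following minimal Python sketch counts adjacent codon pairs in an in-frame sequence and computes an adjusted residual for each cell of the resulting table. It is not the ANACONDA implementation: the function names `codon_pairs` and `adjusted_residuals` are hypothetical, and the residual formula is the standard adjusted residual for two-way tables, assumed here to match the kind of residual analysis the paper refers to.

```python
from collections import Counter
from math import sqrt


def codon_pairs(orf):
    """Split an in-frame sequence into codons and return adjacent (5', 3') pairs.

    Trailing nucleotides that do not complete a codon are ignored.
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    return list(zip(codons, codons[1:]))


def adjusted_residuals(pairs):
    """Return {(first, second): adjusted residual} for a codon-pair table.

    Rows are the first codon of each pair, columns the second codon.
    Cells whose variance estimate is zero are skipped.
    """
    observed = Counter(pairs)
    row = Counter(first for first, _ in pairs)
    col = Counter(second for _, second in pairs)
    n = len(pairs)
    residuals = {}
    for a in row:
        for b in col:
            # Expected count under independence of the two codon positions.
            expected = row[a] * col[b] / n
            variance = expected * (1 - row[a] / n) * (1 - col[b] / n)
            if variance > 0:
                residuals[(a, b)] = (observed[(a, b)] - expected) / sqrt(variance)
    return residuals
```

Large positive residuals would flag codon pairs that occur more often than independence predicts, and large negative residuals flag pairs that are avoided. A real analysis would pool the pairs from every open reading frame in a genome rather than a single sequence.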

Results: The developed solution is a software application that provides a user-guided analysis of codon sequences, considering several contexts and codon usage on a genomic scale. In our molecular biology laboratory the tool is used mainly on particular genomes, especially those of Saccharomyces cerevisiae, Candida albicans and Escherichia coli. These species are used here as examples to illustrate the applicability and output layouts of the software.

Conclusions: The statistical tools incorporated in the system make it possible to obtain global views of important sequence features. The results are expected to permit the identification of general rules that govern codon context and codon usage in any genome. Additionally, identifying genes that contain expanded codons arising from erroneous DNA replication events will help uncover new genes associated with human disease.
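The search for expanded codons can be sketched as a scan for in-frame runs of identical codons. The Python function below is only an illustration under stated assumptions: the name `codon_runs` and the `min_repeats` threshold are hypothetical and are not taken from the ANACONDA system.

```python
def codon_runs(orf, min_repeats=3):
    """Return (codon, start_codon_index, repeat_count) for in-frame codon runs.

    A run is a stretch of at least `min_repeats` identical consecutive codons.
    Trailing nucleotides that do not complete a codon are ignored.
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    runs = []
    start = 0
    for i in range(1, len(codons) + 1):
        # Close the current run at the end of the sequence or on a codon change.
        if i == len(codons) or codons[i] != codons[start]:
            if i - start >= min_repeats:
                runs.append((codons[start], start, i - start))
            start = i
    return runs
```

For example, calling `codon_runs("ATG" + "CAG" * 5 + "TAA")` reports a single run of five CAG codons starting at codon index 1.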

 