Summary
Objectives:
Gene sequence features such as codon bias, codon context, and codon expansion (e.g.
tri-nucleotide repeats) can be better understood at the genomic scale by combining
statistical methodologies with advanced computer algorithms and data visualization
through sophisticated graphical interfaces. This paper presents the ANACONDA system,
a bioinformatics application for gene primary structure analysis.
Methods:
Codon usage tables using absolute metrics and software for multivariate analysis
of codon and amino acid usage are available in public databases. However, these resources
do not provide easy-to-use computational and statistical tools for detailed gene primary
structure analysis on a genomic scale. We propose using several statistical methods,
namely contingency table analysis, residual analysis, and multivariate analysis (cluster
analysis), to analyze codon bias from several perspectives (degree of association,
context, and clustering).
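As an illustration of the contingency table and residual analysis named above, the following is a minimal sketch and not the ANACONDA implementation. It assumes in-frame coding sequences given as plain strings, counts adjacent codon pairs, and computes the Pearson chi-square statistic together with adjusted residuals. The function names (codon_pairs, context_table, adjusted_residuals) are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import sqrt


def codon_pairs(seq):
    """Return adjacent (codon, next codon) pairs of an in-frame sequence."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return list(zip(codons, codons[1:]))


def context_table(sequences):
    """Count codon pairs over all sequences (the contingency table)."""
    table = Counter()
    for seq in sequences:
        table.update(codon_pairs(seq.upper()))
    return table


def adjusted_residuals(table):
    """Return (chi_square, residuals) for a table of codon-pair counts.

    Expected counts assume independence between the first and second codon.
    Each residual is (observed - expected) divided by its estimated
    standard error, so large absolute values flag preferred or avoided pairs.
    """
    n = sum(table.values())
    row, col = Counter(), Counter()
    for (first, second), count in table.items():
        row[first] += count
        col[second] += count

    chi2 = 0.0
    residuals = {}
    for first in row:
        for second in col:
            observed = table.get((first, second), 0)
            expected = row[first] * col[second] / n
            chi2 += (observed - expected) ** 2 / expected
            variance = expected * (1 - row[first] / n) * (1 - col[second] / n)
            residuals[(first, second)] = (
                (observed - expected) / sqrt(variance) if variance > 0 else 0.0
            )
    return chi2, residuals


if __name__ == "__main__":
    # Toy input for illustration only, not real genomic data.
    chi2, res = adjusted_residuals(context_table(["AAAGGGAAAGGGAAA"]))
    print(chi2, res)
```

A cluster analysis step could then group codons by their residual profiles; that step is omitted here because the choice of distance and linkage is not specified in this summary.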
Results:
The developed solution is a software application that provides a user-guided analysis
of codon sequences considering several contexts and codon usage on a genomic scale.
In our molecular biology laboratory, the tool is used mainly on particular genomes,
especially those of Saccharomyces cerevisiae, Candida albicans and Escherichia coli.
These species are used here as examples to illustrate the applicability and output
layouts of the software.
Conclusions:
The statistical tools incorporated in the system make it possible to obtain global views
of important sequence features. It is expected that the results obtained will permit
identification of general rules that govern codon context and codon usage in any genome.
Additionally, identification of genes containing expanded codons that arise as a consequence
of erroneous DNA replication events will help uncover new genes associated with
human disease.
Keywords
Bioinformatics software - codon context - codon bias - contingency tables - residual
analysis - cluster analysis