Methods Inf Med 2009; 48(03): 229-235
DOI: 10.3414/ME9225
Original Articles
Schattauer GmbH

Rule-based Clustering for Gene Promoter Structure Discovery

T. Curk (1), U. Petrovic (2), G. Shaulsky (3), B. Zupan (1, 3)

1   University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
2   J. Stefan Institute, Department of Molecular and Biomedical Sciences, Ljubljana, Slovenia
3   Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, Texas, USA

Publication History

20 April 2009

Publication Date: 17 January 2018 (online)

Summary

Background: The genetic cellular response to internal and external changes is determined by the sequence and structure of gene-regulatory promoter regions.

Objectives: Using data on gene-regulatory elements (i.e., either putative or known transcription factor binding sites) and data on gene expression profiles, we can discover structural elements in promoter regions and infer the underlying programs of gene regulation. Such hypotheses, obtained in silico, can greatly assist experiment planning. The principal obstacle for such approaches is the combinatorial explosion in the number of promoter element combinations to be examined.
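
To give a rough sense of that explosion, consider the simplest possible case, in which a candidate rule is just a conjunction of at most k elements drawn from n known or putative binding sites. The symbols n and k and the numbers below are illustrative assumptions, not figures from the article:

```latex
\[
  N(n, k) \;=\; \sum_{i=1}^{k} \binom{n}{i},
  \qquad
  N(100, 3) \;=\; 100 + 4950 + 161700 \;=\; 166750 .
\]
```

Any further condition attached to an element in a rule multiplies this count again, which is why exhaustive enumeration quickly becomes impractical.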

Methods: Building on several state-of-the-art machine learning approaches, we propose a heuristic, rule-based clustering method that uses gene expression similarity to guide the search for informative structures in promoters, thus exploring only the most promising parts of the vast and expressively rich rule space.
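
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a rule is a plain conjunction of promoter elements, that candidate rules are grown by beam search, and that a rule is scored by the mean pairwise Pearson correlation of the expression profiles of the genes it covers. The function names, parameters (`beam_width`, `max_len`, `min_genes`) and toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coherence(genes, expression):
    """Mean pairwise correlation of the expression profiles of `genes`."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


def covered(rule, promoters):
    """Genes whose promoter contains every element named in the rule."""
    return {g for g, sites in promoters.items() if rule <= sites}


def rule_search(promoters, expression, beam_width=2, max_len=3, min_genes=2):
    """Beam search over conjunctive rules, guided by expression coherence."""
    elements = sorted(set().union(*promoters.values()))
    beam = [frozenset()]
    found = {}
    for _ in range(max_len):
        candidates = {}
        for rule in beam:
            for element in elements:
                new_rule = rule | {element}
                if element in rule or new_rule in candidates:
                    continue
                genes = covered(new_rule, promoters)
                if len(genes) >= min_genes:  # ignore rules covering too few genes
                    candidates[new_rule] = coherence(genes, expression)
        if not candidates:
            break
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
        beam = [rule for rule, _ in ranked[:beam_width]]  # refine only the best rules
        found.update(candidates)
    return sorted(found.items(), key=lambda kv: (-kv[1], sorted(kv[0])))


# Toy data (invented): promoter elements per gene and four-point expression profiles.
promoters = {
    "g1": {"A", "B"}, "g2": {"A", "B"}, "g3": {"A"},
    "g4": {"B", "C"}, "g5": {"C"},
}
expression = {
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4], "g5": [4, 1, 3, 2],
}

rules = rule_search(promoters, expression)
for rule, score in rules:
    print(sorted(rule), round(score, 3))
```

Because only the `beam_width` best rules are refined at each step, most of the rule space is never visited. That is the trade-off a heuristic search of this kind accepts: it is tractable, but it does not guarantee finding the best rule.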

Results: We demonstrate the utility of the method by analyzing gene expression data from the budding yeast S. cerevisiae, in which cells were induced to proliferate peroxisomes.

Conclusions: We demonstrate that the proposed approach can infer informative relations that uncover relatively complex structures in gene promoter regions that regulate gene expression.

 