Methods Inf Med 2006; 45(02): 153-157
DOI: 10.1055/s-0038-1634059
Original Article
Schattauer GmbH

Estimating the Number of Clusters in DNA Microarray Data

N. Bolshakova (1), F. Azuaje (2)
1   Department of Computer Science, Trinity College Dublin, Dublin, Ireland
2   School of Computing and Mathematics, University of Ulster, Jordanstown, Northern Ireland, UK

Publication History

Publication Date: 06 February 2018 (online)

Summary

Objectives: The main objective of this research is to apply clustering and cluster validity methods to estimate the number of clusters in cancer tumour datasets. A weighted voting technique is used to improve the prediction of the number of clusters obtained from different data mining techniques. These tools may be used to identify new tumour classes from DNA microarray datasets. This estimation approach may serve as a useful tool to support biological and biomedical knowledge discovery.

Methods: Three clustering and two validation algorithms were applied to two cancer tumour datasets. Recent studies confirm that there is no universal pattern recognition or clustering model for predicting molecular profiles across different datasets. It is therefore useful not to rely on a single clustering or validation method, but to apply a variety of approaches. A combination of these methods may be used successfully to estimate the number of clusters.
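The idea of combining several clustering and validation methods through weighted voting can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_vote`, the method labels, the suggested cluster numbers, the weights, and the tie-breaking rule are all hypothetical and chosen only for illustration. Each clustering/validation pair casts a vote for the number of clusters it prefers, and the number with the largest total weight is selected.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine per-method estimates of the number of clusters.

    predictions: dict mapping a method label to the number of clusters
                 that method suggests.
    weights:     optional dict mapping a method label to a non-negative
                 weight; methods without an entry get weight 1.0.
    Returns (best_k, totals), where totals maps each candidate k to the
    sum of the weights of the methods that voted for it.
    """
    totals = defaultdict(float)
    for label, k in predictions.items():
        w = 1.0 if weights is None else weights.get(label, 1.0)
        totals[k] += w
    # Assumption: ties are broken in favour of the smaller k.
    best_k = min(totals, key=lambda k: (-totals[k], k))
    return best_k, dict(totals)


if __name__ == "__main__":
    # Hypothetical votes from clustering/validity-index combinations.
    votes = {
        "clustering_A/index_1": 2,
        "clustering_A/index_2": 3,
        "clustering_B/index_1": 2,
        "clustering_B/index_2": 2,
        "clustering_C/index_1": 4,
        "clustering_C/index_2": 3,
    }
    # Hypothetical weights expressing more trust in index_1.
    trust = {label: 2.0 for label in votes if label.endswith("index_1")}
    best_k, totals = weighted_vote(votes, trust)
    print("estimated number of clusters:", best_k)
    print("vote totals:", totals)
```

How the weights are chosen is the substantive design decision. In this sketch they are supplied by hand, whereas in practice they might reflect how reliable each method has proved on comparable data.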

Results: The methods implemented in this research may contribute to the validation of clustering results and the estimation of the number of clusters. The results show that this estimation approach may represent an effective tool to support biomedical knowledge discovery and healthcare applications.

Conclusion: The methods implemented in this research may be successfully used to estimate the number of clusters and may contribute to the validation of clustering results. These tools may be used to identify new tumour classes from gene expression profiles.

 