Estimating the Number of Clusters in DNA Microarray Data

N. Bolshakova; F. Azuaje

doi:10.1055/s-0038-1634059

Methods Inf Med 2006; 45(02): 153-157
DOI: 10.1055/s-0038-1634059

Original Article

Schattauer GmbH

N. Bolshakova¹, F. Azuaje²

¹Department of Computer Science, Trinity College Dublin, Dublin, Ireland
²School of Computing and Mathematics, University of Ulster, Jordanstown, Northern Ireland, UK

Publication History

Publication Date: 06 February 2018 (online)

Summary

Objectives: The main objective of this research is to apply clustering and cluster validity methods to estimate the number of clusters in cancer tumour datasets. A weighted voting technique is used to improve the prediction of the number of clusters obtained from different data mining techniques. These tools may be used to identify new tumour classes from DNA microarray datasets. This estimation approach may serve as a useful tool to support biological and biomedical knowledge discovery.

Methods: Three clustering and two validation algorithms were applied to two cancer tumour datasets. Recent studies confirm that no universal pattern recognition or clustering model can predict molecular profiles across different datasets. It is therefore useful not to rely on a single clustering or validation method, but to apply a variety of approaches. A combination of these methods may be used successfully to estimate the number of clusters.

Results: The methods implemented in this research may contribute to the validation of clustering results and the estimation of the number of clusters. The results show that this estimation approach may represent an effective tool to support biomedical knowledge discovery and healthcare applications.

Conclusion: The methods implemented in this research may contribute to the validation of clustering results and may be used successfully to estimate the number of clusters. These tools may be used to identify new tumour classes from gene expression profiles.
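
The weighted voting idea mentioned in the Objectives can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not specify the validity indices, the weights, or the tie-breaking rule, so the function name `weighted_vote`, the index labels, the scores, and the weights below are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(scores, weights, higher_is_better):
    """Combine several validity indices into one estimate of the cluster count.

    scores:           {index_name: {k: score}} for each candidate number of clusters k
    weights:          {index_name: vote weight}
    higher_is_better: {index_name: True if larger scores mean better partitions}
    """
    tally = defaultdict(float)
    for name, by_k in scores.items():
        pick = max if higher_is_better[name] else min
        # Each index votes for the k it rates best; ties go to the smaller k.
        best_k = pick(sorted(by_k), key=lambda k: by_k[k])
        tally[best_k] += weights[name]
    # The estimate is the k with the largest total weight; ties go to the smaller k.
    winner = min(tally, key=lambda k: (-tally[k], k))
    return winner, dict(tally)


# Hypothetical scores for k = 2, 3, 4 (illustrative values, not results from the paper).
scores = {
    "index_a": {2: 0.41, 3: 0.52, 4: 0.38},  # assumed: higher is better
    "index_b": {2: 0.30, 3: 0.27, 4: 0.22},  # assumed: higher is better
    "index_c": {2: 1.10, 3: 0.85, 4: 0.97},  # assumed: lower is better
}
weights = {"index_a": 1.0, "index_b": 0.5, "index_c": 1.0}  # assumed weights
higher_is_better = {"index_a": True, "index_b": True, "index_c": False}

best_k, tally = weighted_vote(scores, weights, higher_is_better)
print(best_k, tally)
```

In this sketch each index casts a single vote for its preferred number of clusters, so an index that disagrees with the others is outvoted rather than averaged in. Other schemes, such as spreading each index's weight across its ranked candidates, would be equally consistent with the summary.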

Keywords

Gene expression - data mining - clustering - cluster evaluation - validity indices

