Methods Inf Med 2004; 43(01): 4-8
DOI: 10.1055/s-0038-1633413
Original Article
Schattauer GmbH

Marker Identification and Classification of Cancer Types Using Gene Expression Data and SIMCA

Authors

  • S. Bicciato (1)
  • A. Luchini (1)
  • C. Di Bello (1)

    1   Department of Chemical Process Engineering, University of Padova, Italy

Publication History

Publication Date:
07 February 2018 (online)

Summary

Objectives: High-throughput technologies are radically boosting the understanding of living systems, thus creating enormous opportunities to elucidate the biological processes of cells in different physiological states. In particular, the application of DNA microarrays to monitor expression profiles from tumor cells is improving cancer analysis to levels that classical methods have been unable to reach. However, molecular diagnostics based on expression profiling requires addressing computational issues such as the overwhelming number of variables and the complex, multi-class nature of tumor samples. Thus, the objective of the present research has been the development of a computational procedure for feature extraction and classification of gene expression data.

Methods: The Soft Independent Modeling of Class Analogy (SIMCA) approach has been implemented in a data mining scheme, which allows the identification of those genes that are most likely to confer robust and accurate classification of samples from multiple tumor types.
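The class-modeling idea behind SIMCA can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the function names, the synthetic data, the number of components, and the assignment rule are all assumptions made for illustration. Each class gets its own principal component model, and a new sample is assigned to the class whose model leaves the smallest residual. Classical SIMCA instead compares residual variances with a statistical test and may assign a sample to several classes or to none.

```python
import numpy as np

def fit_class_model(X, n_components):
    """Fit a PCA model to the samples (rows) of a single class."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered class data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def residual_norm(x, model):
    """Euclidean distance from sample x to the class PCA subspace."""
    mean, P = model
    centered = x - mean
    reconstructed = (centered @ P.T) @ P
    return np.linalg.norm(centered - reconstructed)

def simca_predict(x, models):
    """Simplified rule (assumption): pick the class with the smallest residual."""
    return min(models, key=lambda label: residual_norm(x, models[label]))

# Synthetic expression data (hypothetical): 20 samples per class, 50 genes.
rng = np.random.default_rng(0)
n_genes = 50
class_a = rng.normal(0.0, 1.0, size=(20, n_genes))
class_b = rng.normal(0.0, 1.0, size=(20, n_genes))
class_b[:, :5] += 6.0  # five genes are over-expressed in class B

models = {
    "A": fit_class_model(class_a, n_components=2),
    "B": fit_class_model(class_b, n_components=2),
}

# A previously unseen sample that follows the class B expression pattern.
new_sample = rng.normal(0.0, 1.0, size=n_genes)
new_sample[:5] += 6.0
predicted = simca_predict(new_sample, models)
print(predicted)
```

Because every class is modeled independently, adding a further tumor type only requires fitting one more PCA model, which is what makes the approach convenient for multi-class problems.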

Results: The proposed method has been tested on two different microarray data sets, namely the acute human leukemia study of Golub et al. [1] and the small round blue cell tumors study of Khan et al. [2]. The identified features represent a rational and dimensionally reduced base for understanding the biology of diseases, defining targets of therapeutic intervention, and developing diagnostic tools for classification of pathological states.

Conclusions: The analysis of the SIMCA model residuals allows the identification of specific phenotype markers. At the same time, the class analogy approach provides the assignment to multiple classes, such as different pathological conditions or tissue samples, for previously unseen instances.
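One way residuals from class models can point to marker genes is sketched below. It is a hedged illustration, not the procedure used in the paper: the score is modeled on the variable-wise discriminatory power commonly described for SIMCA, and the function names, synthetic data, and number of components are assumptions. For each gene, the residual left when samples are fitted to the other class's model is compared with the residual left under their own class's model; genes with a large ratio are poorly described by the wrong class model and are therefore candidate markers.

```python
import numpy as np

def pca_model(X, n_components):
    """Class mean and leading principal directions of one class."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def gene_residuals(X, model):
    """Per-gene mean squared residual of samples X fitted to a class model."""
    mean, P = model
    centered = X - mean
    residuals = centered - (centered @ P.T) @ P
    return (residuals ** 2).mean(axis=0)

def discriminatory_power(X_a, X_b, n_components=2):
    """Ratio of cross-class to own-class residuals, one value per gene."""
    model_a = pca_model(X_a, n_components)
    model_b = pca_model(X_b, n_components)
    cross = gene_residuals(X_a, model_b) + gene_residuals(X_b, model_a)
    own = gene_residuals(X_a, model_a) + gene_residuals(X_b, model_b)
    return np.sqrt(cross / own)

# Synthetic data (hypothetical): genes 3, 17 and 42 differ between the classes.
rng = np.random.default_rng(1)
n_genes = 50
X_a = rng.normal(size=(20, n_genes))
X_b = rng.normal(size=(20, n_genes))
X_b[:, [3, 17, 42]] += 6.0

power = discriminatory_power(X_a, X_b)
top_genes = np.argsort(power)[::-1][:3]  # indices of the three highest scores
print(sorted(top_genes.tolist()))
```

Ranking genes by such a score and keeping only the top few gives a reduced feature set on which the class models can be refitted.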

References

  • 1 Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh M, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 1999; 286 (5439): 531-7.
  • 2 Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 2001; 7 (6): 673-9.
  • 3 Wold S, Sjostrom M. SIMCA: a method for analyzing chemical data in terms of similarity and analogy. In: Kowalski BR, editor. Chemometrics: Theory and Application. Washington: ACS; 1977.
  • 4 Jolliffe I. Principal Component Analysis. Springer-Verlag; 1986.