Methods Inf Med 2008; 47(05): 459-467
DOI: 10.3414/ME0396
Original Article
Schattauer GmbH

Support Vector Machine Quantile Regression for Detecting Differentially Expressed Genes in Microarray Analysis

I. Sohn (1), S. Kim (2), C. Hwang (3), J. W. Lee (1), J. Shim (4)

1   Department of Statistics, Korea University, Seoul, Korea
2   Skin Research Institute, AmorePacific R&D Center, Kyounggi-do, Korea
3   Division of Information and Computer Sciences, Dankook University, Kyounggi-do, Korea
4   Department of Applied Statistics, Catholic University of Daegu, Kyungbuk, Korea

Publication History

Received: 16 January 2006
Accepted: 09 June 2008
Publication Date: 20 January 2018 (online)

Summary

Objectives: One of the main objectives of microarray analysis is to identify genes that are differentially expressed under two distinct experimental conditions. This task is complicated by noise in the data and by the large number of genes examined. Gene selection based on fold change (FC) is often misleading because the error variability of each gene differs across intensity ranges. Several statistical methods have been suggested, but some of them yield high false positive rates because they rely on very strong parametric assumptions.
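The point about intensity-dependent error variability can be illustrated with a small simulation. The sketch below is not taken from the article: the intensity range, the noise model, and the two-fold cutoff are all assumed values chosen only for illustration. It generates genes with no true differential expression and counts how many a fixed FC cutoff flags at low and at high intensity.

```python
import numpy as np

# Illustrative simulation only: every number below is an assumption,
# not a value reported in the article.
rng = np.random.default_rng(1)
n_genes = 20000

# a: average log2 intensity of each gene (assumed range 6 to 16)
a = rng.uniform(6.0, 16.0, n_genes)

# Assumed noise model: the standard deviation of the log2 ratio
# shrinks as intensity grows (heterogeneous error variability).
noise_sd = 0.15 + 2.0 ** (-(a - 6.0) / 2.0)

# m: observed log2 ratio; the true log2 ratio is 0 for every gene,
# so every flagged gene is a false positive.
m = rng.normal(0.0, noise_sd)

# Fixed two-fold cutoff, i.e. |log2 ratio| > 1
flagged = np.abs(m) > 1.0

fp_low = flagged[a < 8.0].mean()    # false positive rate at low intensity
fp_high = flagged[a > 14.0].mean()  # false positive rate at high intensity

print(f"false positive rate, low intensity:  {fp_low:.3f}")
print(f"false positive rate, high intensity: {fp_high:.3f}")
```

Under these assumptions, the same cutoff flags a sizeable share of null genes at low intensity and almost none at high intensity, which is the behaviour that motivates an intensity-dependent selection rule.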

Methods: We present support vector machine quantile regression (SVMQR) fitted with an iteratively reweighted least squares (IRWLS) procedure based on the Newton method, instead of the usual quadratic programming algorithms. This procedure makes it possible to derive a generalized approximate cross validation (GACV) criterion for choosing the parameters that affect the performance of SVMQR. We propose a novel SVMQR-based method for identifying differentially expressed genes from a small number of replicated microarrays.
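To make the idea of fitting a kernel quantile regression by reweighted least squares concrete, here is a minimal sketch. It is not the authors' algorithm: it uses a simple majorize-minimize form of IRWLS rather than their Newton-based procedure, it fixes the tuning parameters by hand instead of selecting them by GACV, and the function names, the RBF kernel, and all default values (`lam`, `gamma`, `eps`, `n_iter`) are assumptions made for illustration. The model is f(x) = sum_j alpha_j k(x, x_j) + b, fitted by minimizing the check loss plus (lam/2) times the squared kernel norm of f.

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel matrix between 1-D arrays x and z."""
    return np.exp(-gamma * (x[:, None] - z[None, :]) ** 2)

def fit_kernel_quantile(x, y, tau, lam=1.0, gamma=0.5, n_iter=100, eps=1e-4):
    """Kernel quantile regression via a simple IRWLS (hypothetical helper).

    The check loss rho_tau(r) = |r|/2 + (tau - 1/2) r is bounded above by
    r^2 / (4|r0|) + (tau - 1/2) r + const at the current residual r0, so
    each iteration reduces to a weighted least squares linear system.
    """
    n = len(y)
    K = rbf_kernel(x, x, gamma)
    alpha = np.zeros(n)
    b = np.quantile(y, tau)          # start from the constant quantile fit
    for _ in range(n_iter):
        r = y - K @ alpha - b
        d = 2.0 * np.maximum(np.abs(r), eps)   # inverse IRWLS weights
        # Bordered system: (K + lam*D) alpha + b*1 = y + (tau - 1/2) d,
        # together with the intercept condition sum(alpha) = 0.
        A = np.zeros((n + 1, n + 1))
        A[:n, :n] = K + lam * np.diag(d)
        A[:n, n] = 1.0
        A[n, :n] = 1.0
        rhs = np.append(y + (tau - 0.5) * d, 0.0)
        sol = np.linalg.solve(A, rhs)
        alpha, b = sol[:n], sol[n]
    return alpha, b

def predict(x_train, alpha, b, x_new, gamma=0.5):
    """Evaluate the fitted quantile curve at x_new."""
    return rbf_kernel(x_new, x_train, gamma) @ alpha + b

# Toy data with noise that grows with x (assumed, for illustration only)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + (0.1 + 0.05 * x) * rng.normal(size=x.size)

# Fraction of observations lying below each fitted quantile curve
coverage = {}
for tau in (0.1, 0.5, 0.9):
    alpha, b = fit_kernel_quantile(x, y, tau)
    coverage[tau] = float(np.mean(y < predict(x, alpha, b, x)))
print(coverage)
```

If the fit behaves as intended, the fraction of points below each curve should be roughly equal to the requested quantile level. Each iteration solves only a linear system, which is the practical appeal of IRWLS over a quadratic programming solver.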

Results: We applied SVMQR to three biological datasets and a simulated dataset and showed that it performed more reliably and consistently than FC-based gene selection, Newton’s method based on the posterior odds of change, or the nonparametric t-test variant implemented in significance analysis of microarrays (SAM).

Conclusions: The SVMQR method is an exploratory method for cDNA microarray experiments that identifies genes with different expression levels between two types of samples (e.g., tumor versus normal tissue). It performed well in situations where the error variability of each gene was heterogeneous across intensity ranges.

 