Geburtshilfe Frauenheilkd 2008; 68 - A17
DOI: 10.1055/s-0028-1121896

Molecular techniques for identification of new prognostic and predictive markers

A Schneeweiss 1
  • 1University of Heidelberg

Breast cancer is a heterogeneous disease, associated with a variety of pathological features and clinical behaviour. Traditional prognostic factors based on clinical and pathological variables are unable to fully capture this heterogeneity. High-throughput technologies (HTT), however, offer the opportunity to substantially improve current knowledge of the molecular basis of clinical and biological heterogeneity, and thus to develop improved prognostic and predictive markers. The new tools introduced by the 'omics' sciences are becoming relevant to daily clinical practice in breast cancer diagnosis and treatment. Hence, the aim of this presentation is to discuss current results and challenges regarding their applicability.

Prognostic classifiers aim to identify patients who will be cured by local treatment alone or who need only less aggressive systemic therapy. Predictive classifiers aim to identify patients who do or do not benefit from a specific treatment. Context-specific predictors, finally, aim at predicting outcome in a well-defined set of patients who receive a well-defined therapy (context) without differentiating between a potential prognostic and predictive impact. The clinical utility of such markers must be formally proven in an independent clinical validation study demonstrating that the outcome of individual patients improves as a result of using the classifier. A new classifier should also provide novel information that is independent of that already available from established classifiers. In addition, the new marker should be easily accessible for broad clinical application, and its determination should be reproducible with standardized accuracy.

Currently, there are five different main 'omics' sciences: genomics, epigenomics, transcriptomics, proteomics and metabolomics. Due to the scope of this presentation, issues regarding metabolomics will not be discussed.

Genomics studies the complete genome. As human cancer is caused by the sequential accumulation of mutations, deletions, and amplifications of either oncogenes or tumour suppressor genes, several technologies with partially overlapping aims have been used to identify tumour-specific (somatic) mutations throughout the whole genome. Microarray-based comparative genomic hybridization (CGH) is a molecular-cytogenetic method capable of detecting loss, gain, and amplification of gene copy number. A single nucleotide polymorphism (SNP) array is a type of DNA microarray used to detect variations at single sites in the DNA that occur with a frequency of greater than 1% within the population examined. Hybridization to SNP arrays is an efficient method for detecting loss of heterozygosity (LOH) genome-wide, which may result from the complete loss of an allele or from an increase in copy number of one allele relative to the other. In contrast to CGH arrays, SNP arrays detect not only gains and losses but also the copy number of genes. High-density SNP arrays even permit detection of smaller regions of LOH.
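
The idea behind LOH detection can be illustrated with a minimal sketch: a SNP that is heterozygous in normal tissue but appears homozygous in the matched tumour is flagged. The two-letter genotype encoding ("AA", "AB", "BB"), the function name find_loh, and the example calls are assumptions made for illustration only; real SNP array analysis works on probe intensities and statistical genotype calling rather than on clean calls like these.

```python
def find_loh(normal_genotypes, tumour_genotypes):
    """Return indices of SNPs that are heterozygous in normal tissue
    but homozygous in the matched tumour (candidate LOH sites).

    Genotypes are hypothetical two-letter calls such as "AA", "AB", "BB".
    """
    loh_sites = []
    for index, (normal, tumour) in enumerate(zip(normal_genotypes, tumour_genotypes)):
        normal_is_heterozygous = normal[0] != normal[1]
        tumour_is_homozygous = tumour[0] == tumour[1]
        if normal_is_heterozygous and tumour_is_homozygous:
            loh_sites.append(index)
    return loh_sites


# Made-up genotype calls for five SNPs in matched normal and tumour tissue.
normal = ["AB", "AA", "AB", "BB", "AB"]
tumour = ["AA", "AA", "AB", "BB", "BB"]
print(find_loh(normal, tumour))  # [0, 4]
```

SNPs that are already homozygous in normal tissue are uninformative for LOH, which is why only heterozygous positions are considered.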

Epigenomics studies changes other than those in the DNA sequence, which mainly encompass DNA methylation and post-translational modification of histones. These changes induce patterns of altered gene expression.

Transcriptomics attempts to analyse the complete gene expression profile. DNA microarray technologies rely on the hybridization of DNA strands with their precise complementary copies, where one sequence is bound onto a solid-state substrate. The bound sequences are hybridized to fluorescently labelled cDNAs or genomic sequences from normal and tumour tissue. By analysing the intensity of the fluorescence on the microarray chip, direct comparisons of gene expression in normal vs. tumour tissue can be made. DNA microarray platforms differ in terms of material used (short or long oligonucleotides, cDNA) and number of samples per array (single-channel or two-channel). In the past, there has been much scepticism regarding the reliability and reproducibility of this technology, but divergent results from the different studies have been extensively explored in the meantime: when compared with classical diagnostic tests, the reproducibility of microarray technology is comparable, for example, to that of immunohistochemical analysis for hormone receptors in breast cancer. Multiplex quantitative real-time PCR (qRT-PCR) is based on quantification of a fluorescent reporter molecule generated using PCR, which reflects the abundance of the mRNA target. Currently available qRT-PCR systems, however, can measure only up to a few hundred genes simultaneously, which is substantially fewer than comprehensive DNA microarray profiling can provide. MicroRNAs (miRNA) are an abundant class of small non-protein-coding RNAs that function as negative gene regulators. Their alterations are involved in the initiation and progression of human cancers.
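
The intensity comparison between tumour and normal tissue is commonly expressed as a log2 ratio per gene. The following is a minimal sketch under stated assumptions: the gene names and intensity values are invented, the function name log2_ratios is hypothetical, and the background correction and normalization that a real microarray analysis requires are omitted.

```python
import math


def log2_ratios(tumour_intensity, normal_intensity):
    """Return log2(tumour / normal) for every gene measured in both samples.

    Positive values indicate higher expression in tumour tissue,
    negative values lower expression, and 0 no difference.
    Intensities are assumed to be positive, background-corrected numbers.
    """
    return {
        gene: math.log2(tumour_intensity[gene] / normal_intensity[gene])
        for gene in tumour_intensity
        if gene in normal_intensity
    }


# Invented fluorescence intensities for three hypothetical genes.
tumour = {"GENE_A": 800.0, "GENE_B": 100.0, "GENE_C": 250.0}
normal = {"GENE_A": 200.0, "GENE_B": 400.0, "GENE_C": 250.0}

for gene, ratio in log2_ratios(tumour, normal).items():
    print(gene, ratio)
```

In this made-up example GENE_A is fourfold higher in tumour tissue (log2 ratio of 2), GENE_B fourfold lower (log2 ratio of -2), and GENE_C unchanged.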

Proteomics examines how, when, and where proteins are expressed. The technologies used include two-dimensional gel electrophoresis, mass spectrometry and protein microarrays. Mass spectrometry is typically used for comprehensive proteomic surveys.

Although this list of HTT is far from complete, the number of markers that have emerged as clinically useful so far is small. The main reasons for the lack of transfer into the clinic to date are poor study design, inadequate sample size, lack of assay standardization, and inappropriate or misleading statistical analyses. Additional obstacles are insufficient reproducibility and the difficulty of achieving wide applicability. However, some findings from transcriptome research may be ready for widespread application in the near future and will thus be discussed here in more detail.

Microarray studies have consistently distinguished four main molecular classes of breast cancer according to their gene-expression profile, with distinct clinical outcome and response to therapy: basal-like, HER2+, luminal-A, and luminal-B breast cancers. Basal-like and HER2+ subtypes are more aggressive, with a higher proportion of TP53 mutations and a higher likelihood of being tumour grade 3 than luminal-A tumours. However, they tend to respond better to chemotherapy, with higher rates of pathological complete response after neoadjuvant chemotherapy. Conversely, only a few luminal tumours have mutations in TP53. These tumours tend to be more sensitive to endocrine treatment and less responsive to conventional chemotherapy but, overall, have a better clinical outcome. Around 30% of breast cancers, however, do not fall into any of these four categories. It thus remains uncertain how many true molecular subclasses exist, and it is quite plausible that this molecular classification will evolve with new technology platforms, larger databases, and improved understanding of tumour biology.

Three different supervised predictor strategies for the development of gene expression prognostic signatures have been used so far: (1) the 'top-down' approach, which compares gene expression data from cohorts with known clinical outcomes to identify genes that are associated with prognosis without any a priori biological assumption; (2) the hypothesis-driven 'bottom-up' approach, which first identifies gene expression patterns associated with a specific phenotype and then correlates these findings with clinical outcome; and (3) the 'candidate-gene' approach, which combines selected genes of interest into a multivariate predictive model.
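
The 'top-down' idea of selecting genes purely by their association with outcome can be sketched as follows. This is an illustration only and not the procedure used to derive any published signature: the gene names, expression values, outcome coding (1 = distant metastasis, 0 = none) and the choice of Pearson correlation as the ranking statistic are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def rank_genes(expression, outcome, top_n):
    """Return the top_n genes ranked by absolute correlation with outcome."""
    scores = {gene: abs(pearson(values, outcome)) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented expression values for six patients; no biological prior is used.
expression = {
    "GENE_A": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],  # higher in patients with metastasis
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # unrelated to outcome
    "GENE_C": [3.0, 2.9, 3.2, 1.1, 0.8, 1.0],  # lower in patients with metastasis
}
outcome = [0, 0, 0, 1, 1, 1]

print(rank_genes(expression, outcome, top_n=2))
```

With these invented values the two outcome-associated genes are selected and the unrelated gene is dropped. In practice the selected gene set would then have to be tested in an independent validation cohort.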

Using the 'top-down' approach, a 70-gene prognostic signature (Amsterdam signature) was identified in a series of 78 systemically untreated node-negative breast cancer patients younger than 55 years and was further validated on a larger set of 295 breast cancer patients younger than 53 years, including node-negative and node-positive as well as systemically treated and untreated patients. Notably, 61 patients from the training set were also included in the validation set. This signature, which included mainly genes involved in cell cycle, invasion, metastasis, angiogenesis, and signal transduction, outperformed currently available prognostic factors for prediction of 5-year distant metastasis-free survival. Using the same 'top-down' approach, another 76-gene signature (Rotterdam signature), predictive for distant metastasis at 5 years irrespective of age and tumour size, was developed separately for ER-positive and ER-negative disease in patients with systemically untreated node-negative breast cancer. In a retrospective validation study with 180 node-negative untreated breast cancer patients, this signature demonstrated discriminative power in predicting development of distant metastases in all age groups. The analysis according to ER status could not be performed because the ER-negative group was too small. When validated in 302 untreated node-negative patients younger than 61 years by TRANSBIG, both the Amsterdam and the Rotterdam signatures were again able to outperform the best available clinical tools for risk assessment. Both signatures correctly identified low-risk patients but were limited in identifying high-risk patients, as half of those identified as high-risk did not relapse. Furthermore, the signatures seem to be better predictors of earlier than of later relapse. This suggests that, clinically, these signatures may be most useful in reducing overtreatment of low-risk patients.

A third gene expression signature (recurrence score, RS) was developed to predict the likelihood of distant recurrence in tamoxifen-treated patients with node-negative, ER-positive breast cancer. This score, based on an RT-PCR gene expression analysis of 16 cancer-related genes and 5 reference genes in paraffin-embedded tumour tissue, classifies patients into three groups with high, intermediate, or low risk of distant recurrence. Retrospective validation of this predictor in 668 patients with ER-positive node-negative breast cancer treated with tamoxifen in the NSABP B-14 trial showed that RS was significantly correlated with distant recurrence and death, with 10-year distant recurrence and death rates in the low-risk group (RS<18) of 6.8% and 3.1%, respectively. In a multivariate Cox model, RS was a more accurate classifier than Adjuvant! Online. RS was also prognostic in the untreated placebo-arm patients of the NSABP B-14 trial. The comparison with the treated patients of this trial revealed that patients with low and intermediate RS (RS<31), but not those with RS≥31, derived substantial benefit from tamoxifen. More recently, evidence indicated that RS also predicts long-term outcome in ER-positive and node-positive breast cancer. Finally, RS is also associated with benefit from chemotherapy. It predicts the likelihood of pathological complete response and clinical complete response after neoadjuvant chemotherapy. It was suggested that low-RS tumours show a common pattern of drug insensitivity to several cytotoxic agents.
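
The three RS risk groups follow from the cut-offs mentioned above (low: RS<18; high: RS≥31; intermediate in between). A minimal sketch of that grouping is given here; the function name rs_risk_group is hypothetical, and the calculation of the RS itself from the 21 genes is not shown because it is not described in this abstract.

```python
def rs_risk_group(recurrence_score):
    """Map a recurrence score (RS) to a risk group using the cut-offs
    quoted in the text: low (<18), intermediate (18 to 30), high (>=31)."""
    if recurrence_score < 18:
        return "low"
    if recurrence_score < 31:
        return "intermediate"
    return "high"


for score in (10, 18, 30, 31):
    print(score, rs_risk_group(score))
```

The boundary values 18 and 30 fall into the intermediate group, and 31 is the first score classified as high risk.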

Using the 'bottom-up' approach, researchers have evaluated whether gene expression patterns associated with histological grade could improve prognostic capabilities, especially within the group of intermediate-grade tumours. They found 97 unique genes, mostly associated with cell-cycle progression and differentiation, that formed the gene-expression grade index (GGI). These genes were differentially expressed between low-grade and high-grade breast tumours, while the intermediate-grade tumours showed a GGI and clinical outcome matching those of either low-grade or high-grade cases. Other prognostic signatures derived from this bottom-up approach include the wound response signature, mutant/wild-type p53 signature, invasive gene signature, and cancer stem cell signature. All these signatures have only a few genes in common but seem to offer similar predictive information, with proliferation-related genes being the major driving force.

Even though results from most of these studies are rather promising, there is still a long way to go before these molecular tools can enter routine clinical practice. Many of these studies are retrospective, and the gene expression data come from archival material of heterogeneous populations. While level 1 evidence is therefore currently lacking for molecular markers, two large prospective clinical validation studies are underway to confirm the clinical utility of two of these signatures: the MINDACT-EORTC trial 10041 (BIG 3–04), evaluating the 70-gene Amsterdam signature, and the TAILORx trial, evaluating the 21-gene signature or RS. However, even if positively validated in large trials, it is quite unlikely that gene expression profiling will replace existing clinico-pathological guidelines altogether. Gene expression data will rather become part of an integrated decision-making model based on multiple levels and sources of prognostic data. The best use of such gene expression signatures may well be in directing treatment decisions when clinical risk parameters for an individual patient are equivocal.