Working Group Recommendations for Critical Elements in NGS Reporting
Report Structure and Format
Expected Format of an NGS Report
It is the consensus of the working group and end users that an NGS report must be
legible. It should contain a summary of findings that is clear, free of jargon,
and easy for a layperson to read. Including too much information often makes the
report harder for a clinician or patient to comprehend. All laboratories
must ensure that the final report is comprehensive and clear,
without omitting essential information. The ideal NGS assay report based on DNA sequencing
or RNA sequencing must contain the following without fail:
• Name of the patient (individual).
• Date of birth or age.
• Gender (if consented).
• Nationality.
• Ethnicity.
• Type of specimen.
• Date of collection of the specimen.
• Date and time of receipt of specimen in the laboratory.
• Laboratory identification number.
• Name and affiliation of requesting clinician.
• Contact details of patient or kin where applicable.
• Name of the test (single gene/gene panel/whole-exome sequencing/WGS [whole genome
sequencing]); total number of genes targeted, total number of reads obtained on target,
the name of the genes that were not covered or genes with coverage below 100X, and
total number of fusion genes targeted.
• Date of issue of the report.
• Indication for testing (including cancer type).
• Methodology of the assay.
• Quality assurance parameters.
• Any additional information on quality control (QC) failure.
• Limitations of the assay.
• Description of genomic and/or transcriptomic variants, including indels and fusions,
in accordance with Human Genome Variation Society (HGVS) nomenclature.
• Classification of genomic variants according to American College of Medical Genetics
and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines and/or
the European Society for Medical Oncology (ESMO) Scale for Clinical Actionability
of Molecular Targets (ESCAT) system.
• Clear inclusion of variant allele fraction (VAF) and copy number alterations (where
applicable).
• Targeted therapy available globally and in India and approved by regulatory bodies.
• Specific mention of zygosity in case of germline assays and allelic fraction of
the variant in somatic assays.
• Name and identification number of clinical trials open for recruitment in India, and
country of origin of clinical trials outside India that are available for the variants
reported in the assay.
• Inclusion of variants of uncertain significance.
• Indication of clonal hematopoiesis of indeterminate potential (CHIP) mutations in solid
tumor patients, with a suggestion for confirmation by paired blood sequencing.
• Signatures and designation of reporting authorities including technical supervisor.
• Laboratory accreditation identifiers whenever applicable.
• Disclaimers if applicable.
• Brief description of assay validation including reference to peer-reviewed publication.
• Recommendations for further testing that could help clarify any existing confounders
in the presented report, for example, RNA NGS for better detection of any fusion genes
picked up on a DNA NGS. Also, recommendations for cases in which it is preferred to
cross-check a positive report, for example: NTRK fusion gene testing ([Fig. 1]).
Fig. 1 Template for a next generation sequencing report.
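As an illustration only, a laboratory information system could check a draft report against the checklist above before sign-out. The sketch below assumes a report held as a plain dictionary; the field names are hypothetical and cover only a subset of the listed elements.

```python
# Hypothetical field names for a subset of the mandatory report elements.
REQUIRED_FIELDS = (
    "patient_name",
    "date_of_birth_or_age",
    "specimen_type",
    "collection_date",
    "receipt_datetime",
    "laboratory_id",
    "requesting_clinician",
    "test_name",
    "issue_date",
    "indication",
    "methodology",
    "qc_parameters",
    "limitations",
    "variants",
    "signatories",
)


def missing_fields(report):
    """Return the required keys that are absent or blank in a draft report."""
    return [key for key in REQUIRED_FIELDS if report.get(key) in (None, "")]


# A deliberately incomplete draft: most elements are still missing.
draft = {"patient_name": "Patient A", "specimen_type": "FFPE tissue"}
print(missing_fields(draft))
```

An empty list of variants is treated as present here, since a report with no reportable variants is still a valid report.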
Signatory Authorities
• It is mandatory that the NGS reports be signed by the technical supervisor, scientist,
and/or clinician, and the head of the laboratory, with the date and designation, to
affirm responsibility for the technical information generated and issued in the
report.
• The qualifications of the personnel mentioned must be in accordance with the guidelines
set forth by the Ministry of Health and Family Welfare (MoHFW) as per the gazette
notification of May 18, 2018 (http://clinicalestablishments.gov.in/WriteReadData/4161.pdf) followed by approval of national accreditation bodies.
• It is the responsibility of the head of the laboratory in the case of a standalone
diagnostic center and the head of the institution in the case of a hospital to ensure
the guidelines are followed.
Nomenclature of Variants
• Variants must be depicted as per standard international guidelines on variant nomenclature
put forward by HGVS, the Human Variome Project, and the Human Genome Organization.
• It is mandatory to depict variants at the DNA level. Depiction at the transcriptomic
or proteomic level is optional. A publicly accepted reference sequence based on the
GRCh38/hg38 build must be the standard. Type of reference sequence must be depicted
by a prefix:
  “c.”: coding DNA reference sequence.
  “g.”: linear genomic reference sequence.
  “n.”: noncoding DNA reference sequence.
  “p.”: protein reference sequence.
  “r.”: RNA reference sequence (transcript).
• At the DNA level, the nucleotides are to be depicted in uppercase letters; at the
RNA level they must be in lowercase letters; and at the protein level, the three-letter
code is preferable as per IUPAC-IUBMB symbols.
• When there is more than one type of variation, the following order must be adhered
to: substitution, deletion, inversion, duplication, and insertion.
• All genes are to be italicized and must be depicted using the most recent symbol approved by the HUGO Gene Nomenclature
Committee (HGNC) so as to attain uniform reporting.
• Special characters such as “+,” “-,” and “*” must also follow the HGVS system.
• Abbreviations denoting type/nature of variation must be strictly adhered to: “>” for substitution, del for deletion, dup for duplication, ins for insertion, inv for inversion, fs for frameshift, and ext for extension.
• Fusions are to be denoted using the symbol “::” in between the fusion partner gene symbols.
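A few of the conventions above can be sketched in code. The example below is a simplified illustration, not an HGVS validator: the regular expression covers only DNA-level single-nucleotide substitutions with plain numeric positions, and all function names are hypothetical.

```python
import re

# Reference-sequence prefixes as listed in the text.
PREFIXES = {
    "c": "coding DNA",
    "g": "linear genomic",
    "n": "noncoding DNA",
    "p": "protein",
    "r": "RNA (transcript)",
}

# Simplified pattern: uppercase DNA substitution only, e.g. "c.1799T>A".
# Intronic offsets, "*" and "-" positions, indels, etc. are NOT handled.
DNA_SUBSTITUTION = re.compile(r"^(c|g|n)\.(\d+)([ACGT])>([ACGT])$")


def reference_type(description):
    """Map the prefix of a variant description to its reference sequence type."""
    prefix, sep, _ = description.partition(".")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unrecognized prefix in {description!r}")
    return PREFIXES[prefix]


def is_simple_dna_substitution(description):
    """True if the description is an uppercase DNA-level substitution."""
    return DNA_SUBSTITUTION.match(description) is not None


def fusion(partner_5prime, partner_3prime):
    """Join fusion partner gene symbols with the '::' separator."""
    return f"{partner_5prime}::{partner_3prime}"


print(reference_type("p.Val600Glu"))
print(is_simple_dna_substitution("c.1799T>A"))
print(fusion("EML4", "ALK"))
```

Lowercase nucleotides are rejected at the DNA level because the text reserves lowercase for RNA-level descriptions.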
Tier-Based Classification of Variants
There are definitive international recommendations on the classification of variants
in the somatic setting in cancer and in the germline setting in all diseases. All
reports based on NGS must follow the AMP-ASCO-CAP (Association for Molecular Pathology,
American Society of Clinical Oncology, and College of American Pathologists)
classification for somatic variants and the ACMG guidelines
for variant classification for Mendelian disorders. ESMO introduced the ESCAT
ranking in parallel, but it is yet to be adopted on a global scale. We recommend that the
AMP-ASCO-CAP and ACMG guidelines be mandatory in NGS reports.
Using Predictive Algorithms
Artificial intelligence (AI) and deep neural networks are utilized in cancer research
daily. With the advancement of technology, laboratories are adopting AI to decipher
and make sense of the volume of data generated with larger sized genomic and transcriptomic
panels. Yet, as with most applications and output of AI, the predictive algorithms
([Table 1]) offered for clinical services are welcome but should be accepted only under
the “research use only (RUO)” label.
Interpreting Signaling Pathways from Variant Information
Genomic and transcriptomic variations in a tumor are directed to eventually alter
protein or metabolic signaling pathways for sustenance of tumor growth or invasion.
Dedicated canonical pathways such as RTK/Ras/Raf/Mek, PI3K/Akt, Wnt/β catenin, p53,
Myc, Notch, and Hippo are mostly the driving pathways in cancer; however, there is
a multitude of noncanonical pathways that cannot be overlooked. The routine NGS gene
panels are limited in the number of genes queried as compared with the actual variations
in a cancer genome. Hence deriving signaling pathway information from a limited set
of variants is to be understood as limited information and not heavily relied upon.
The larger the panel, the stronger the predictive capacity of driver signaling pathways.
However, large panels bring with them the potential of detecting variants of doubtful
clinical significance. The limitations of each should be clearly mentioned in the
final report.
Therapeutic Options
The inclusion of targeted therapy and immunotherapy under the umbrella of precision
oncology represents the endpoints of assaying tumor samples using molecular techniques.
Every NGS report must thus have the approved and actionable therapeutic molecule mentioned
next to the variant found in the tumor sample.
Off-Label Therapeutic Suggestions
Off-label use of a drug means using a drug “out of instruction.” As per the World
Health Organization, half of the drugs globally are used off-label for various indications.
The scenario is not different in cancer standard of care or targeted therapy. The
decision and choice of using targeted therapy in an off-label mode rests upon the
consensus decision of a molecular tumor board or the medical oncologist.
Minimum Quality Control Requisites
Total quality management in a laboratory is undisputedly the most important yardstick
of the authenticity of the test reports it generates. International guidelines
by the U.S. federal program Clinical Laboratory Improvement Amendments (CLIA)
mandate laboratories to follow strict policies on QC metrics.
Preanalytical Phase of NGS in Tissue-Based Assays
The success of NGS-based molecular testing depends in large part on having an
adequate amount of tumor (and thereby sufficient DNA), a sufficient tumor percentage, and
minimal tissue-related problems.
The quality of the FFPE (formalin-fixed paraffin-embedded) block is a very important yardstick
for a quality report. The following points are mandatory during the selection of the tissue:
FFPE samples should be reviewed by trained and board-certified molecular onco-pathologists
for specimen suitability, including specimen type, tumor purity, and quantity.
Tumor purity: tumor purity is often a crucial but overlooked variable in NGS sample
assessment. It is an indicator of the number or fraction of tumor cells out of all
the cells present in a sample submitted by the pathologist. A score of 20 to 30% is
generally accepted by laboratories but there is no standardization. Tumor purity could
indirectly influence calculation of TMB and inference of germline mutations from a
given sample.
Hence it is important that laboratories maintain a harmonized cut-off score for good-quality
results. The committee advises a tumor content cut-off of 20% for smaller panels
(15–50 genes) and 30% for bigger panels (>500 genes) for macro/microdissection,
given that 30% is the international standard cut-off.
Nucleic acid yield: the starting material of an NGS assay is DNA or RNA. Therefore,
the yield and purity of the nucleic acid are of paramount importance in generating
appropriate results. Fluorescence-based DNA measurements are far lower than those
quantified by spectrophotometry, but results are more accurate and precise, particularly
at lower concentration ranges. In lung cancer, >30 ng can be considered a cut-off.
100 ng could be considered optimal for laboratories in general.
There are fewer library failures when samples have a minimum DNA quality
score, represented as the DNA integrity number (DIN), of 3, with a DNA concentration of at
least 5 ng/µL and a minimum library concentration of 40 nmol for targeted panels. In solid
tumors, RNA library concentration, and not the RNA integrity number (RIN), is the only
parameter directly associated with coverage. Therefore, a threshold
of DIN >3, with a minimum concentration of 5 ng/µL, should be accepted. The RNA distribution
value is a better quality metric than the RIN.
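The tumor content and DNA quality thresholds recommended above could be encoded as a simple pre-run check. This is a minimal sketch under stated assumptions: the function names are hypothetical, a DIN of exactly 3 is treated as passing (the text gives both "minimum of 3" and ">3"), and no cut-off is returned for panel sizes the text does not address.

```python
def min_tumor_content(panel_genes):
    """Advised tumor-content cut-off (%) for a panel size, or None if unspecified."""
    if 15 <= panel_genes <= 50:
        return 20
    if panel_genes > 500:
        return 30
    return None  # the recommendation does not cover other panel sizes


def preanalytical_flags(tumor_pct, panel_genes, din, dna_ng_per_ul):
    """Return a list of human-readable QC flags; an empty list means no flags."""
    flags = []
    cutoff = min_tumor_content(panel_genes)
    if cutoff is None:
        flags.append("no tumor-content cut-off stated for this panel size")
    elif tumor_pct < cutoff:
        flags.append(f"tumor content {tumor_pct}% below {cutoff}%")
    if din < 3:  # assumption: DIN of exactly 3 passes
        flags.append(f"DIN {din} below 3")
    if dna_ng_per_ul < 5:
        flags.append(f"DNA concentration {dna_ng_per_ul} ng/uL below 5 ng/uL")
    return flags


print(preanalytical_flags(tumor_pct=25, panel_genes=30, din=4.0, dna_ng_per_ul=10))
print(preanalytical_flags(tumor_pct=25, panel_genes=600, din=2.5, dna_ng_per_ul=3))
```

Returning flags rather than a single pass/fail value leaves the final decision with the reviewing pathologist.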
There is a wide range of preanalytical variables that affect DNA quality, e.g., the
presence or absence of fixation; the type of fixative; length of fixation in FFPE
tissue.
Preliminary or final pathology reports should accompany all specimens.
Tissue should be fixed in 10% neutral-buffered formalin. Other fixatives are discouraged
unless otherwise specified.
Fragment size is a crucial factor in the clinical NGS analysis of DNA material
because it influences the type and quantity of DNA that can be extracted
and analyzed. The particular application and platform
being used determine the ideal fragment size. For Illumina sequencing,
a fragment size of 300 to 500 base pairs is advised. Since fragment size can influence
the precision of variant detection and cause sequencing errors, choosing the right
fragment size is essential for generating high-quality sequencing results.
Preanalytical Phase of NGS in Liquid Biopsy
Liquid biopsy refers to testing molecular representatives such as circulating tumor
cells, circulating tumor DNA (ctDNA), exosomes, tumor extracellular vesicles, tumor-educated
platelets, circulating cell-free RNA, etc. primarily from blood and to a lesser extent
from other body fluids. ctDNA accounts for 0.1 to 10% of 10 to 100 ng/mL of cfDNA.
Time is an essential factor in liquid biopsy because these molecules have short half-lives.
Other factors are freeze-thawing, temperature, time lost between blood draw and analysis,
DNA disintegration, and leakage from cells. Hence it is important to ensure that the
starting material has passed internal QC checks. For liquid samples, flow cytometry
or other methods should be used to evaluate the sample's percentage of neoplastic
cells. Laboratories should archive either a representative slide or image of the tissue
tested.
Reporting of Tissue-Based NGS Assays
Reporting Somatic Variants in a Tumor Sample
Somatic variants are identified in the clinic through whole-exome sequencing or targeted
somatic mutation panels. Single nucleotide variants (SNVs) present
in cancer cells are reported using databases and bioinformatic methods. hg19 and hg38
are the two reference genome versions used for alignment. The GRCh38 ALT contigs are recognizable
by their _alt suffix. GRCh38/hg38 is strongly recommended over hg19. In addition
to adding many alternate contigs, GRCh38 corrects thousands of small sequencing artifacts
that cause false single nucleotide polymorphisms (SNPs) and indels. It also includes
synthetic centromeric sequence and updates nonnuclear genome sequence.
All variants that predict sensitivity, resistance, or toxicity to a specific therapy,
alter the function of any gene, which can be targeted by approved or investigational
drugs or included in clinical trials or can influence disease prognosis or assist
in diagnosing cancer, or can be used for early cancer detection, may be included in
the report separately. All clinically relevant information for that tumor type should
be mentioned including the pertinent negative variants which are not detected for
that tumor type. When reporting a variant, reference sequence databases, population
databases, cancer-specific databases, and constitutional variant databases should
be considered along with in silico (computational) tool predictions and relevant
publications on functional aspects of the variant. Reports should
be static and carry the date of issue, as medical knowledge is known to change rapidly.
Levels of evidence: somatic variants should be categorized based on the level of evidence
into four tiers. Clinical and experimental evidence labeled from A to D is used to
classify these tiers as shown in [Table 2]. Tier I variants have strong clinical significance and have approved therapy for
that tumor type or have well-powered studies supporting the same. Tier II variants
are approved for other tumor types or supported by preclinical trials or case reports.
Tier III are variants of unknown significance, and Tier IV are benign or likely benign
variants that are usually not included in the report.
Table 2
Tier-based reporting categories based on clinical and/or experimental evidence
| Tier I: variants of strong clinical significance | Tier II: variants of potential clinical significance | Tier III: variants of unknown clinical significance | Tier IV: benign or likely benign variants |
| Level A evidence: variants with approved therapy included in professional guidelines | Level C evidence: variants with approved therapies for different tumor types or investigational therapies | Not observed at a significant allele frequency in the general or specific subpopulation databases, or pan-cancer or tumor-specific variant databases. No convincing published evidence of cancer association. These variants should not have been observed at significant allele frequencies in the general population, such as in the 1000 Genomes Project database, Exome Variant Server, or Exome Aggregation Consortium database. | Observed at significant allele frequency in the general or specific subpopulation databases. No existing published evidence of cancer association. Most reports usually do not include these variants. |
| Level B evidence: variants with well-powered studies and having consensus from experts in the field | Level D evidence: preclinical trials or a few case reports without consensus including the variant | | |
| For example: 1. BRAF V600E predicts response to the approved drug vemurafenib in melanoma. 2. KRAS mutations predict resistance to anti-epidermal growth factor receptor monoclonal antibodies in colorectal cancer. | For example: alpelisib is approved for the PIK3CA exon 9, p.E545K mutation in hormone receptor-positive breast cancer patients only; if found in other cancer types, it would be a Tier II variant with level C evidence. | - | |
While reporting a somatic variant, it is mandatory to include the complete details
of the variant as per standard international nomenclature guidelines, allelic fraction,
level of evidence, and classification.
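The relationship between evidence levels and tiers in Table 2 can be written as a small lookup. This is a sketch for illustration: the function name is hypothetical, and the population-frequency assessment that separates Tier III from Tier IV is reduced here to a single boolean supplied by the caller.

```python
# Evidence levels A and B map to Tier I; C and D map to Tier II (Table 2).
TIER_BY_EVIDENCE = {"A": "Tier I", "B": "Tier I", "C": "Tier II", "D": "Tier II"}


def assign_tier(evidence_level=None, benign_or_likely_benign=False):
    """Return the reporting tier for a somatic variant.

    evidence_level: "A".."D", or None when no clinical evidence exists.
    benign_or_likely_benign: outcome of the population-database review
    (simplified to a flag for this sketch).
    """
    if benign_or_likely_benign:
        return "Tier IV"
    if evidence_level is None:
        return "Tier III"
    return TIER_BY_EVIDENCE[evidence_level.upper()]


print(assign_tier("A"))
print(assign_tier("c"))
print(assign_tier())
print(assign_tier(benign_or_likely_benign=True))
```

An unrecognized evidence level raises a `KeyError`, which is preferable to silently assigning a tier.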
Variant Allele Fraction
VAF is the number of sequence reads matching a specific DNA variant divided
by the overall coverage at that locus, expressed as a percentage. The presence of normal cells in the sample and tumor heterogeneity influence VAF.
A somatic assay is generally validated to detect VAFs as low as 5 to 10%. A VAF above
50%, when all contributing quality factors including tumor purity, coverage, etc. align,
could raise suspicion of a germline variant in the patient. Caution must be exercised,
as paired tumor-normal testing is not a common practice in the field at present. A
region with loss of heterozygosity (LOH) could also falsely elevate VAF. It is mandatory
to discuss complex profiles with variable VAFs in a molecular tumor board before initiating
treatment.
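The VAF calculation described above is simple arithmetic and can be sketched as follows. The function names are hypothetical, and the 50% flag only marks a variant for review; as the text notes, tumor purity, coverage, and LOH must be weighed before drawing any conclusion.

```python
def vaf_percent(variant_reads, total_reads):
    """VAF (%) = reads supporting the variant / total coverage at the locus * 100."""
    if total_reads <= 0:
        raise ValueError("total coverage at the locus must be positive")
    return 100.0 * variant_reads / total_reads


def flag_possible_germline(vaf, threshold=50.0):
    """Flag a variant for germline review when its VAF exceeds the threshold."""
    return vaf > threshold


low = vaf_percent(60, 400)    # 60 of 400 reads carry the variant
high = vaf_percent(260, 500)  # 260 of 500 reads carry the variant
print(low, flag_possible_germline(low))
print(high, flag_possible_germline(high))
```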
Actionable Variations and Therapeutic Choices
An NGS report will contain genomic or transcriptomic variations that may
or may not be “actionable” with available targeted therapeutic agents. Nevertheless,
all variations are significant from a tumor biology perspective, if not from a clinical
standpoint.
It is now common practice for laboratories to flag U.S. Food and Drug Administration-approved
and off-label therapeutic agents against “actionable” variants. However, the working
group disagrees with this practice, as it generates confusion among oncologists
and patients in the clinic.
It is therefore advisable to reserve the therapeutic information solely for clinicians
and issue it as a separate document with the report.
Clinical Trials
Certain variants reported may be under active prospective investigation under a registered
randomized controlled trial (RCT). The list of most appropriate RCTs is commonly reported
alongside an NGS report. However, all the listed RCTs are performed in countries other
than India. This information is thus of limited use to our patients and is a source
of exasperation.
Within India
The working group recommends clinicians and principal investigators create a common
database or Web site containing information on active and recruiting RCTs within the
country that is accessible to all. This information is practically more valuable than
RCTs in foreign countries.
Outside India
RCTs originating from other countries have secondary value but could be useful for
investigating rare variants and for patients who can access treatment from a different
country.
Allied Variables in an NGS Report
• Tumor mutational burden (TMB) is defined by the National Cancer Institute (NCI) as “the total number of mutations (changes) found in the DNA of cancer cells.” Alternatively, it “is a numeric index that expresses the number of mutations per megabase (muts/Mb) harbored
by tumor cells in a neoplasm.” A high TMB is a predictive biomarker of response to immunotherapy and is a good addition
to an NGS report with a larger panel size.
• Microsatellite instability (MSI) is defined by the NCI as “a change that occurs in certain cells (such as cancer cells) in which the number of
repeated DNA bases in a microsatellite (a short, repeated sequence of DNA) is different
from what it was when the microsatellite was inherited.” Tumors harboring high MSI (MSI-H)/deficient mismatch repair (dMMR) are likely to
benefit from immunotherapy. MSI is therefore a valuable biomarker to be added to an
NGS report with a larger panel size.
• Homologous recombination deficiency (HRD): owing to the complexity of the concept
of HRD, it is crucial to understand the chemistry of the homologous recombination repair
(HRR) pathway and the genes involved in it. A recent definition of HRD is “a phenotype that is characterized by the inability of a cell to effectively repair
DNA double-strand breaks using the HRR pathway.” It is a crucial biomarker for initiation of PARP inhibitors or platinum-based chemotherapy,
apart from being a prognostic marker for certain cancer types. Clinically, HRD is
now restricted to loss of function of BRCA proteins or a BRCA-like or BRCA-ness genotype.
However, genomic LOH, telomeric allelic imbalance, and large-scale transitions,
a combination used to assess genomic instability/genomic scars, are being utilized
by a few laboratories based on the SOLO1 trial, as companion diagnostics. Any laboratory
reporting markers suggestive of HRD is required to perform extensive validation of
scores after appropriate choice of markers. The same must be made available to the
clinician upon request. Any report with a positive HRD status must be presented to
a molecular tumor board or specialist for in-depth analysis of genotype–phenotype
correlation.
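The muts/Mb definition of TMB quoted above amounts to a single division. The sketch below is illustrative only: the function name is hypothetical, and which mutation classes are counted and what value counts as "high" are assay-specific choices that the text does not fix.

```python
def tmb_per_mb(mutation_count, covered_bases):
    """Mutations per megabase over the panel region that was actually covered."""
    if covered_bases <= 0:
        raise ValueError("covered region must be larger than zero bases")
    return mutation_count / (covered_bases / 1_000_000)


# Made-up example: 15 counted mutations over a 1.5 Mb covered region.
print(tmb_per_mb(15, 1_500_000))
```

Using the covered region rather than the nominal panel size as the denominator is one reason TMB is more reliable on larger panels.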
Variant of Uncertain Significance/Variant of Unknown Significance/Unclassified Variant
Variant of unknown significance (VUS) is defined by NCI as “A change in a gene's DNA sequence that has an unknown effect on a person's health.” It is important to understand that the effect is clinically uncertain but biologically
certain in most of the cases. A VUS need not be considered for precision oncology
purposes but it must be reported in all cases, and noted in case of germline assays.
A VUS must be revisited by the concerned laboratory every 6 months to check for changes
in classification status. The same, if found, must be intimated to the clinician and
the patient.
Reporting of Whole Blood-Based NGS Assay
Reporting Germline Variants
Usually, a lower depth of coverage is acceptable for germline testing because most
of the variants are either homozygous or heterozygous. A minimum coverage of 30×
is usually sufficient for germline testing. These reads should be balanced for both
forward and reverse directions. NGS analysis on tissue cannot distinguish between
somatic and germline variants unless paired germline samples are used. When reporting
a somatic NGS panel, germline mutations should be suspected when the VAF is 0.5 to 1.0, keeping
in mind the cellularity of the tumor tissue in the sample. Such patients should be
advised to undergo clinical confirmation with germline samples following genetic counseling
and proper consent. This applies especially to genes that are established causes of hereditary
cancer syndromes and have established guidelines for clinical surveillance, such as
BRCA1 or BRCA2 or Lynch syndrome gene variants.
Clinical reports are the end products of germline laboratory testing and therefore,
effective reports are concise, yet easy to understand. Reports should be written in
clear language and should contain all the essential information about the test performed,
including tabulated results, their interpretation, references, methodology, and appropriate
disclaimers. These reporting elements are also covered by CLIA regulations and CAP
laboratory standards for NGS clinical tests. To this end, several guidance documents
and templates have been developed for reporting in accordance with the ACMG laboratory
standards for NGS tests.
The methods and types of variants detected by the assay or genetic test should be
provided in the report. Assay limitations for variant detection should also be noted.
The methods section should include details of nucleic acid capture (e.g., polymerase
chain reaction [PCR], targeted capture, or whole genome amplification) as well as
techniques used to analyze the germline DNA (e.g., bi-directional Sanger sequencing,
NGS, etc.) as this could provide necessary details to the health care provider for
the need to carry out additional follow-up genetic tests. For example, WGS offers
a thorough study of the complete genome, is objective and future-proof, but is more
expensive and analytically challenging. Targeted sequencing, on the other hand, is
less expensive, achieves more sensitivity, and completes the analysis faster, but
it is biased and offers only a limited amount of future-proofing. This applies to
tissue-based assays as well. The laboratory conducting the test may choose to add
a disclaimer that addresses general pitfalls in testing such as sample quality.
Given the rise in the number of variants detected by genetic tests, presenting the
variant and its associated information in a tabular format may be best for conveying
crucial information. The table must include the following components, though not necessarily
in this order: gene name; variant nomenclature at the genomic,
cDNA, and protein levels; exon; zygosity; disease (if known in the Online Mendelian Inheritance
in Man [OMIM] or ORPHA database); mode of inheritance; and variant classification.
Parental origin could be included if the details are available. Additionally, if specific
variants are being analyzed in genotyping or sequencing tests, the laboratory should
note the variants interrogated with their full description, historical nomenclature,
and family history context if available.
The interpretation should contain the evidence supporting the variant classification
according to the ACMG-AMP classification system, which would stratify variants into
one of the five categories: pathogenic, likely pathogenic, variant of uncertain significance,
likely benign, and benign. It is imperative to state whether the identified variants
are likely to explain the patient's phenotypes fully or partially. The interpretation
section should provide details of all variants described in the results section but
may contain additional information such as whether the variant has previously been
reported in the literature, present in disease or control databases, and minor allele
frequency in healthy population databases. The additional information described in
the interpretation section could include a summary of the results of in silico analyses
and evolutionary conservation analyses. A discussion of decreased penetrance and variable
expressivity of the disorder, if relevant and available, should be included in the
final report. The report should also include any recommendations for clinicians for
supplemental clinical testing and variant testing of other family members for segregation
analysis, mode of inheritance, and variant re-classification. The references, if any,
that contributed to the classification should be cited where discussed and listed
at the end of the report.
Technical Aspects
For somatic panels, the optimal depth of sequencing (DOS) depends on the limit of detection (LOD)
or the sensitivity that the assay aims for. Standard
reference materials from SeraCare, Horizon Discovery, the National Institute of Standards
and Technology, or the Coriell Institute, as well as proficiency testing material from CAP
or the European Molecular Genetics Quality Network, are some of the options available to standardize
an assay for its LOD or sensitivity through analytical validation. Once the LOD is established by analytical validation,
the same needs to be reproduced with clinical validation using patient samples. Serial
dilution of DNA or RNA, followed by library preparation of these serial dilutions,
is one of the standard approaches used to derive the clinical sensitivity of the NGS
assay for somatic variant calling.
• Overall coverage: the overall coverage of a panel is measured as the mean coverage across
the genomic region sequenced as part of the targeted panel. This is often measured
at different depths, such as 1X, 10X, 50X, and 100X, for somatic variant detection.
As per the AMP/CAP, a minimum of 200X mean coverage depth is recommended to achieve
a LOD of 5% for somatic variant calling.
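Mean coverage and the fraction of targeted positions reaching each depth threshold can be computed from per-base depths. The sketch below assumes the depths are already available as a list of integers (for example, exported from a coverage tool); the function name and the toy data are hypothetical.

```python
def coverage_summary(depths, thresholds=(1, 10, 50, 100)):
    """Return (mean depth, {threshold: fraction of positions at or above it})."""
    n = len(depths)
    if n == 0:
        raise ValueError("no positions supplied")
    mean_depth = sum(depths) / n
    breadth = {t: sum(d >= t for d in depths) / n for t in thresholds}
    return mean_depth, breadth


# Toy data: per-base depth at eight targeted positions.
depths = [0, 5, 60, 120, 250, 300, 400, 465]
mean_depth, breadth = coverage_summary(depths)
print(mean_depth)  # 200.0
print(breadth)     # {1: 0.875, 10: 0.75, 50: 0.75, 100: 0.625}
```

The toy data show why the mean alone can mislead: the mean depth is 200X, yet only 62.5% of positions reach 100X.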
Bioinformatic Pipelines
NGS requires extensive bioinformatic support for generation of a report. Validation
and standardization of the dry laboratory segment are as important as those of the wet laboratory
segment. The GATK pipeline developed by the Broad Institute is one of the standard methods
used in the clinical setting. Assessing the quality of the sequencing data before analysis
is crucial to guarantee accurate results. Sequencing data quality can be assessed
using QC measures like Phred (base quality) scores and read length. FastQC
and QualiMap are two tools for QC that are often utilized. After a basic assessment
of the NGS data to retain good-quality reads that have a Phred score above 30,
the reads are subjected to sequence alignment or mapping to the reference genome
(for example, GRCh38/hg38).
Phred quality score (Q score): it is a quality indicator used with sequencing by synthesis
NGS chemistry. The metric reflects the accuracy of the base called by the
platform.
Calculation: $Q = -10 \log_{10} P$, where $P$ is the probability of error in base calling.
A Q score of 30 is ideal: the probability of a wrong base call is 1 in 1,000
and the accuracy of a base call is 99.9%.
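The Q score formula can be checked numerically. The sketch below also decodes a FASTQ quality string, assuming the common Phred+33 ASCII encoding; the function names are hypothetical.

```python
import math


def phred_from_error(p):
    """Q = -10 * log10(P), with P the probability of a wrong base call."""
    return -10 * math.log10(p)


def error_from_phred(q):
    """Inverse relation: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mean_quality(quality_string, offset=33):
    """Mean Q score of a read, assuming Phred+33 encoded quality characters."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


print(phred_from_error(0.001))  # approximately 30
print(error_from_phred(30))     # approximately 0.001
print(mean_quality("IIII"))     # 'I' encodes Q40 under Phred+33
```

Averaging Q scores is a convenient read-level filter, but it is not the same as averaging error probabilities, so a few very poor bases can hide behind a high mean.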
Read alignment to the reference genome, using tools like BWA, Bowtie2, or STAR, generates
the SAM and BAM files. This step, together with the alignment scoring metrics that
support the calling of true SNVs and indels, needs to be thoroughly validated and
verified. Following read alignment, the next step is to search the aligned reads for
genetic variants like SNVs or insertions/deletions (indels). By comparing the aligned
reads to the reference genome, variants can be found using variant-calling software
like GATK; other tools, like VarScan and Strelka2, are also used to verify GATK's
results. The output of this step is a VCF file containing the discovered variants.
The next step after variant calling is to annotate the variants to ascertain their
clinical relevance. This step entails determining whether the discovered variants are
known to be pathogenic, benign, or of uncertain significance by comparing them to
databases like ClinVar, COSMIC, or dbSNP. This stage can be completed using annotation
software like ANNOVAR, SnpEff, or VEP. Once the variants are annotated, the results
are interpreted and a clinical report is produced. Assessing the significance of the
discovered variants may entail analyzing the patient's clinical background and other
pertinent data. A clinical report can be produced using reporting software like
Ingenuity, VarSome, or Opal Clinical. Report generation and interpretation are
extensively discussed below. It is crucial to remember that this is a condensed overview
of the data analysis portion of a clinical NGS pipeline and that the precise tools
and methodologies employed can change depending on the laboratory and sequencing
platform. Many pipelines also incorporate additional steps like QC filtering, rare
variant filtering, and variant pathogenicity evaluation.
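To illustrate the QC filtering step that follows variant calling, here is a minimal Python sketch that reads VCF records and keeps those passing simple thresholds. It is an assumption-laden illustration rather than a validated pipeline component: the records are synthetic, the names `Variant`, `parse_vcf_line`, `passes_qc`, and `classify` are hypothetical, the cut-offs (QUAL of 30 or more, depth of 100 or more) are illustrative only, and each record is assumed to carry a single ALT allele with depth in the INFO field `DP`.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    filt: str
    info: dict


def parse_vcf_line(line: str) -> Variant:
    """Parse the first eight tab-separated columns of one VCF data line."""
    chrom, pos, _id, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True  # flag-type INFO keys have no value
    qual_value = float("nan") if qual == "." else float(qual)
    return Variant(chrom, int(pos), ref, alt, qual_value, filt, info_dict)


def passes_qc(v: Variant, min_qual: float = 30.0, min_depth: int = 100) -> bool:
    """Illustrative thresholds only; each laboratory must set and validate its own."""
    depth = int(v.info.get("DP", 0))
    return v.filt in ("PASS", ".") and v.qual >= min_qual and depth >= min_depth


def classify(v: Variant) -> str:
    """Label a single-ALT record as SNV, indel, or MNV from allele lengths."""
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    return "indel" if len(v.ref) != len(v.alt) else "MNV"


# Synthetic records for illustration only; they do not describe real variants.
VCF_TEXT = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t250\tPASS\tDP=480;AF=0.12",
    "chr1\t2000\t.\tCT\tC\t45\tPASS\tDP=60;AF=0.30",
    "chr2\t3000\t.\tG\tT\t12\tLowQual\tDP=350;AF=0.02",
])

records = [parse_vcf_line(l) for l in VCF_TEXT.splitlines() if not l.startswith("#")]
kept = [v for v in records if passes_qc(v)]

if __name__ == "__main__":
    for v in kept:
        print(v.chrom, v.pos, v.ref, v.alt, classify(v))
```

Production pipelines would normally rely on an established VCF library and on caller-specific filters; the hand-written parser here only makes the structure of a VCF record explicit.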
Validation
Validation is an essential step before a clinical NGS test is implemented in routine
clinical practice. The assay validation needs to be addressed as per the recommendations
of ISO 15189 (for medical laboratories), which aligns with CAP and the National
Accreditation Board for Testing and Calibration Laboratories (NABL) in India.
Optimization and Familiarization Process
The choice of sample type and the number of samples in each category are important
parameters to decide before initiating any validation. The scope of the test determines
the choice of samples and the genomic alterations to be verified in the validation samples.
A minimum of 20 clinical samples (unique clinical data points) is required for clinical
validation. In any general validation study, the first step is the robustness or
familiarization run of the assay, which verifies the reagents and consumables and
their performance. The minimum amount of nucleic acid required to obtain the desired
result (true positivity) is also established here. Establishing the clinical accuracy
of the test results is an important step following the robustness assessment. This
is achieved by inter-laboratory comparison of results from clinical specimens and
is often carried out in collaboration with a CAP- or NABL-certified laboratory in India.
Beyond this, one needs to establish the analytical and clinical specificity, sensitivity/LOD,
repeatability, and reproducibility of the testing.
As part of this validation, one also needs to demonstrate the reproducibility and
accuracy of the testing with inter-run, intra-run, as well as inter-individual (testing
personnel) analysis.
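The accuracy and reproducibility figures mentioned above can be computed from variant call sets. The sketch below is a minimal Python illustration under stated assumptions: variants are represented as (chromosome, position, ref, alt) tuples, the reference ("truth") set comes from an orthogonal method or a partner laboratory, and the function names and example data are hypothetical. Specificity is not computed because it requires a count of true-negative positions, which depends on the size of the assayed region.

```python
def compare_to_truth(called: set, truth: set) -> dict:
    """Sensitivity and positive predictive value of a call set against a reference set."""
    tp = len(called & truth)   # variants found by both
    fp = len(called - truth)   # called but absent from the reference set
    fn = len(truth - called)   # expected but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn, "sensitivity": sensitivity, "PPV": ppv}


def concordance(run_a: set, run_b: set) -> float:
    """Shared calls divided by all calls across two runs (inter-run, intra-run,
    or inter-operator comparisons all use the same calculation)."""
    union = run_a | run_b
    return len(run_a & run_b) / len(union) if union else 1.0


# Synthetic call sets for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 300, "G", "A"), ("chr3", 400, "TA", "T")}
run_1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 300, "G", "A"), ("chr5", 900, "C", "G")}
run_2 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 300, "G", "A")}

metrics = compare_to_truth(run_1, truth)
repeat_concordance = concordance(run_1, run_2)

if __name__ == "__main__":
    print(metrics)
    print(repeat_concordance)
```

With only a handful of variants per sample, such estimates carry wide uncertainty, which is one reason a minimum number of validation samples is specified.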
CAP provides data specific to different sequencing technology platforms (by Illumina
or Thermo Fisher Scientific). This blinded survey helps assess the performance of
bioinformatics pipelines/workflows, which vary across laboratories, on some of the
important factors: (1) basic QC, which ensures that only good-quality sequencing data
are considered for further downstream analysis; and (2) variant calling for both
somatic and germline workflows.
Basic Assay Validation
Platform
There are two major technology platforms for massively parallel sequencing of DNA
and RNA, namely sequencing by synthesis (SBS, by Illumina) and Ion Torrent semiconductor
sequencing (by Thermo Fisher Scientific).
Method (Amplicon-Based/Hybrid Capture)
Targeted sequencing of the regions of interest can be achieved by two methods:
(1) a PCR-amplicon-based approach and (2) a hybrid capture-based approach.
For hotspot panels, where the regions of interest are predetermined, the amplicon-based
approach is a scalable and cheap option. It was one of the early methods of choice
for NGS panels in oncology and has seen great success rates with all types of FFPE
DNA (ranging from 70 bp [heavily degraded sample] to 2,000 bp [average size of a fragment
from a good quality processed FFPE tissue block]).
Critical factors influencing the success of variant detection from amplicon sequencing:
ideally, the primers are designed such that the variant of interest lies in the middle
of the PCR product. The multiplex PCR step is the heart of the amplicon-based approach
and is critical to any successful validation. Primer design plays an important role,
and a nonoverlapping PCR products/staggered design is one of the strategies adopted
to ensure complete coverage of the region of interest if the panel is aimed at sequencing
the complete CDS, or complete exons in the case of hotspots. In amplicon-based panels,
this coverage is assessed by verifying the uniformity of PCR amplification across
primer pairs in the multiplex PCR. The choice of enzyme, buffers and their molarity,
and additives used in the multiplex PCR master mix determines the yield of PCR. When
the panel size increases beyond 1 Mb, an amplicon-based approach, although technically
feasible, may not be an economically viable option compared with the probe-based approach.
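Uniformity of amplification across primer pairs can be summarized from per-amplicon read counts. The following minimal Python sketch reports the fraction of amplicons whose read count reaches a chosen fraction of the mean; the 0.2 × mean cut-off, the function name, and the amplicon names and counts are all assumptions for illustration, and each laboratory should define and validate its own acceptance criterion.

```python
from statistics import mean


def amplicon_uniformity(read_counts: dict, fraction_of_mean: float = 0.2):
    """Return (uniformity, low_amplicons).

    uniformity    : fraction of amplicons with reads >= fraction_of_mean * mean
    low_amplicons : sorted names of amplicons falling below that cut-off
    """
    if not read_counts:
        raise ValueError("no amplicons supplied")
    cutoff = fraction_of_mean * mean(read_counts.values())
    low = sorted(name for name, n in read_counts.items() if n < cutoff)
    uniformity = (len(read_counts) - len(low)) / len(read_counts)
    return uniformity, low


# Synthetic per-amplicon read counts for illustration only.
counts = {"amp_01": 1000, "amp_02": 900, "amp_03": 1100, "amp_04": 100, "amp_05": 900}
uniformity, low_amplicons = amplicon_uniformity(counts)

if __name__ == "__main__":
    print(uniformity, low_amplicons)
```

Amplicons that repeatedly fall below the cut-off across validation runs point to primer pairs that may need redesign or rebalancing in the multiplex pool.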
In the probe-based approach, the template DNA is PCR-amplified to increase the concentration
of template copies. Following this, the regions of the genes of interest are captured
by a hybridization assay. The captured template copies are further PCR-amplified and
subjected to sequencing. Unlike in the amplicon-based method, where we could have a
wide range of PCR amplicons, in the probe-based approach the template remains untouched
after the initial fragmentation step. Probes produced by the available commercial probe
design algorithms generally range from 60 to 120 bp. Post-capture, the template fragments
are further enriched with a PCR step before the libraries are subjected to NGS.
Probe-based chemistry could be a challenge in poor-quality FFPE specimens. This could
be due to adduct formation in the tumor DNA that leads to improper binding, or to the
presence of other impurities that may impact the hybridization process.
Description in the Final Validation Report
A brief description of the validation process, or a reference to a peer-reviewed publication
containing it, must be included separately toward the end of every NGS report.
Raw Data Storage, Consent for Reuse, Traceability
The storage of sensitive data, and patient consent for use of the generated data outside
the purview of the primary indication, are important concerns to be addressed by the
stakeholders and end users of the technology. We recommend that the laboratory and
health care facility adhere to the clauses of the Digital Personal Data Protection
Act 2023, published in the Gazette of India CG-DL-E-12082023–248045 under the Ministry
of Law and Justice: (https://www.meity.gov.in/writereaddata/files/Digital%20Personal%20Data%20Protection%20Act%202023.pdf).
Universal, de-identified genetic data sharing promotes the exchange of information,
research, and the identification of specific mutations/alterations in different ethnic
groups. Such information, shared in a centralized manner and internationally, could
help identify new pathogenic genetic alterations/targets for future drug research.
This will also influence daily practice in the community.
The issues with such data storage, such as identifier anonymization, consent of the
patient/carrier, quality of stored data, the authorities overseeing data storage and
transfer, encrypted access to data, and logistics, are to be addressed by a national
body, as they are outside the purview of our guideline.