The How's, What's, and Why's of Gene Expression Assay Design
by Patricia Hegerich, Sr. Product Applications Specialist - 05/09/12
So how are gene expression assays designed and why aren't all Life Technologies gene expression assays exon spanning?
Whenever possible, our assays are designed to have the probe span an exon junction. Before assays are designed, sequences undergo some preprocessing, which includes masking repetitive and low-complexity regions. These types of sequences can be found in many places in the genome and, if included, would reduce the specificity of the assay.
Next, gene structure is annotated by mapping the masked transcripts to the genome assembly. That’s how we know where exon-exon boundaries are for multi-exon transcripts.
Finally, all known single nucleotide polymorphisms (SNPs) are masked. For multi-exon genes, an assay target position is selected at each exon junction. We chose to place the probe, rather than one of the primers, over the exon-exon boundary to ensure that the primers bind in two distinct exons and the probe straddles the exon junction.
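The placement rule above can be sketched in a few lines of Python. This is only an illustration, not the actual design pipeline: the transcript-coordinate representation, the function names, and the example coordinates are all assumptions made for the sketch.

```python
from bisect import bisect_right
from typing import List, Set, Tuple

# Hypothetical representation: half-open [start, end) intervals in
# transcript (spliced mRNA) coordinates.
Interval = Tuple[int, int]


def exon_index(exon_starts: List[int], pos: int) -> int:
    """Return the 0-based index of the exon containing transcript position pos.

    exon_starts lists the transcript coordinate of the first base of every
    exon after the first one, sorted ascending.
    """
    return bisect_right(exon_starts, pos)


def exons_covered(exon_starts: List[int], oligo: Interval) -> Set[int]:
    """Return the set of exon indices that an oligo overlaps."""
    start, end = oligo
    first = exon_index(exon_starts, start)
    last = exon_index(exon_starts, end - 1)
    return set(range(first, last + 1))


def is_junction_design(
    exon_starts: List[int], fwd: Interval, probe: Interval, rev: Interval
) -> bool:
    """True if each primer sits in its own distinct exon and the probe
    straddles at least one exon-exon boundary."""
    fwd_exons = exons_covered(exon_starts, fwd)
    rev_exons = exons_covered(exon_starts, rev)
    probe_exons = exons_covered(exon_starts, probe)
    primers_in_distinct_exons = (
        len(fwd_exons) == 1 and len(rev_exons) == 1 and fwd_exons != rev_exons
    )
    return primers_in_distinct_exons and len(probe_exons) >= 2


# Made-up transcript with three exons: [0, 100), [100, 250), [250, ...)
example_exon_starts = [100, 250]
print(is_junction_design(example_exon_starts, (60, 80), (90, 110), (130, 150)))
```

With a single-exon transcript (an empty exon_starts list), the same check returns False, which corresponds to the case where both primers and the probe must sit within one exon.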
All assays designed over exon junctions are designated with the “_m1” suffix. For single exon genes, we are limited to placing both the probe and the primers within the exon. These assays are designated with the “_s1” suffix.
Along with specificity of the assay for the transcript of interest, it is also important to determine specificity of an assay versus genomic DNA (gDNA).
After assays are designed, they go through an in silico quality control (QC) process. This process penalizes, and thus screens out:
1) Assay designs that are not highly specific for the gene of interest
2) Assay designs that may not accurately report the quantitative expression results for a particular target.
The in silico QC process has three parts: a transcript BLAST, a genome BLAST, and an intron size determination. The transcript BLAST determines the degree of homology between the assay and other closely related transcripts. The genome BLAST determines the degree of homology between the assay and non-self regions of the genomic DNA (i.e., homologous genes and pseudogenes). The intron size screening determines the size of the intron that a probe spans (for multi-exon genes). Some genes have a functional transcript as well as one or more intronless (processed) pseudogenes.
Rather than offer no assay at all for these genes, we have chosen to release an assay with a “_g1” suffix when it shows the potential to give a positive signal with an RNA template contaminated with gDNA. Assays with a _g1 suffix can be designed over an exon junction or within a single exon.
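The naming rules described so far (_m1, _s1, _g1) can be summarized as a small decision function. This is a sketch under stated assumptions: the QcResult class and its two boolean fields are hypothetical stand-ins for the outcome of the QC steps, not fields from any real system.

```python
from dataclasses import dataclass


@dataclass
class QcResult:
    """Hypothetical summary of an assay design after in silico QC."""
    probe_spans_exon_junction: bool   # multi-exon design with the probe over a junction
    may_detect_offtarget_gdna: bool   # e.g. homology to a processed pseudogene


def design_suffix(qc: QcResult) -> str:
    """Pick the assay ID suffix that follows from the design rules in the text."""
    if qc.may_detect_offtarget_gdna:
        # Applies whether the design is over an exon junction or within one exon.
        return "_g1"
    if qc.probe_spans_exon_junction:
        return "_m1"
    return "_s1"


print(design_suffix(QcResult(probe_spans_exon_junction=True,
                             may_detect_offtarget_gdna=False)))  # _m1
```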
This is the assay naming system we put in place when our collection of TaqMan gene expression assays was released in 2004. Each assay was given an assay name (ID) as a unique identifier. The assay ID (e.g., Hs99999901_s1) never changes and always corresponds to the same probe and primer sequences.
We remap our assays about once a year in order to keep current with the latest annotation of transcripts and genomic sequences released to the public through NCBI. Because the assay IDs are unique identifiers, they cannot change, and this can present a problem when new transcripts and genomic sequences are released.
Annotation can change over time as new information is released. What this means for our assays is that, because the assay sequences stay the same, the meaning of the assay ID may shift. Remember that these assay IDs are assigned at the time the assays are designed and therefore describe the assay design (single-exon or multi-exon assay). But the underlying transcript sequence or annotated genomic sequence may change.
This means that although a _m1 assay was designed with the probe across an exon junction, the transcript or genomic sequences used to design that assay may have changed over time. An assay that was designed as a _m1 assay may now detect genomic DNA.
To keep each assay's information up to date with the latest released sequences, we have added important details to the assay information provided on the web.
Assay ID suffixes
Below is a quick look at the meaning of the assay ID suffix, which indicates the assay design:
"_m" Indicates an assay whose probe spans an exon junction.
"_s" Indicates an assay whose primers and probe are designed within a single exon. Such assays will, by definition, detect genomic DNA.
"_g" Indicates an assay that will detect off-target genomic DNA. The assay may either have a probe that spans an exon junction, or the primers and probe may be within a single exon.
"_mH", "_sH", or "_gH" Indicates that the assay was designed to a transcript belonging to a gene family with high sequence homology. The assays have been designed to give a difference of 10 to 15 Ct between the target gene and the gene with the closest sequence homology. This means that an assay will detect the target transcript with roughly 1,000- to 30,000-fold greater discrimination (sensitivity) than the closest homologous transcript, if both are present at the same copy number in a sample.
"_u" Indicates an assay whose amplicon spans an exon junction and the probe sits completely in one of the spanned exons.
"_ft" Indicates an assay designed to detect fusion transcripts that result from chromosomal translocation. One primer and the probe are located on one side of the fusion transcript breakpoint, and the second primer is located on the other side of the fusion transcript breakpoint. The assay will not detect gDNA.
"_at" Indicates an assay that is designed to detect a specific synthetic RNA transcript with a unique sequence that lacks homology to current annotated biological sequences.