Comparative Study
BMC Med Genomics. 2011 Jan 24:4:11.
doi: 10.1186/1755-8794-4-11.

Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples

Free PMC article
Serban Nacu et al. BMC Med Genomics. 2011.

Abstract

Background: Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome.

Methods: We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays.
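The targeted alignment idea in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the `Exon` class, the function names, the `flank` parameter, the plus-strand orientation, and the choice to measure splice distance from the end of the upstream exon to the start of the downstream exon are all assumptions made for illustration. Only the 200,000 bp window comes from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Window stated in the Methods; everything else below is hypothetical.
MAX_SPLICE_DISTANCE = 200_000  # bp


@dataclass(frozen=True)
class Exon:
    gene: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def candidate_junctions(
    upstream_exons: List[Exon],
    downstream_exons: List[Exon],
    max_distance: int = MAX_SPLICE_DISTANCE,
) -> List[Tuple[Exon, Exon, int]]:
    """Pair exons of an upstream gene with exons of a downstream gene.

    Assumes both genes lie on the plus strand and that the splice
    distance is acceptor.start - donor.end.
    """
    pairs = []
    for donor in upstream_exons:
        for acceptor in downstream_exons:
            distance = acceptor.start - donor.end
            if 0 < distance <= max_distance:
                pairs.append((donor, acceptor, distance))
    return pairs


def junction_sequence(genome: str, donor: Exon, acceptor: Exon, flank: int) -> str:
    """Build an artificial exon-exon junction sequence.

    Joins the last `flank` bases of the donor exon to the first `flank`
    bases of the acceptor exon, clipped to the exon boundaries.
    """
    left_start = max(donor.start, donor.end - flank + 1)
    right_end = min(acceptor.end, acceptor.start + flank - 1)
    # Convert 1-based inclusive coordinates to Python slices.
    return genome[left_start - 1:donor.end] + genome[acceptor.start - 1:right_end]
```

Reads would then be aligned against the resulting junction sequences with an ordinary short-read aligner. Choosing `flank` close to the read length, so that a read must cross the junction to align, is a common design choice but is likewise an assumption here.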

Results: Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer cell lines.

Conclusions: Deep transcriptional sequencing and analysis with targeted and spliced alignment methods can effectively identify TIC events across the genome in individual tissues. Prostate and reference samples exhibit a wide range of TIC events, involving more genes than estimated previously using ESTs. Tissue specificity of TIC events is correlated with expression patterns of the upstream gene. Some TIC events, such as MSMB-NCOA4, may play functional roles in cancer.

Figures

Figure 1
Complex isoforms observed in transcription-induced chimeras. TIC splicing events are shown by dashed arrows, labeled with splice distance and samples or ESTs with supporting alignments. Standard splicing is shown by solid lines. (A) Multiple isoforms observed for PLEKHO2-ANKDD1A TIC in the human brain reference (HBR) and universal human reference (UHR) samples. (B) Direct TIC splicing and TICs with multiple forms of intervening exons (labeled IE) for VAMP8-VAMP5, all observed in a single prostate sample N1. Shaded box represents an intervening exon found previously [5], but not in this study. (C) TIC with an intergenic exon between ARMCX5 and GPRASP2, all observed in N1.
Figure 2
Characteristics of TIC events. (A) Comparison of TIC gene pairs found in previous EST-based surveys and those found by RNA-Seq in this study. (B) Distribution of TIC events across tissues. Only TIC events with multiple supporting reads are included. HBR = human brain reference, UHR = universal human reference. (C) Coding potential of TIC events. The label "Full CDS" indicates that the coding region (CDS) extends from the original transcription start site (TSS) of the 5' gene to the original stop codon of the 3' gene; "3' shift" signifies a frameshift in the 3' gene; "New TSS" indicates that the TIC breakpoint occurs before the original TSS of the 5' gene and a new TSS is predicted from the longest open reading frame; "TLE" indicates that termination occurs in the last exon of the transcript; and "PTC" indicates a premature termination codon, subjecting the transcript to nonsense-mediated decay. (D) Distribution of TIC splice distances. (E) Distribution of splice distances in the artificial exon-exon junctions. (F) Predicted effect on domains. Separate results are presented for TICs having a PTC, or having a TLE despite a new TSS or 3' frameshift, or having a full CDS. Each pair of bars shows the effect on the 5' (left) and 3' (right) domains. "ND" indicates that no domain was originally present in the 5' or 3' gene; "Null" indicates no intersection of the predicted TIC domains with the original domains; "Subset" indicates that at least one, but not all domains were preserved in the TIC; and "Cover" indicates all domains were preserved. (G) Distribution of expression levels in 5' genes with observed TICs downstream compared to those without. (H) Distribution of expression levels in 3' genes with observed TICs compared to those without. For panels G and H, distributions are taken over genes with at least one observed intragenic splice in a given sample and with a potential TIC exon within 200,000 bp in the downstream or upstream direction, respectively.
Figure 3
Expression of TICs and their component genes. (A-F) Each panel contains expression data for a TIC and its component genes, and is labeled with the splice distance. The leftmost plot in each panel shows the expression of the TIC splice using qRT-PCR measurements relative to GAPDH in prostate tumor samples T1-T3, matched normal prostate samples N1-N3, and a commercial sample of normal prostate (C). Error bars indicate the standard error over 2 replicate measurements. The rightmost plots in each panel show expression of the 5' and 3' genes for the T1-T3 and N1-N3 samples, as measured by RNA-Seq in reads per kilobase per million total reads (RPKM). TICs are presented from panel A to panel F in order of increasing TIC splice expression. For panels A-C, expression of the 5' and 3' genes is plotted on the same scale. For panels D-F, because expression of the 3' gene is extremely low relative to that of the 5' gene, expression of each 3' gene is plotted on its own scale. Panel F for MSMB-NCOA4 has the greatest variance of expression values across samples and shows that TIC splice expression correlates with 5' gene expression, but not 3' gene expression. (G) Relationship between TIC and 5' gene expression, shown as a scatterplot. (H) TIC splicing efficiency, computed as TIC splice expression divided by the 5' gene expression, for each sample. In panels G and H, plot symbols A-F correspond to the TICs labeled in panels A-F.
Figure 4
Expression patterns of the SLC45A3-ELK4 e4-e2 TIC and related genes. (A) qRT-PCR levels of SLC45A3-ELK4 e4-e2 TIC in the sequenced prostate tumor and normal sample pairs T1/N1, T2/N2, and T3/N3 (labeled as 1-3, and marked with "-" for ERG-negative and "+" for ERG-positive status), plus panels of 6 ERG-negative and 14 ERG-positive prostate tumor and normal matched samples, a commercial sample of prostate normal RNA, and 9 lung cancer cell lines. (B) Microarray-based expression profile of SLC45A3 (Affymetrix probe 228696-at on GeneChip HG-U133B) across human tissues, showing prostate specificity. Samples are organized by tissue, with normal samples above (green) and cancer samples below (red). (C) Microarray-based expression profile of ELK4 (Affymetrix probe 206919-at on GeneChip HG-U133A). (D) Relationship of SLC45A3 and ERG (Affymetrix probe set 241926-s-at on GeneChip HG-U133B) expression levels in prostate tumor and normal samples, showing that highest expression of SLC45A3 is restricted to samples with low expression of ERG.
Figure 5
Expression profiles for 5' and 3' genes in prostate-specific TICs. Expression profiles for (A) MSMB, (B) NCOA4, (C) AZGP1, and (D) GJC3. Panels A and C represent the 5' genes of TIC events, while B and D represent 3' genes. Affymetrix probe sets used are 207430-s-at, 210774-s-at, 209309-at, and 215060-at, respectively. Data are taken from the GeneLogic (Gaithersburg, MD) database. Expression levels are indicated by position along the x axis. Samples are grouped by tissue of origin. Samples in red represent cancer samples, and those in green represent normal samples.
Figure 6
Tissue specificity of 5' genes in observed TICs. Heatmap of gene expression across a panel of normal tissues for the 5' genes corresponding to all observed TICs. Data are taken from the GeneLogic (Gaithersburg, MD) database. Each bar in the heatmap represents the mean expression of the 5' gene in the given tissue. Expression is scaled within each gene to have uniform standard deviation over all genes, and then plotted using its logarithmic value, further transformed by the normal distribution function to achieve a bounded range of colors. Gene expression level is indicated by color, with red indicating increased expression, and green indicating decreased expression.
Figure 7
Distant fusions. (A) Expression level of TMPRSS2-ERG e1-e4 and e1-e5 fusion splices in prostate tumor and normal samples measured by qRT-PCR, compared with ERG expression as measured by RNA-Seq. qRT-PCR measurements are shown for prostate tumor samples T1-T3, matched normal prostate samples N1-N3, and a commercial sample of normal prostate. RNA-Seq measurements are shown for T1-T3 and N1-N3. (B) Comparison of SEC31A-C6orf62 expression level with downstream C6orf62 expression. Fusion is observed only in the T3 sample. (C) Comparison of IRS2-NUFIP1 expression level with downstream NUFIP1 expression. Fusion is observed only in the T2 sample. (D) CGH microarray data for chromosome 21, containing the TMPRSS2-ERG fusion. A corresponding genomic deletion is observed in the T3 sample, but not in T2, indicating that the gene fusion in T2 is due to translocation. (E) CGH microarray data for chromosome 13, containing the IRS2-NUFIP1 fusion. No corresponding genomic deletions are observed.
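Two quantities named in the Figure 3 caption, RPKM and TIC splicing efficiency, reduce to simple arithmetic. The sketch below uses the standard RPKM definition (reads per kilobase per million total reads) and the ratio given in the caption. The function names and the handling of zero upstream expression are assumptions, and the caption does not specify how qRT-PCR and RNA-Seq values were put on a common scale, so the inputs are treated as plain numbers.

```python
from typing import Optional


def rpkm(read_count: int, transcript_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of transcript per million total reads."""
    if transcript_length_bp <= 0 or total_reads <= 0:
        raise ValueError("transcript length and total reads must be positive")
    # Dividing by (length / 1e3) and (total_reads / 1e6) gives a factor of 1e9.
    return read_count * 1e9 / (transcript_length_bp * total_reads)


def splicing_efficiency(
    tic_expression: float, upstream_expression: float
) -> Optional[float]:
    """TIC splice expression divided by 5' gene expression.

    Returns None when the 5' gene has no measurable expression; this
    convention is an assumption, not something the caption specifies.
    """
    if upstream_expression <= 0:
        return None
    return tic_expression / upstream_expression
```

For example, a gene with 500 reads over a 2,000 bp transcript in a library of 10 million reads has `rpkm(500, 2000, 10_000_000)` equal to 25.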

References

    1. Magrangeas F, Pitiot G, Dubois S, Bragado-Nilsson E, Chérel M, Lebeau SJB, Boisteau O, Lethé B, Mallet J, Jacques Y, Minvielle S. Cotranscription and intergenic splicing of human galactose-1-phosphate uridylyltransferase and interleukin-11 receptor α-chain genes generate a fusion mRNA in normal cells. Journal of Biological Chemistry. 1998;273:16005–16010.
    2. Thomson TM, Lozano JJ, Loukili N, Carrió R, Serras F, Courmand B, Valeri M, Diaz VM, Abril J, Burset M, Merino J, Macaya A, Corominas M, Guigó R. Fusion of the human gene for the polyubiquitination coeffector UEV1 with Kua, a newly identified gene. Genome Research. 2000;10:1743–1756.
    3. Communi D, Suarez-Huerta N, Dussossoy D, Savi P, Boeynaems JM. Cotranscription and intergenic splicing of human P2Y11 and SSF1 genes. Journal of Biological Chemistry. 2001;276:16561–16566.
    4. Pradet-Balade B, Medema JP, Lopez-Fraga M, Lozano JC, Kolfschoten GM, Picard A, Martinez AC, Garcia-Sanz JA, Hahne M. An endogenous hybrid mRNA encodes TWE-PRIL, a functional cell surface TWEAK-APRIL fusion protein. The EMBO Journal. 2002;21:5711–5720.
    5. Akiva P, Toporik A, Edelheit S, Peretz Y, Diber A, Shemesh R, Novik A, Sorek R. Transcription-mediated gene fusion in the human genome. Genome Research. 2006;16:30–36.
