With an estimated annual incidence of 1.4 million cases and a five-year survival rate of 60%, colorectal cancer (CRC) is a major clinical burden. To identify novel RNA variants in CRC, we analyzed exon-level microarray expression data from a cohort of 202 CRCs. We nominated 25 genes, each showing increased expression of its 3' part in at least one cancer sample. To investigate the underlying transcript structures efficiently, we developed an approach using rapid amplification of cDNA ends followed by high-throughput sequencing (RACE-seq). RACE products from the targeted genes in 23 CRC samples were pooled and sequenced. We identified VWA2-TCF7L2, DHX35-BPIFA2 and CASZ1-MASP2 as private fusion events, as well as novel transcript structures for 17 of the 23 other candidate genes. The high-throughput approach facilitated the identification of CRC-specific RNA variants. These include a recurrent read-through fusion transcript between KLK8 and KLK7 and a splice variant of S100A2. Both were overrepresented in CRC tissues and cell lines in external RNA-seq datasets.
Keywords: RACE-seq; colorectal cancer; fusion genes; splicing; transcript variants.