Background: Single Base Substitutions (SBS) that alter transcripts expressed in cancer originate from somatic mutations. However, recent studies report SBS in transcripts that are not supported by the genomic DNA of tumor cells.
Methods: We used sequence based whole genome expression profiling, namely Long-SAGE (L-SAGE) and Tag-seq (a combination of L-SAGE and deep sequencing), and computational methods to identify transcripts with greater SBS frequencies in cancer. Millions of tags produced by 40 healthy and 47 cancer L-SAGE experiments were compared to 1,959 Reference Tags (RT), i.e. tags matching the human genome exactly once. Similarly, tens of millions of tags produced by 7 healthy and 8 cancer Tag-seq experiments were compared to 8,572 RT. For each transcript, SBS frequencies in healthy and cancer cells were statistically tested for equality.
Results: In the L-SAGE and Tag-seq experiments, 372 and 4,289 transcripts respectively, showed greater SBS frequencies in cancer. Increased SBS frequencies could not be attributed to known Single Nucleotide Polymorphisms (SNP), catalogued somatic mutations or RNA-editing enzymes. Hypothesizing that Single Tags (ST), i.e. tags sequenced only once, were indicators of SBS, we observed that ST proportions were heterogeneously distributed across Embryonic Stem Cells (ESC), healthy differentiated and cancer cells. ESC had the lowest ST proportions, whereas cancer cells had the greatest. Finally, in a series of experiments carried out on a single patient at 1 healthy and 3 consecutive tumor stages, we could show that SBS frequencies increased during cancer progression.
Conclusion: If the mechanisms generating the base substitutions could be known, increased SBS frequency in transcripts would be a new useful biomarker of cancer. With the reduction of sequencing cost, sequence based whole genome expression profiling could be used to characterize increased SBS frequency in patient's tumor and aid diagnostic.