Motivation: Recent studies have demonstrated widespread adenosine-to-inosine (A-to-I) RNA editing in non-coding sequence. However, the extent of editing in coding sequences has remained unknown. For many of the known sites, editing can be observed in multiple species and often occurs in well-conserved sequences. In addition, these sites often occur in clusters and within imperfect inverted repeats. Here we present a bioinformatic approach to identify novel sites based on these shared features. Mismatches between genomic and expressed sequences were filtered to remove the main sources of false positives, and then prioritized based on these features. This protocol is tailored to identifying specific recoding editing sites, rather than sites in non-coding repeat sequences.
Results: Our protocol is more sensitive for identifying known coding editing sites than any previously published mammalian screen. A novel multiply edited transcript, BC10, was identified and experimentally verified. BC10 is highly conserved across a range of metazoa and has been implicated in two forms of cancer.
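Illustration: The mismatch-and-cluster idea described under Motivation can be sketched as follows. This is a minimal sketch under stated assumptions, not the protocol used in this work: it assumes the genomic and expressed sequences are already aligned to equal length without gaps, it relies on the fact that inosine is read as guanosine so that A-to-I editing appears as an A-to-G mismatch, and the function names and the `max_gap` threshold are hypothetical. The false-positive filters, the conservation feature and the inverted-repeat feature are not modelled here.

```python
from typing import List


def a_to_g_mismatches(genomic: str, expressed: str) -> List[int]:
    """Return 0-based positions where the genome has A and the expressed sequence has G.

    Assumes (hypothetically) that both sequences are pre-aligned and gap-free.
    """
    if len(genomic) != len(expressed):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = zip(genomic.upper(), expressed.upper())
    return [i for i, (g, e) in enumerate(pairs) if g == "A" and e == "G"]


def cluster_sites(positions: List[int], max_gap: int = 30) -> List[List[int]]:
    """Group positions so that neighbouring sites in a cluster are at most max_gap apart.

    max_gap is an illustrative threshold, not a value taken from the paper.
    """
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def prioritize(clusters: List[List[int]]) -> List[List[int]]:
    """Rank candidate clusters by the number of sites they contain, largest first."""
    return sorted(clusters, key=len, reverse=True)


if __name__ == "__main__":
    genomic = "AACCAG" + "T" * 10 + "A"
    expressed = "GACCGG" + "T" * 10 + "G"
    sites = a_to_g_mismatches(genomic, expressed)
    print(prioritize(cluster_sites(sites, max_gap=5)))
```

Ranking by cluster size is only one possible way to combine the features; a fuller scoring scheme would also weigh cross-species conservation and the presence of an imperfect inverted repeat.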