Background: PAR-CLIP is a recently developed next-generation sequencing-based method that enables transcriptome-wide identification of interaction sites between RNA and RNA-binding proteins. The PAR-CLIP procedure induces specific base transitions that originate from sites of RNA-protein interaction and can therefore guide the identification of binding sites. However, additional sources of transitions, such as cell type-specific SNPs and sequencing errors, complicate the inference of binding sites, and suitable statistical approaches are crucial to control false discovery rates. In addition, a highly resolved delineation of binding sites, followed by extensive downstream analysis, is necessary to comprehensively characterize protein binding preferences and to design subsequent validation experiments.
Results: We present a statistical and computational framework for PAR-CLIP data analysis. We developed a sensitive transition-centered algorithm specifically designed to resolve protein binding sites at high resolution in PAR-CLIP data. Our method employs a Bayesian network approach to associate posterior log-odds with the observed transitions, providing an overall quantification of the confidence in RNA-protein interaction. We use published PAR-CLIP data to demonstrate the advantages of our approach, which compares favorably with alternative algorithms. Lastly, by integrating RNA-Seq data, we compute conservative, experimentally based false discovery rates for our method and demonstrate the high precision of our strategy.
Conclusions: Our method is implemented in the R package wavClusteR 2.0. The package is distributed under the GPL-2 license and is available from Bioconductor at http://www.bioconductor.org/packages/devel/bioc/html/wavClusteR.html .