Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data

BMC Bioinformatics. 2015 Feb 1;16:32. doi: 10.1186/s12859-015-0470-y.

Abstract

Background: PAR-CLIP is a recently developed next-generation sequencing-based method enabling transcriptome-wide identification of interaction sites between RNA and RNA-binding proteins. The PAR-CLIP procedure induces specific base transitions that originate from sites of RNA-protein interactions and can therefore guide the identification of binding sites. However, additional sources of transitions, such as cell type-specific SNPs and sequencing errors, challenge the inference of binding sites, and suitable statistical approaches are crucial to control false discovery rates. In addition, a highly resolved delineation of binding sites followed by an extensive downstream analysis is necessary for a comprehensive characterization of the protein binding preferences and the subsequent design of validation experiments.

Results: We present a statistical and computational framework for PAR-CLIP data analysis. We developed a sensitive transition-centered algorithm specifically designed to resolve protein binding sites at high resolution in PAR-CLIP data. Our method employs a Bayesian network approach to associate posterior log-odds with the observed transitions, providing an overall quantification of the confidence in RNA-protein interaction. We use published PAR-CLIP data to demonstrate the advantages of our approach, which compares favorably with alternative algorithms. Lastly, by integrating RNA-Seq data, we compute conservative, experimentally based false discovery rates of our method and demonstrate the high precision of our strategy.
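To make the notion of posterior log-odds concrete, the following is a minimal sketch under strong simplifying assumptions and is not the model implemented in wavClusteR. It assumes that a genomic position is covered by n reads, of which k carry the transition, and that the transition count is binomial with one rate under a hypothetical "crosslink-induced" hypothesis and a lower rate under a hypothetical "background" hypothesis (for example, sequencing errors). The function name, the two rates, and the prior are illustrative placeholders, not values from the paper.

```python
import math


def posterior_log_odds(k, n, p_signal=0.3, p_noise=0.01, prior_signal=0.1):
    """Posterior log-odds of 'crosslink-induced' vs 'background' at one position.

    k            -- number of reads showing the transition (assumed input)
    n            -- total read coverage at the position (assumed input)
    p_signal     -- hypothetical transition rate at a true interaction site
    p_noise      -- hypothetical transition rate from background sources
    prior_signal -- hypothetical prior probability of a true interaction site
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")

    # Prior odds on the log scale.
    prior_log_odds = math.log(prior_signal / (1.0 - prior_signal))

    # Binomial log-likelihood ratio; the binomial coefficient is the same
    # under both hypotheses and cancels.
    log_likelihood_ratio = (
        k * math.log(p_signal / p_noise)
        + (n - k) * math.log((1.0 - p_signal) / (1.0 - p_noise))
    )

    # Bayes' theorem on the log-odds scale: posterior = prior + evidence.
    return prior_log_odds + log_likelihood_ratio


if __name__ == "__main__":
    # Many transitions among the covering reads favor the signal hypothesis.
    print(posterior_log_odds(k=10, n=20) > 0)
    # No transitions at all favor the background hypothesis.
    print(posterior_log_odds(k=0, n=20) < 0)
```

Positive values favor the interaction hypothesis and negative values favor background. A two-hypothesis model of this kind cannot separate SNPs from crosslink-induced transitions, since a SNP would produce transitions in nearly all reads; handling such additional sources is one reason a richer model is needed in practice.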

Conclusions: Our method is implemented in the R package wavClusteR 2.0. The package is distributed under the GPL-2 license and is available from Bioconductor: http://www.bioconductor.org/packages/devel/bioc/html/wavClusteR.html

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Binding Sites
  • HEK293 Cells
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Immunoprecipitation
  • MicroRNAs / chemistry
  • MicroRNAs / metabolism*
  • Models, Statistical*
  • RNA / chemistry
  • RNA / metabolism*
  • RNA-Binding Proteins / metabolism*
  • Sequence Analysis, RNA / methods*
  • Transcriptome

Substances

  • MicroRNAs
  • RNA-Binding Proteins
  • RNA