Reliable Identification of Genomic Variants From RNA-seq Data

Am J Hum Genet. 2013 Oct 3;93(4):641-51. doi: 10.1016/j.ajhg.2013.08.008. Epub 2013 Sep 26.

Abstract

Identifying genomic variation is a crucial step in unraveling the relationship between genotype and phenotype and can yield important insights into human diseases. Prevailing methods rely on cost-intensive whole-genome sequencing (WGS) or whole-exome sequencing (WES), whereas identifying genomic variants from RNA sequencing (RNA-seq) data, which are often already available, remains a challenge because of the intrinsic complexity of the transcriptome. Here, we present a highly accurate approach, termed SNPiR, to identify SNPs in RNA-seq data. We applied SNPiR to RNA-seq data from samples for which WGS and WES data are also available and achieved high specificity and sensitivity. Of the SNPs called from the RNA-seq data, >98% were also identified by WGS or WES. Over 70% of all expressed coding variants were identified from RNA-seq, and comparable numbers of exonic variants were identified by RNA-seq and WES. Although our method can detect variants only in expressed regions, our results demonstrate that SNPiR outperforms current state-of-the-art approaches for variant detection from RNA-seq data and offers a cost-effective and reliable alternative for SNP discovery.
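The two headline figures above are ratios over variant call sets: the share of RNA-seq calls also found by WGS or WES, and the share of expressed coding variants that RNA-seq recovers. The following is a minimal Python sketch of that arithmetic only, not SNPiR's pipeline. It assumes variants match exactly on chromosome, position, and alleles, and that a hypothetical `expressed_sites` set (for example, sites with adequate RNA-seq coverage) and any coding-region filtering are supplied upstream. The `Variant` type, function names, and call sets are illustrative toy values, not data from the study.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical SNV record; two calls match only if all four fields agree."""
    chrom: str
    pos: int
    ref: str
    alt: str


def validation_rate(rna_calls: set[Variant], dna_calls: set[Variant]) -> float:
    """Fraction of RNA-seq calls that also appear in the DNA (WGS or WES) call set."""
    if not rna_calls:
        return 0.0
    return len(rna_calls & dna_calls) / len(rna_calls)


def expressed_recall(
    rna_calls: set[Variant],
    dna_calls: set[Variant],
    expressed_sites: set[tuple[str, int]],
) -> float:
    """Fraction of DNA-confirmed variants at expressed sites that RNA-seq recovers.

    `expressed_sites` is an assumed input: (chrom, pos) pairs judged expressed,
    already restricted to coding regions if a coding-only figure is wanted.
    """
    truth = {v for v in dna_calls if (v.chrom, v.pos) in expressed_sites}
    if not truth:
        return 0.0
    return len(rna_calls & truth) / len(truth)


# Toy call sets for illustration only.
wgs = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"), Variant("chr2", 50, "G", "A")}
wes = {Variant("chr2", 50, "G", "A"), Variant("chr3", 10, "T", "C")}
dna = wgs | wes  # a call counts as confirmed if either WGS or WES has it

rna = {Variant("chr1", 100, "A", "G"), Variant("chr2", 50, "G", "A"), Variant("chr3", 999, "A", "C")}
expressed_sites = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 10)}

print(f"{validation_rate(rna, dna):.3f}")                    # 0.667
print(f"{expressed_recall(rna, dna, expressed_sites):.3f}")  # 0.500
```

Restricting the denominator of the recall to expressed sites mirrors the limitation stated in the abstract: variants in unexpressed regions cannot be seen in RNA-seq, so counting them would understate what the method can recover.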

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Exome*
  • Exons
  • Genome, Human*
  • Genomics / methods*
  • Humans
  • Open Reading Frames
  • Polymorphism, Single Nucleotide*
  • RNA / genetics*
  • Sensitivity and Specificity
  • Sequence Analysis, RNA / methods*

Substances

  • RNA