ORF Capture-Seq as a versatile method for targeted identification of full-length isoforms

Nat Commun. 2020 May 11;11(1):2326. doi: 10.1038/s41467-020-16174-z.

Abstract

Most human protein-coding genes are expressed as multiple isoforms, which greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned to every coding gene, the majority of alternative isoforms remain uncharacterized due to (i) vast differences in overall expression levels between isoforms of the same gene, and (ii) the difficulty of obtaining full-length transcript sequences. Here, we present ORF Capture-Seq (OCS), a flexible method for targeted full-length isoform sequencing that addresses both challenges by using collections of cloned ORFs as probes. As a proof of concept, we show that an OCS pipeline focused on genes encoding transcription factors increases isoform detection by an order of magnitude compared with unenriched samples. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will accelerate mapping of the human transcriptome.
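Challenge (i) can be illustrated with a toy sampling model. The Python sketch below is an illustration under stated assumptions, not the OCS pipeline or an analysis from the paper. It assumes each sequenced read is an independent draw (a simple binomial model), so an isoform making up a fraction f of the library is seen at least once in N reads with probability 1 - (1 - f)^N. The function names `detection_probability` and `reads_needed`, the one-in-a-million abundance, and the 10-fold enrichment factor are all hypothetical values chosen only to show the direction of the effect.

```python
import math


def detection_probability(isoform_fraction: float, total_reads: int) -> float:
    """Probability of sampling at least one read from an isoform.

    Assumption (illustrative only): each of `total_reads` reads is drawn
    independently, and the isoform makes up `isoform_fraction` of the
    sequenced molecules, i.e. a simple binomial sampling model.
    """
    if not 0.0 <= isoform_fraction <= 1.0:
        raise ValueError("isoform_fraction must be in [0, 1]")
    if total_reads < 0:
        raise ValueError("total_reads must be non-negative")
    return 1.0 - (1.0 - isoform_fraction) ** total_reads


def reads_needed(isoform_fraction: float, target_probability: float) -> int:
    """Smallest read count whose detection probability reaches the target.

    Solves 1 - (1 - f)^N >= p for N, giving N >= log(1 - p) / log(1 - f).
    """
    if not 0.0 < isoform_fraction < 1.0:
        raise ValueError("isoform_fraction must be in (0, 1)")
    if not 0.0 <= target_probability < 1.0:
        raise ValueError("target_probability must be in [0, 1)")
    # log1p keeps precision when the fraction is tiny (rare isoforms).
    n = math.ceil(math.log1p(-target_probability) / math.log1p(-isoform_fraction))
    # Correct for floating-point rounding on either side of the threshold.
    while detection_probability(isoform_fraction, n) < target_probability:
        n += 1
    while n > 0 and detection_probability(isoform_fraction, n - 1) >= target_probability:
        n -= 1
    return n


if __name__ == "__main__":
    rare = 1e-6            # hypothetical: isoform is 1 in a million molecules
    enriched = rare * 10   # hypothetical 10x gain in share after probe capture
    for label, fraction in (("unenriched", rare), ("enriched", enriched)):
        print(label, reads_needed(fraction, 0.95))
```

Under this simplified model the required depth scales roughly as 1/f for small f, so any capture step that raises a target isoform's share of the library lowers the depth needed to observe it. Real libraries also involve capture bias, transcript-length effects and read-quality losses that the sketch ignores.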

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Open Reading Frames / genetics*
  • Protein Isoforms / genetics
  • Protein Isoforms / metabolism
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Reference Standards
  • Sequence Analysis, RNA / methods*
  • Transcription Factors / genetics

Substances

  • Protein Isoforms
  • RNA, Messenger
  • Transcription Factors