CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise

Genome Biol. 2018 Nov 28;19(1):208. doi: 10.1186/s13059-018-1590-2.

Abstract

We assembled the sequences from deep RNA sequencing experiments conducted by the Genotype-Tissue Expression (GTEx) project to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We also detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess.
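
The counts reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the numbers are copied from the abstract, and it assumes that the novel genes and transcripts are subsets of the reported totals, which the abstract implies but does not state outright.

```python
# Counts copied from the abstract; variable names are illustrative.
total_genes = 42_611
coding_genes = 20_352
noncoding_genes = 22_259
total_transcripts = 323_258
novel_transcripts = 116_156
novel_coding_genes = 224

# The protein-coding and noncoding counts should sum to the gene total.
genes_consistent = coding_genes + noncoding_genes == total_genes

# Assumption: novel items are counted within the reported totals.
novel_transcript_fraction = novel_transcripts / total_transcripts
novel_coding_fraction = novel_coding_genes / coding_genes

print(f"Gene counts consistent: {genes_consistent}")
print(f"Novel share of transcripts: {novel_transcript_fraction:.1%}")
print(f"Novel share of protein-coding genes: {novel_coding_fraction:.1%}")
```

Under that assumption, roughly a third of the cataloged transcripts are novel, while novel protein-coding genes make up only about one percent of the protein-coding set.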

Keywords: GTEx; Human gene count; RNA sequencing; Transcriptome; Transcriptome assembly.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Validation Study

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Databases, Genetic*
  • Female
  • Humans
  • Introns
  • Male
  • Sequence Analysis, RNA*
  • Transcription, Genetic*