Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing

Nat Biotechnol. 2004 Aug;22(8):1006-11. doi: 10.1038/nbt992. Epub 2004 Jul 11.


Large-scale sequencing of short mRNA-derived tags can establish the qualitative and quantitative characteristics of a complex transcriptome. We sequenced 12,304,362 tags from five diverse libraries of Arabidopsis thaliana using massively parallel signature sequencing (MPSS). A total of 48,572 distinct signatures, each representing a different transcript, were expressed at significant levels. These signatures were compared to the annotation of the A. thaliana genomic sequence; in the five libraries, this comparison yielded between 17,353 and 18,361 genes with sense expression, and between 5,487 and 8,729 genes with antisense expression. An additional 6,691 MPSS signatures mapped to unannotated regions of the genome. Expression was demonstrated for 1,168 genes for which expression data were previously unknown. Alternative polyadenylation was observed for more than 25% of A. thaliana genes transcribed in these libraries. The MPSS expression data suggest that the A. thaliana transcriptome is complex and contains many as-yet uncharacterized variants of normal coding transcripts.
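The abstract describes comparing expressed signatures to the genome annotation and sorting them into sense hits, antisense hits and signatures outside annotated genes. The Python sketch below illustrates that bookkeeping under several assumptions that do not come from the abstract. All function names, sequences and counts are hypothetical toy data. Matching is exact substring matching. Counts are scaled per million tags and compared to an arbitrary cutoff, because the abstract says only "significant levels". Counting a gene with more than one distinct sense signature is used as a stand-in for the alternative polyadenylation count.

```python
"""Hypothetical sketch: classifying short expression signatures against gene sequences.

Toy data only; names, sequences, counts and the cutoff are illustrative assumptions.
"""
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw tag counts to abundance per million tags in the library."""
    total = sum(counts.values())
    return {sig: n * 1_000_000 / total for sig, n in counts.items()}


def classify(signature: str, genes: dict[str, str]) -> list[tuple[str, str]]:
    """Exact-match a signature to each gene's sense strand and to its reverse complement."""
    rc = reverse_complement(signature)
    hits = []
    for gene_id, seq in genes.items():
        if signature in seq:
            hits.append((gene_id, "sense"))
        if rc in seq:
            hits.append((gene_id, "antisense"))
    return hits


def summarize(counts: dict[str, int], genes: dict[str, str], min_per_million: float) -> dict:
    """Tally sense/antisense genes among signatures above an assumed abundance cutoff."""
    abundance = per_million(counts)
    sense_sigs = defaultdict(set)  # gene -> distinct sense signatures
    antisense_genes = set()
    no_gene_hit = []  # candidates for a whole-genome search (not done here)
    for sig, value in abundance.items():
        if value < min_per_million:
            continue
        hits = classify(sig, genes)
        if not hits:
            no_gene_hit.append(sig)
        for gene_id, orientation in hits:
            if orientation == "sense":
                sense_sigs[gene_id].add(sig)
            else:
                antisense_genes.add(gene_id)
    # Assumed proxy: more than one distinct sense signature per gene.
    multi = [g for g, sigs in sense_sigs.items() if len(sigs) > 1]
    return {
        "sense_genes": sorted(sense_sigs),
        "antisense_genes": sorted(antisense_genes),
        "no_gene_hit": sorted(no_gene_hit),
        "multi_signature_genes": sorted(multi),
    }


# Toy gene sequences (sense strand) and raw signature counts; not real Arabidopsis data.
GENES = {
    "geneA": "ATGGCATTCAGCTAGGCTTAACGT",
    "geneB": "ACGTTGCAAGGCTTAC",
}
COUNTS = {
    "GCATTCAG": 400,  # sense hit in geneA
    "TAGGCTTA": 300,  # second distinct sense hit in geneA
    "GCCTTGCA": 200,  # its reverse complement lies in geneB, so antisense
    "TTTTTTTT": 99,   # matches no gene
    "ACGTTGCA": 1,    # below the cutoff, ignored
}

if __name__ == "__main__":
    print(summarize(COUNTS, GENES, min_per_million=5_000))
```

With these toy inputs, geneA is reported with sense expression and two distinct signatures, geneB with antisense expression, and `TTTTTTTT` as a signature outside the annotated genes. Exact matching keeps the sketch short; it does not attempt to handle sequencing errors or signatures that occur at multiple genomic locations.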

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Arabidopsis / genetics*
  • Arabidopsis / metabolism*
  • Arabidopsis Proteins / genetics*
  • Arabidopsis Proteins / metabolism*
  • Computing Methodologies
  • Expressed Sequence Tags
  • Gene Expression Profiling / methods
  • Gene Expression Regulation, Plant / genetics
  • Genome, Plant
  • Peptide Library
  • Sequence Alignment / methods*
  • Sequence Analysis, RNA / methods*
  • Transcription, Genetic / genetics*

Substances

  • Arabidopsis Proteins
  • Peptide Library