PRESTA: associating promoter sequences with information on gene expression

Genome Biol. 2002 Aug 21;3(9):research0050. doi: 10.1186/gb-2002-3-9-research0050. Epub 2002 Aug 21.

Abstract

Background: Large sets of well-characterized promoter sequences are required to facilitate the understanding of promoter architecture. The major sequence databases are a potential source of upstream regulatory regions, but suffer from inaccurate annotation. The software tool PRESTA (PRomoter EST Association) presented in this study is designed for efficient recovery of characterized and partially verified promoters from the GenBank and EMBL libraries.

Results: The PRESTA algorithm examines putative promoters in GenBank/EMBL and automatically removes most of the poorly annotated entries. The remaining records are linked to expressed sequence tags (ESTs) through a high-stringency BLAST search. The frequency and source of the recovered ESTs provide an estimate of the promoter's activity and expression pattern, and the 5' ends of the ESTs assist in verifying the transcription start site. The PRESTA database provides easy access to the non-redundant upstream regulatory regions recently extracted by the PRESTA algorithm. The resource currently contains 552 human and 241 mouse promoters. Surprisingly, sequence comparison detected no overlap between the PRESTA database and the Eukaryotic Promoter Database (EPD).
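
The EST-based step described above can be illustrated with a minimal sketch. It is not the PRESTA implementation: the `EstHit` record, the `summarize_hits` function, the 98% identity cutoff and the 10-base window around the annotated start site are all assumptions chosen for illustration, since the abstract does not state the actual thresholds or data layout. The sketch only shows the idea of counting high-stringency EST matches per tissue source and checking how many EST 5' ends fall near the annotated transcription start site.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EstHit:
    """One EST matched to a promoter record (hypothetical layout)."""
    est_id: str
    tissue: str         # source library of the EST
    five_prime_pos: int  # position on the promoter record where the EST 5' end aligns
    identity: float      # percent identity of the alignment


def summarize_hits(hits, annotated_tss, min_identity=98.0, window=10):
    """Summarize EST support for one promoter.

    min_identity and window are illustrative values, not PRESTA's settings.
    """
    # Keep only high-stringency matches.
    kept = [h for h in hits if h.identity >= min_identity]
    # EST frequency per tissue approximates the expression pattern.
    by_tissue = Counter(h.tissue for h in kept)
    # EST 5' ends close to the annotated start site support that annotation.
    near_tss = sum(1 for h in kept if abs(h.five_prime_pos - annotated_tss) <= window)
    return {
        "est_count": len(kept),
        "by_tissue": dict(by_tissue),
        "ends_near_tss": near_tss,
    }


# Made-up example data, for illustration only.
hits = [
    EstHit("est1", "liver", 101, 99.5),
    EstHit("est2", "liver", 250, 99.0),
    EstHit("est3", "brain", 98, 100.0),
    EstHit("est4", "brain", 100, 90.0),  # dropped: below the identity cutoff
]
summary = summarize_hits(hits, annotated_tss=100)
print(summary)
```

In this sketch the total EST count stands in for promoter activity, the per-tissue counts for the expression pattern, and the number of 5' ends near the annotated position for start-site support.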

Conclusions: The PRESTA algorithm demonstrates the principle of promoter verification by mapping EST 5' ends. The publicly available PRESTA database collects hundreds of characterized and partially verified promoter sequences and is complementary to other promoter databases.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Computational Biology / methods
  • Databases, Nucleic Acid
  • Expressed Sequence Tags
  • Gene Expression / genetics*
  • Gene Expression Profiling / methods
  • Humans
  • Information Storage and Retrieval / methods
  • Mice
  • Nucleic Acid Conformation
  • Online Systems
  • Promoter Regions, Genetic / genetics*
  • Software*