RBF-TSS: identification of transcription start site in human using radial basis functions network and oligonucleotide positional frequencies

PLoS One. 2009;4(3):e4878. doi: 10.1371/journal.pone.0004878. Epub 2009 Mar 16.

Abstract

Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of gene structure and gene regulation within a given genome. Many recently published methods identify TSS with high accuracy; however, models that represent promoters and TSS more accurately are still needed. A novel method for identifying transcription start sites is proposed that improves on the TSS recognition accuracy of recently published methods. The method incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters, and employs a radial basis function neural network (RBF-TSS) as the classification algorithm. Using non-overlapping chunks (windows) of size 50 and 500 on the human genome, the proposed method achieves an area under the receiver operating characteristic curve (auROC) of 94.75% and 95.08%, respectively, an improvement over existing TSS prediction methods.
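The three ingredients named in the abstract (oligonucleotide positional frequencies, a radial basis function, and auROC) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the metric feature, the network's centers, or its training procedure, so the `encode` feature, the Gaussian form of `rbf_activation`, the `gamma` parameter, and all function names below are assumptions made purely for illustration.

```python
import itertools
import math


def positional_frequencies(seqs, k):
    """Return freq[pos][kmer]: fraction of sequences with `kmer` starting at `pos`.

    Assumes all sequences have the same length (e.g. aligned around a TSS).
    """
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    n_pos = len(seqs[0]) - k + 1
    freq = [dict.fromkeys(kmers, 0.0) for _ in range(n_pos)]
    for s in seqs:
        for pos in range(n_pos):
            kmer = s[pos:pos + k]
            if kmer in freq[pos]:  # ignore k-mers containing N or other symbols
                freq[pos][kmer] += 1.0 / len(seqs)
    return freq


def encode(seq, freq, k):
    """Hypothetical feature: training frequency of the k-mer seen at each position."""
    return [freq[pos].get(seq[pos:pos + k], 0.0) for pos in range(len(freq))]


def rbf_activation(x, center, gamma):
    """Gaussian radial basis unit: exp(-gamma * squared distance to the center)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-gamma * d2)


def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a full RBF network, several such units with different centers would feed a weighted output layer whose weights are learned from labeled TSS and non-TSS windows; that training step is omitted here because the abstract does not describe it.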

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Humans
  • Oligonucleotides / genetics*
  • Transcription, Genetic*

Substances

  • Oligonucleotides