A pattern-based nearest neighbor search approach for promoter prediction using DNA structural profiles

Bioinformatics. 2009 Aug 15;25(16):2006-12. doi: 10.1093/bioinformatics/btp359. Epub 2009 Jun 10.

Abstract

Motivation: Identifying core promoters is key to understanding gene regulation. However, because promoter sequences are diverse, existing approaches predict non-CpG island (CGI)-related promoters less accurately than CGI-related promoters, and this in turn lowers genome-wide promoter prediction accuracy.
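The abstract separates promoters by whether they are CGI-related, but it does not say which CpG island definition PNNP uses. As a minimal sketch, assuming the classic Gardiner-Garden and Frommer thresholds (a window of about 200 bp, GC content of about 50% or more, and a CpG observed/expected ratio of about 0.6 or more), a sequence can be tested as follows. The function names `cpg_stats` and `is_cgi_like` are hypothetical, and a real pipeline would slide such a window along the promoter region rather than test it once.

```python
def cpg_stats(seq):
    """Return (GC fraction, CpG observed/expected ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    cg = s.count("CG")
    gc_frac = (c + g) / n if n else 0.0
    # Observed CpG count divided by the count expected from C and G frequencies.
    obs_exp = (cg * n) / (c * g) if c and g else 0.0
    return gc_frac, obs_exp


def is_cgi_like(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Assumed Gardiner-Garden/Frommer-style CpG island test (thresholds are assumptions)."""
    gc_frac, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_frac >= min_gc and obs_exp >= min_obs_exp


print(is_cgi_like("CG" * 150))  # True
print(is_cgi_like("AT" * 150))  # False
```

Keeping the thresholds as keyword parameters makes it easy to swap in a stricter definition if the paper's own criterion differs.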

Results: In this article, we first systematically analyze the similarities and differences between the two types of promoters (CGI- and non-CGI-related) from a novel structural perspective, and then devise a unified framework, called PNNP (Pattern-based Nearest Neighbor search for Promoter), to predict both CGI- and non-CGI-related promoters based on their structural features. Our comparative analysis of the structural characteristics of promoters reveals two findings: (i) the structural values of CGI- and non-CGI-related promoters differ considerably, yet the two exhibit very similar structural patterns; (ii) the structural patterns of promoters clearly differ from those of non-promoter sequences, even though the two have nearly identical structural values. Extensive experiments demonstrate that PNNP effectively captures the structural patterns of promoters and significantly improves genome-wide promoter prediction performance, especially for non-CGI-related promoters.
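The distinction between structural values and structural patterns can be illustrated with a small sketch. The abstract does not describe PNNP's structural scale, pattern representation, or distance measure, so everything below is an assumption: `HYPOTHETICAL_SCALE` holds made-up per-dinucleotide numbers (not a published structural parameter set), a "pattern" is taken to be the z-normalized profile, and the search is plain 1-nearest-neighbor on Euclidean distance. The toy profiles and labels are synthetic, not data from the paper.

```python
import math
from itertools import product

# HYPOTHETICAL per-dinucleotide values for illustration only. These are NOT a
# published DNA structural scale; substitute real parameters in practice.
HYPOTHETICAL_SCALE = {
    x + y: v
    for (x, y), v in zip(
        product("ACGT", repeat=2),
        [0.3, 0.8, 0.6, 0.1, 0.9, 0.7, 1.0, 0.6,
         0.5, 0.9, 0.7, 0.8, 0.2, 0.5, 0.9, 0.3],
    )
}


def structural_profile(seq, scale=HYPOTHETICAL_SCALE, window=3):
    """Map each dinucleotide step (A/C/G/T only) to a scale value, then smooth
    the signal with a moving average of width `window`."""
    s = seq.upper()
    raw = [scale[s[i:i + 2]] for i in range(len(s) - 1)]
    return [sum(raw[i:i + window]) / window
            for i in range(len(raw) - window + 1)]


def to_pattern(profile):
    """Z-normalize a profile so only its shape (pattern) remains, not its level."""
    n = len(profile)
    if n == 0:
        return []
    mean = sum(profile) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in profile]


def distance(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def nearest_neighbor_label(query_profile, references):
    """1-NN on patterns: return the label of the closest (profile, label) reference."""
    qp = to_pattern(query_profile)
    best = min(references, key=lambda ref: distance(qp, to_pattern(ref[0])))
    return best[1]


# Toy profiles (synthetic numbers, not data from the paper).
a = [1.0, 2.0, 3.0, 2.0, 1.0]
b = [2 * x + 10 for x in a]    # very different values, same shape as a
c = [3.0, 2.0, 1.0, 2.0, 3.0]  # values close to a, inverted shape

print(distance(a, c) < distance(a, b))  # True: raw values favor c
print(nearest_neighbor_label(a, [(b, "promoter"), (c, "non-promoter")]))  # promoter
```

Comparing raw values would pair `a` with `c`, whereas comparing z-normalized patterns pairs it with `b`, mirroring the two findings above: similar patterns despite different values, and different patterns despite similar values.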

Availability: The implementation of the program PNNP is available at http://admis.tongji.edu.cn/Projects/pnnp.aspx.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Computational Biology / methods*
  • CpG Islands
  • DNA / chemistry*
  • Promoter Regions, Genetic / genetics*
  • Sequence Analysis, DNA / methods

Substances

  • DNA