In search of coding and non-coding regions of DNA sequences based on balanced estimation of diffusion entropy

J Biol Phys. 2016 Jan;42(1):99-106. doi: 10.1007/s10867-015-9399-7. Epub 2015 Aug 29.

Abstract

Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but they are limited by species dependence and by the need for adequate training sets. The elements in DNA coding regions are known to be distributed in a quasi-random way, while those in non-coding regions show typical self-similar structures. For short sequences, however, these statistical characteristics cannot be extracted correctly, or even detected at all. This paper introduces a new approach to the problem: balanced estimation of diffusion entropy (BEDE).
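
As a rough illustration of the quantity involved, the sketch below computes ordinary diffusion entropy for a DNA sequence. It is not the BEDE method: the abstract does not give the balanced estimator, so the naive plug-in (frequency-count) entropy estimator is used instead, and that estimator is known to be biased for small samples. The purine/pyrimidine mapping (A, G to +1; C, T to -1) and the function names `dna_to_steps`, `diffusion_entropy`, and `scaling_exponent` are assumptions made for this example only.

```python
import math
from collections import Counter


def dna_to_steps(seq):
    """Map a DNA string to a +1/-1 walk (assumed purine/pyrimidine rule).

    A and G become +1, C and T become -1; any other symbol is skipped.
    """
    return [1 if base in "AG" else -1 for base in seq.upper() if base in "ACGT"]


def diffusion_entropy(steps, window):
    """Plug-in Shannon entropy S(t) of displacements over sliding windows.

    Every overlapping window of length `window` yields one displacement
    (the sum of its steps). The displacement distribution is estimated by
    raw frequency counts, which is NOT the balanced estimator of BEDE.
    """
    prefix = [0]
    for s in steps:
        prefix.append(prefix[-1] + s)
    displacements = [
        prefix[i + window] - prefix[i] for i in range(len(steps) - window + 1)
    ]
    total = len(displacements)
    counts = Counter(displacements)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def scaling_exponent(steps, windows):
    """Least-squares slope of S(t) against ln(t), i.e. delta in S(t) = A + delta * ln(t)."""
    xs = [math.log(t) for t in windows]
    ys = [diffusion_entropy(steps, t) for t in windows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

For an uncorrelated random walk the slope returned by `scaling_exponent` should be close to 0.5, and departures from that value point to correlated or self-similar structure. On short sequences the frequency counts become sparse and the plug-in estimate degrades, which is the regime the abstract singles out as problematic.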

Keywords: BEDE; Coding regions; Diffusion entropy; Non-coding regions; Self-similar structure; Time series.

MeSH terms

  • Base Sequence
  • DNA, Fungal / genetics*
  • Diffusion
  • Entropy*
  • Models, Genetic*

Substances

  • DNA, Fungal