In search of coding and non-coding regions of DNA sequences based on balanced estimation of diffusion entropy

Jin Zhang; Wenqing Zhang; Huijie Yang

doi:10.1007/s10867-015-9399-7

J Biol Phys. 2016 Jan;42(1):99-106. doi: 10.1007/s10867-015-9399-7. Epub 2015 Aug 29.

Authors

Jin Zhang¹ ², Wenqing Zhang³, Huijie Yang³

Affiliations

¹ Business School, University of Shanghai for Science and Technology, Shanghai, 200093, China. zdypaper@163.com.
² School of Information Science and Engineering, University of Jinan, Jinan, 250022, China. zdypaper@163.com.
³ Business School, University of Shanghai for Science and Technology, Shanghai, 200093, China.

Abstract

Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but they are limited by species dependence and by the need for adequate training sets. The elements in DNA coding regions are known to be distributed quasi-randomly, whereas those in non-coding regions typically exhibit self-similar structures. For short sequences, however, these statistical characteristics cannot be reliably extracted and may not be detectable at all. This paper introduces a new way to address this problem: balanced estimation of diffusion entropy (BEDE).
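Diffusion entropy (DE) analysis, on which BEDE builds, treats a numeric series as the steps of a random walk. It forms the sums x(t) over windows of length t and estimates the Shannon entropy S(t) of their distribution. For a scaling process S(t) = A + δ ln t, with δ = 0.5 for uncorrelated steps. The Python sketch below illustrates plain DE only. The balanced entropy estimator that gives BEDE its name is not reproduced here; it is swapped for a simple histogram (plug-in) estimate. The purine/pyrimidine mapping in `dna_to_steps`, the helper names, and the default `bins=20` are illustrative assumptions, not the paper's choices.

```python
import math
import random
from collections import Counter


def dna_to_steps(seq):
    """Map a DNA string to +/-1 steps (ASSUMED purine/pyrimidine rule).

    A and G (purines) -> +1, C and T (pyrimidines) -> -1; any other
    symbol (e.g. N) is skipped. This mapping is illustrative only.
    """
    mapping = {"A": 1, "G": 1, "C": -1, "T": -1}
    return [mapping[b] for b in seq.upper() if b in mapping]


def diffusion_entropy(steps, t, bins=20):
    """Histogram (plug-in) estimate of S(t), NOT the paper's balanced estimator."""
    n = len(steps)
    if not 1 <= t <= n:
        raise ValueError("window length t must satisfy 1 <= t <= len(steps)")
    # Prefix sums give every overlapping window sum x_i(t) in O(n).
    prefix = [0.0]
    for s in steps:
        prefix.append(prefix[-1] + s)
    x = [prefix[i + t] - prefix[i] for i in range(n - t + 1)]
    lo, hi = min(x), max(x)
    if lo == hi:
        return 0.0  # degenerate case: all window sums are identical
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in x)
    total = len(x)
    # -sum p ln p over bins, plus ln(width), approximates the
    # differential entropy -integral p(x, t) ln p(x, t) dx.
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h + math.log(width)


def scaling_exponent(steps, ts, bins=20):
    """Least-squares slope delta of S(t) = A + delta * ln t over window sizes ts."""
    xs = [math.log(t) for t in ts]
    ys = [diffusion_entropy(steps, t, bins) for t in ts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    # Uncorrelated Gaussian steps: delta should come out close to 0.5.
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    print(round(scaling_exponent(noise, [10, 20, 40, 80, 160]), 2))
```

Values of δ near 0.5 indicate uncorrelated, random-walk-like behaviour, while departures from 0.5 signal correlations. The histogram estimate used here is biased for short series and sensitive to `bins`. For ±1 DNA walks, small t also suffers lattice effects, because x(t) only takes values with the same parity as t. That short-series regime is the one the abstract says BEDE is designed to handle.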

Keywords: BEDE; Coding regions; Diffusion entropy; Non-coding regions; Self-similar structure; Time series.

MeSH terms

Base Sequence
DNA, Fungal / genetics*
Diffusion
Entropy*
Models, Genetic*

Substances

DNA, Fungal