DNA Res. 2004 Dec 31;11(6):361-70.
doi: 10.1093/dnares/11.6.361.

Gene Recognition Based on Nucleotide Distribution of ORFs in a Hyper-Thermophilic Crenarchaeon, Aeropyrum Pernix K1

Feng-Biao Guo et al. DNA Res.

Abstract

The 2694 ORFs originally annotated as potential genes in the genome of Aeropyrum pernix can be grouped into three clusters (A, B and C) according to their nucleotide composition at the three codon positions. The separation into three clusters, observed in a 9-dimensional space derived from this composition, was found to reflect coding potential: ORFs in cluster A are coding, whereas those in clusters B and C are non-coding. Based on this clustering, a "codingness" index called the AZ score is defined and used to recognize protein-coding genes in the A. pernix genome: an ORF with AZ > 0 is classified as coding, and one with AZ < 0 as non-coding. Under this criterion, 620 of the 632 ORFs assigned putative functions in the original annotation fall into cluster A and have positive AZ scores. In addition, all 29 ORFs encoding putative or conserved proteins that were newly added in the RefSeq annotation also have positive AZ scores. Accordingly, the number of re-recognized protein-coding genes in the A. pernix genome is 1610, substantially fewer than the 2694 in the original annotation and also far fewer than the 1841 in the RefSeq annotation curated by NCBI staff. Annotation information for the re-recognized genes, together with their AZ scores, is available at: http://tubic.tju.edu.cn/Aper/.
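The pipeline described in the abstract (composition at three codon positions, a 9-dimensional feature space, clustering, and a sign-based coding decision) can be sketched as below. This is a hedged illustration, not the authors' method. The abstract does not name the 9-D transform, so the sketch assumes a Z-curve-style mapping: for each codon position the four base frequencies, which sum to 1, reduce to three independent contrasts, giving 3 x 3 = 9 values. The exact AZ formula is also not given, so `az_like_score` is a hypothetical stand-in that compares the distance to a coding centroid (cluster A) with the distance to the nearest non-coding centroid (clusters B, C). Only the final rule (positive means coding, negative means non-coding) comes from the abstract.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def position_frequencies(orf: str) -> List[Dict[str, float]]:
    """Frequencies of A, C, G, T at codon positions 1, 2 and 3 of an in-frame ORF.

    A trailing partial codon is ignored and non-ACGT characters are skipped.
    """
    seq = orf.upper()
    counts = [{b: 0 for b in "ACGT"} for _ in range(3)]
    for i in range((len(seq) // 3) * 3):
        if seq[i] in "ACGT":
            counts[i % 3][seq[i]] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return freqs


def z_curve_9d(orf: str) -> Vector:
    """Assumed Z-curve-style 9-D vector: three contrasts per codon position."""
    vec: Vector = []
    for f in position_frequencies(orf):
        a, c, g, t = f["A"], f["C"], f["G"], f["T"]
        vec.append((a + g) - (c + t))  # purine vs pyrimidine
        vec.append((a + c) - (g + t))  # amino vs keto
        vec.append((a + t) - (g + c))  # weak vs strong hydrogen bonding
    return vec


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Centroid of a non-empty set of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(u, v))


def az_like_score(vec: Vector, coding_centroid: Vector,
                  noncoding_centroids: Sequence[Vector]) -> float:
    """Hypothetical AZ stand-in: positive when vec lies closer to the coding
    centroid than to the nearest non-coding centroid."""
    d_coding = sq_dist(vec, coding_centroid)
    d_noncoding = min(sq_dist(vec, c) for c in noncoding_centroids)
    return d_noncoding - d_coding


def classify(score: float) -> str:
    """Sign rule from the abstract; a score of exactly 0 is left undecided (assumption)."""
    if score > 0:
        return "coding"
    if score < 0:
        return "non-coding"
    return "undecided"


if __name__ == "__main__":
    v = z_curve_9d("ATGATGATG")
    print(v)
    print(classify(az_like_score(v, v, [[0.0] * 9])))
```

A nearest-centroid comparison is used here only because it is the simplest score whose sign separates two groups of clusters; in practice the centroids would come from clustering real ORF vectors (for example, with k-means and k = 3), and the real AZ score may be defined quite differently.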