Comparative Study
Genome Res. 2002 Jan;12(1):198-202.
doi: 10.1101/gr.200901.

The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study

Anton Nekrutenko et al. Genome Res. 2002 Jan.

Abstract

Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for the identification of protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that in the vast majority of coding regions, synonymous substitutions (K(S)) occur much more frequently than nonsynonymous ones (K(A)), and it uses the K(A)/K(S) ratio as the criterion. We show the following: (1) most human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited for the identification of long exons and single-exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate lower than that of most current gene-prediction methods and a false-positive rate lower than that of all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.
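To make the criterion concrete, the following is a minimal sketch of how K(A) and K(S) might be estimated for a pair of aligned, in-frame sequences. It is not the authors' procedure: it assumes a simplified Nei-Gojobori-style counting scheme with a Jukes-Cantor correction, and the names `syn_sites`, `count_diffs`, `jukes_cantor`, and `ka_ks` are hypothetical helpers introduced only for illustration. The test described in the abstract also requires a statistical assessment of whether the ratio is significantly smaller than 1, which this sketch omits.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (each position contributes 0 to 1)."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    s += 1.0 / 3.0
    return s


def count_diffs(c1, c2):
    """(synonymous, nonsynonymous) differences, averaged over mutation paths."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    sd = nd = 0.0
    n_paths = 0
    for order in permutations(diff_pos):
        cur, s, n, ok = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODON_TABLE[nxt] == "*":  # skip paths through stop codons
                ok = False
                break
            if CODON_TABLE[nxt] == CODON_TABLE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if ok:
            sd, nd, n_paths = sd + s, nd + n, n_paths + 1
    if n_paths == 0:  # assumption: treat unresolved changes as nonsynonymous
        return 0.0, float(len(diff_pos))
    return sd / n_paths, nd / n_paths


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if p <= 0:
        return 0.0
    if p >= 0.75:  # correction is undefined at saturation
        return float("inf")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    """Return (K_A, K_S) for two aligned, in-frame, gap-free sequences."""
    S = N = sd = nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip codons with gaps or ambiguous bases
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue  # skip stop codons
        s = (syn_sites(c1) + syn_sites(c2)) / 2.0
        S, N = S + s, N + (3.0 - s)
        d_syn, d_non = count_diffs(c1, c2)
        sd, nd = sd + d_syn, nd + d_non
    if S == 0 or N == 0:
        raise ValueError("no usable synonymous or nonsynonymous sites")
    return jukes_cantor(nd / N), jukes_cantor(sd / S)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    ka, ks = ka_ks("ATGGCTGCTGCTGCTTTT", "ATGGCTGCTGCTGCCTTT")
    print("K_A =", ka, "K_S =", ks)
```

In this toy pair the only difference is a third-position change that leaves the amino acid unchanged, so K(A) is zero while K(S) is positive, which is the pattern the test treats as evidence of coding potential. A region evolving without coding constraint would be expected to show a ratio close to 1.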

Figures

Figure 1
(A) Distribution of exon lengths. Exons were stratified into six length classes. For example, the 100-bp class contains exons with lengths ranging from 75 to 125 bp. The white area of each bar represents the number of exons that show KA/KS significantly smaller than 1 (passing the test), whereas the shaded area corresponds to the number of exons that have KA/KS not statistically different from 1 (failing the test). The number above each bar indicates the ratio of the number of exons in the shaded area to the number of exons in the white area. For example, the group with a mean length of 50 bp contains 81 exons; 34 of them did not pass the KA/KS test. (B) Relationship between human–mouse sequence divergence and the number of false negatives (exons that fail the test). Bars represent the proportions of exons in each of the five divergence classes. Points on the curve indicate the proportion of false negatives within each identity class. For example, ∼35% of exons in our dataset belong to a class in which divergence ranges from 10% to 15%. Within this class, ∼4% of exons did not pass the KA/KS test.
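As a small worked example of the arithmetic in panel A, the sketch below uses only the two counts quoted in the caption for the 50-bp class (81 exons, 34 failing). The variable names are illustrative, and reading the number above the bar as failing divided by passing follows the caption's wording rather than the figure itself.

```python
# Counts quoted in the caption for the 50-bp length class.
total_exons = 81
failed = 34
passed = total_exons - failed  # exons with KA/KS significantly below 1

# Ratio of shaded (failing) to white (passing) exons, as described in the caption.
shaded_to_white = failed / passed

# Fraction of exons in this class that fail the test (false negatives).
fail_fraction = failed / total_exons

print(f"passed: {passed}")
print(f"shaded/white ratio: {shaded_to_white:.2f}")
print(f"fraction failing: {fail_fraction:.1%}")
```

For this shortest class, roughly two in five exons fail the test. That is far above the ∼4% failure rate the caption quotes for the 10% to 15% divergence class in panel B.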
