The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study
- PMID: 11779845
- PMCID: PMC155263
- DOI: 10.1101/gr.200901
Abstract
Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for identifying protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that in the vast majority of coding regions, synonymous substitutions (K(S)) occur much more frequently than nonsynonymous ones (K(A)), and it uses the K(A)/K(S) ratio as the criterion. We show the following: (1) most human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited to the identification of long exons and single-exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate lower than that of most current gene-prediction methods and a false-positive rate lower than that of all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.
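To make the criterion concrete, the following is a minimal sketch of how K(A) and K(S) might be estimated for a pair of aligned, in-frame sequences. It assumes a simplified Nei-Gojobori style counting with a Jukes-Cantor correction, which is not necessarily the estimator used in the study. The function names (`syn_sites`, `count_diffs`, `ka_ks`) are hypothetical, and mutations to or through stop codons are simply counted as nonsynonymous, which is a simplification.

```python
from itertools import permutations, product
from math import inf, log

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (0 to 3).

    Each position contributes the fraction of its three possible
    single-base changes that leave the amino acid unchanged.
    """
    total = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    total += 1 / 3
    return total


def count_diffs(c1, c2):
    """Synonymous and nonsynonymous differences between two codons,
    averaged over every order in which the differing bases can change."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn = nonsyn = 0.0
    paths = list(permutations(positions))
    for path in paths:
        current = c1
        for i in path:
            step = current[:i] + c2[i] + current[i + 1:]
            if CODON_TABLE[current] == CODON_TABLE[step]:
                syn += 1
            else:
                nonsyn += 1
            current = step
    return syn / len(paths), nonsyn / len(paths)


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        return inf  # saturated: the distance cannot be estimated
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (K_A, K_S) for two aligned, in-frame sequences of equal length."""
    s_sites = n_sites = s_diffs = n_diffs = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip codons containing gaps or ambiguous bases
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = count_diffs(c1, c2)
        s_diffs += sd
        n_diffs += nd
    if s_sites == 0 or n_sites == 0:
        raise ValueError("no usable codons in the alignment")
    return jukes_cantor(n_diffs / n_sites), jukes_cantor(s_diffs / s_sites)
```

Calling `ka_ks(human_seq, mouse_seq)` on a candidate region in each reading frame and comparing K(A) with K(S) gives the ratio. A ratio well below 1 is the pattern the abstract associates with coding regions. The ratio is undefined when K(S) is 0, so very short or nearly identical regions need separate handling.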