Combined use of sequence similarity and codon bias for coding region identification

D J States; W Gish

doi:10.1089/cmb.1994.1.39

J Comput Biol. 1994 Spring;1(1):39-50. doi: 10.1089/cmb.1994.1.39.

Authors

D J States¹, W Gish

Affiliation

¹ Institute for Biomedical Computing, Washington University, St. Louis, MO 63108, USA.

PMID: 8790452

Abstract

A computer program called BLASTX was previously shown to be effective in identifying likely protein coding regions and assigning them putative functions, by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present a new option to this software tool, herein called BLASTC, which combines information from biases in codon utilization with information from sequence similarity, and we assess its sensitivity. A rationale for combining these diverse information sources was derived, and the information available from codon utilization was analyzed in several species, revealing wide variation among them. On average, codon bias information improved the sensitivity of detection of short coding regions of human origin by about a factor of 5. The implications of combining information sources for the interpretation of positive findings are discussed.
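The abstract does not state the combination rule BLASTC uses, so the following is only a minimal sketch under stated assumptions, not the authors' method. It treats codon bias as a log-odds score in bits, log2 of a codon's frequency in coding sequence over its frequency under a background model (assumed here to be uniform over the 64 codons), and adds that to a similarity bit score as if the two sources were independent. The per-species "information available from codon utilization" is approximated as the relative entropy of the coding codon distribution against that background. All function names (`freqs_from_counts`, `codon_log_odds`, `codon_information`, `combined_bits`, `e_value`) are hypothetical, and the codon counts used below are toy values, not data from the paper.

```python
import math
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

# Assumed background model: every codon equally likely (an illustrative choice).
UNIFORM = {c: 1.0 / 64 for c in CODONS}


def freqs_from_counts(counts, pseudocount=1.0):
    """Turn codon counts into probabilities, adding a pseudocount so that no
    codon (including stop codons) gets probability zero."""
    total = sum(counts.get(c, 0) + pseudocount for c in CODONS)
    return {c: (counts.get(c, 0) + pseudocount) / total for c in CODONS}


def codon_log_odds(seq, coding_freq, background_freq=UNIFORM):
    """Sum of log2(P_coding / P_background) over the codons of seq read in
    frame 0; codons containing non-ACGT characters are skipped."""
    seq = seq.upper()
    bits = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in coding_freq:
            bits += math.log2(coding_freq[codon] / background_freq[codon])
    return bits


def codon_information(coding_freq, background_freq=UNIFORM):
    """Relative entropy D(coding || background): expected bits per codon."""
    return sum(p * math.log2(p / background_freq[c])
               for c, p in coding_freq.items() if p > 0)


def combined_bits(similarity_bits, codon_bits):
    """Hypothetical combination rule: add the two log-odds scores. This is
    only justified if the two information sources are independent."""
    return similarity_bits + codon_bits


def e_value(bits, search_space):
    """Expected chance hits for bit score S' over a search space m*n,
    using the Karlin-Altschul form m*n*2**(-S')."""
    return search_space * 2.0 ** (-bits)


if __name__ == "__main__":
    toy = freqs_from_counts({"GCC": 63})  # toy counts, not real data
    print(codon_log_odds("GCCGCCGCC", toy))  # positive: favoured codon
    print(codon_log_odds("AAAAAAAAA", toy))  # negative: disfavoured codon
    print(codon_information(toy))            # bits per codon vs. uniform
    sim = 20.0                               # hypothetical similarity bit score
    boosted = combined_bits(sim, codon_log_odds("GCCGCCGCC", toy))
    print(e_value(sim, 1e6), e_value(boosted, 1e6))
```

With the toy table, an in-frame run of the favoured codon scores positive and a run of a disfavoured codon scores negative. Adding positive codon bits to a weak similarity score lowers the nominal E-value, which gives one intuition for why extra information can raise sensitivity for short regions. Because the summed score no longer follows the statistics of a pure similarity score, such E-values should be read with the caution the abstract raises about interpreting positive findings.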

MeSH terms

Algorithms
Amino Acid Sequence
Animals
Bacillus subtilis
Base Sequence
Codon*
Databases, Factual
Drosophila melanogaster
Escherichia coli
Humans
Molecular Sequence Data
Saccharomyces cerevisiae
Schizosaccharomyces
Sequence Analysis, DNA / methods*
Sequence Homology, Amino Acid
Sequence Homology, Nucleic Acid
Software*

Substances

Codon

Associated data

GENBANK/M18097
SWISSPROT/P11018