Combined use of sequence similarity and codon bias for coding region identification

J Comput Biol. 1994 Spring;1(1):39-50. doi: 10.1089/cmb.1994.1.39.

Abstract

A computer program called BLASTX was previously shown to be effective in identifying likely protein coding regions and assigning them putative functions by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present a new option to this software tool, herein called BLASTC, which combines information obtained from biases in codon utilization with the information obtained from sequence similarity, and we assess its sensitivity. A rationale for combining these diverse information sources was derived, and the information available from codon utilization was analyzed in several species and found to vary widely. On average, codon bias information improved the sensitivity of detection of short coding regions of human origin by about a factor of 5. The implications of combining information sources for the interpretation of positive findings are discussed.
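
The idea of combining the two information sources can be illustrated with a minimal sketch. It is not the BLASTC implementation, and the abstract does not give the scoring formula. The sketch assumes that both sources can be expressed as log-odds scores in bits and that they are independent, so the scores add. The function names, the uniform 1/64 background, and the codon frequencies are hypothetical values chosen for illustration only.

```python
import math

def codon_log_odds(coding_freq, background_freq):
    """Per-codon log-odds in bits: log2(P(codon | coding) / P(codon | background))."""
    return {c: math.log2(coding_freq[c] / background_freq[c]) for c in coding_freq}

def codon_bias_score(seq, log_odds, frame=0):
    """Sum the per-codon log-odds over complete codons in the given frame.

    Codons missing from the table contribute 0 bits (an assumption of this sketch).
    """
    seq = seq.upper()
    score = 0.0
    for i in range(frame, len(seq) - 2, 3):
        score += log_odds.get(seq[i:i + 3], 0.0)
    return score

def combined_score(similarity_bits, seq, log_odds, frame=0):
    """Add similarity and codon bias scores, assuming the two are independent."""
    return similarity_bits + codon_bias_score(seq, log_odds, frame)

# Toy frequencies (made up): one favored and one avoided codon,
# compared against a uniform background of 1/64 per codon.
coding = {"CTG": 0.04, "CTA": 0.005}
background = {"CTG": 1 / 64, "CTA": 1 / 64}
table = codon_log_odds(coding, background)

# A hypothetical alignment score of 20 bits, adjusted by the codon evidence.
print(combined_score(20.0, "CTGCTGCTA", table))
```

Under these assumptions, a segment rich in favored codons raises the combined score and a segment rich in avoided codons lowers it, which is one way a weak similarity hit over a short coding region could move past a significance threshold.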

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Bacillus subtilis
  • Base Sequence
  • Codon*
  • Databases, Factual
  • Drosophila melanogaster
  • Escherichia coli
  • Humans
  • Molecular Sequence Data
  • Saccharomyces cerevisiae
  • Schizosaccharomyces
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Amino Acid
  • Sequence Homology, Nucleic Acid
  • Software*

Substances

  • Codon

Associated data

  • GENBANK/M18097
  • SWISSPROT/P11018