Identification of Protein Coding Regions by Database Similarity Search

Nat Genet. 1993 Mar;3(3):266-72. doi: 10.1038/ng0393-266.

Abstract

Sequence similarity between a translated nucleotide sequence and a known biological protein can provide strong evidence for the presence of a homologous coding region, even between distantly related genes. The computer program BLASTX performs conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step. We characterized the sensitivity of BLASTX recognition to the presence of substitution, insertion and deletion errors in the query sequence and to sequence divergence. Reading frames were reliably identified in the presence of 1% query errors, a rate that is typical for primary sequence data. BLASTX is appropriate for use in moderate- and large-scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.
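The conceptual-translation step that precedes the protein database search can be illustrated with a short sketch. It is a minimal example only, assuming the standard genetic code (NCBI translation table 1), and it covers translation of all six reading frames, not BLASTX's search or scoring. The function names, the "+1".."-3" frame labels, and the use of "X" for unreadable codons and "*" for stops are illustrative assumptions, not the program's actual interface.

```python
# Minimal sketch of six-frame conceptual translation (not BLASTX itself).
# Assumes the standard genetic code (NCBI table 1); names are hypothetical.

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG (table 1).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int = 0) -> str:
    """Translate codons starting at `offset`; a trailing partial codon is dropped.

    Codons containing non-ACGT characters become 'X'; stop codons become '*'.
    """
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X")
        for pos in range(offset, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Translate the forward (+1..+3) and reverse-complement (-1..-3) frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


if __name__ == "__main__":
    clean = "ATGGCCAAATTTGGGTAA"
    # Hypothetical query error: one extra base inserted after position 6.
    with_insertion = clean[:6] + "C" + clean[6:]
    for label, seq in (("clean", clean), ("insertion", with_insertion)):
        frames = six_frame_translation(seq)
        print(label, frames["+1"], frames["+2"])
```

For the clean query, frame +1 reads `MAKFG*`; after the single inserted base, frame +1 becomes `MAQIWV` and the downstream peptide `KFG*` reappears in frame +2 (`WPKFG*`). This shows why insertion and deletion errors are more disruptive than substitutions: a substitution alters at most one codon, whereas an indel moves every downstream codon into a different reading frame.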

Publication types

  • Comparative Study

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Databases, Factual*
  • Molecular Sequence Data
  • Mutation
  • Probability
  • Proteins / genetics*
  • Rats
  • Ribosomal Proteins / genetics
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Proteins
  • Ribosomal Proteins
  • Rpl19 protein, rat

Associated data

  • GENBANK/J03724