Synthetic Oligonucleotide Probes Deduced From Amino Acid Sequence Data. Theoretical and Practical Considerations

J Mol Biol. 1985 May 5;183(1):1-12. doi: 10.1016/0022-2836(85)90276-1.


Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. Because the genetic code is redundant, a choice must be made between (1) a mixture of probes reflecting all codon combinations and (2) a single longer "optimal" probe. The second strategy is examined in detail. It is possible to determine both the frequency of sequences that match a given probe by chance alone and the frequency of sequences that closely resemble the probe and so contribute to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides: probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone, so probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example; considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
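
The two quantities behind this trade-off can be illustrated with a minimal Python sketch. It is not the paper's method. The function names `mixture_size` and `expected_chance_matches` are hypothetical, the codon counts are those of the standard genetic code, and the chance-match estimate assumes a uniform random sequence of the four nucleotides. The abstract states that this random model underestimates real occurrence, so the figure it gives is a baseline only.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 61 sense codons in total).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def mixture_size(peptide: str) -> int:
    """Number of distinct oligonucleotides needed to cover every
    codon combination for the given peptide (strategy 1)."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())


def expected_chance_matches(probe_length: int,
                            target_size_bp: float,
                            both_strands: bool = True) -> float:
    """Expected number of perfect matches to a single probe (strategy 2)
    in a target of target_size_bp base pairs.

    ASSUMPTION: the target is a uniform random sequence of A, C, G, T,
    so each position matches with probability 1/4. Edge effects at the
    ends of the target are ignored.
    """
    strands = 2 if both_strands else 1
    return strands * target_size_bp * 0.25 ** probe_length


if __name__ == "__main__":
    # Hypothetical peptide chosen for low redundancy: M and W have one codon each.
    print(mixture_size("MWKDE"))
    # A 17-mer searched against both strands of a 3e9 bp target.
    print(expected_chance_matches(17, 3e9))
```

For the hypothetical peptide `MWKDE`, `mixture_size` returns 8, whereas a peptide rich in leucine, serine, or arginine needs a far larger mixture, which is the cost that motivates a single longer probe.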

MeSH terms

  • Amino Acid Sequence*
  • Codon
  • Genetic Code*
  • Humans
  • Nucleic Acid Hybridization
  • Oligonucleotides / chemical synthesis*
  • Statistics as Topic

