Scoring Functions for Computational Algorithms Applicable to the Design of Spiked Oligonucleotides

Nucleic Acids Res. 1998 Feb 1;26(3):697-702. doi: 10.1093/nar/26.3.697.

Abstract

Protein engineering by inserting stretches of random DNA sequences into target genes, in combination with adequate screening or selection methods, is a versatile technique to elucidate and improve protein functions. Established reagents for generating semi-random DNA sequences are spiked oligonucleotides, which are synthesised by interspersing wild type (wt) nucleotides of the target sequence with defined amounts of other nucleotides. Directed spiking strategies reduce the complexity of a library to a manageable size compared with completely random libraries. Computational algorithms make it feasible to calculate appropriate nucleotide mixtures that encode specified amino acid subpopulations. The crucial element in the ranking of spiked codons generated during an iterative algorithm is the scoring function. In this report three scoring functions are analysed: the sum-of-square-differences function s, a modified cubic function c, and a scoring function m derived from maximum likelihood considerations. The impact of these scoring functions on calculated amino acid distributions is demonstrated by an example of mutagenising a domain surrounding the active site serine of subtilisin-like proteases. At default weight settings of one for each amino acid, the new scoring function m is superior to functions s and c in finding matches to a given amino acid population.
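To make the idea of scoring a spiked codon concrete, the following is a minimal Python sketch of a sum-of-square-differences score in the spirit of function s. It is not the authors' implementation: the abstract does not give the exact formula, normalisation, or treatment of stop codons, so the names `amino_acid_distribution` and `score_s`, the per-position dictionary format, the handling of stop codons as their own category (`*`), and the convention that a lower score means a closer match are all assumptions made for illustration. The functions c and m are not sketched because the abstract does not define them.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_distribution(spiked_codon):
    """Expected amino acid frequencies for one spiked codon.

    spiked_codon is a list of three dicts (hypothetical format), one per
    codon position, mapping a base to its fraction in the nucleotide mixture.
    Stop codons are collected under '*' (an assumption of this sketch).
    """
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for pos, base in enumerate(codon):
            p *= spiked_codon[pos].get(base, 0.0)
        dist[aa] = dist.get(aa, 0.0) + p
    return dist


def score_s(target, calculated, weights=None):
    """Weighted sum of squared differences; weights default to one."""
    weights = weights or {}
    keys = set(target) | set(calculated)
    return sum(
        weights.get(aa, 1.0) * (target.get(aa, 0.0) - calculated.get(aa, 0.0)) ** 2
        for aa in keys
    )


# Illustrative example: wt serine codon TCT, spiked at the first position only.
spiked = [
    {"T": 0.7, "C": 0.1, "A": 0.1, "G": 0.1},
    {"C": 1.0},
    {"T": 1.0},
]
calculated = amino_acid_distribution(spiked)
target = {"S": 0.7, "A": 0.3}  # hypothetical desired amino acid population
print({aa: round(p, 3) for aa, p in calculated.items() if p > 0})
print(round(score_s(target, calculated), 4))
```

In this example the mixture encodes Ser at 0.7 and Pro, Thr, and Ala at 0.1 each, so the score against the hypothetical target is about 0.06. An iterative algorithm of the kind described above would vary the nucleotide fractions and rank the resulting spiked codons by such a score.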

MeSH terms

  • Algorithms*
  • Amino Acids / genetics
  • Base Composition
  • Codon
  • Deoxyribonucleotides / chemical synthesis
  • Deoxyribonucleotides / genetics*
  • Gene Library
  • Models, Genetic*
  • Mutagenesis, Insertional / statistics & numerical data*
  • Subtilisins / genetics

Substances

  • Amino Acids
  • Codon
  • Deoxyribonucleotides
  • Subtilisins