Pareto Optimization of Combinatorial Mutagenesis Libraries

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1143-1153. doi: 10.1109/TCBB.2018.2858794. Epub 2018 Jul 23.


To increase the hit rate of discovering diverse, beneficial protein variants via high-throughput screening, we have developed a computational method to optimize combinatorial mutagenesis libraries for overall enrichment in two distinct properties of interest. Given scoring functions for evaluating individual variants, POCoM (Pareto Optimal Combinatorial Mutagenesis) scores entire libraries in terms of averages over their constituent members, and designs optimal libraries as sets of mutations whose combinations make the best trade-offs between average scores. This represents the first general-purpose method to directly design combinatorial libraries for multiple objectives characterizing their constituent members. Although POCoM rigorously maps out the Pareto frontier, it is also fast, even for very large libraries (e.g., designing 30-mutation, billion-member libraries in only hours). Here we instantiate POCoM with scores based on a target's protein structure and its homologs' sequences, enabling the design of libraries whose variants balance these two important but quite different types of information. We demonstrate POCoM's generality and power in case study applications to green fluorescent protein, cytochrome P450, and β-lactamase. Analysis of the POCoM library designs provides insights into the trade-offs between structure- and sequence-based scores, as well as the impacts of experimental constraints on library designs. POCoM libraries incorporate mutations that have previously been found favorable experimentally, while diversifying the contexts in which these mutations are situated and maintaining overall variant quality.
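A minimal sketch of the library-level objectives described above, written in Python for illustration. It assumes a hypothetical three-position toy problem with made-up additive scores (lower is better for both objectives), treats a library as the Cartesian product of the residues chosen at each position with every member weighted equally, and fixes the library size so that designs are compared at equal size. It enumerates all candidate libraries exhaustively and keeps those whose pair of average scores is not dominated. All names (`CHOICES`, `library_averages`, `pareto_front`, etc.) are hypothetical, and the exhaustive search is only a stand-in: it is not POCoM's optimization procedure, which the abstract describes as rigorous yet fast enough for billion-member libraries.

```python
from itertools import combinations, product
from math import prod
from statistics import mean

# Hypothetical toy design problem (not data from the paper): three positions,
# the residues allowed at each, and two additive per-residue score tables.
# Lower scores are treated as better for both objectives.
CHOICES = {0: "AVL", 1: "ST", 2: "DEK"}
POSITIONS = sorted(CHOICES)
OBJECTIVES = ("structure", "sequence")
SCORES = {
    "structure": {(0, "A"): 0.0, (0, "V"): -0.4, (0, "L"): -0.2,
                  (1, "S"): 0.0, (1, "T"): 0.3,
                  (2, "D"): 0.1, (2, "E"): -0.5, (2, "K"): 0.4},
    "sequence": {(0, "A"): -0.6, (0, "V"): 0.2, (0, "L"): 0.5,
                 (1, "S"): -0.1, (1, "T"): -0.3,
                 (2, "D"): -0.2, (2, "E"): 0.3, (2, "K"): -0.4},
}


def variant_score(objective, variant):
    """Score one variant (one residue per position) as a sum of per-residue terms."""
    return sum(SCORES[objective][(pos, aa)] for pos, aa in zip(POSITIONS, variant))


def library_averages(library):
    """Average each objective over every member of the combinatorial library.

    `library` holds one tuple of chosen residues per position; its members are
    the Cartesian product of those tuples, each counted once.
    """
    members = list(product(*library))
    return tuple(mean(variant_score(obj, v) for v in members) for obj in OBJECTIVES)


def subsets(residues):
    """All non-empty subsets of the residues allowed at one position."""
    return [c for k in range(1, len(residues) + 1) for c in combinations(residues, k)]


def candidate_libraries(target_size):
    """Yield every library whose number of members equals target_size."""
    for lib in product(*(subsets(CHOICES[p]) for p in POSITIONS)):
        if prod(len(choice) for choice in lib) == target_size:
            yield lib


def dominates(a, b):
    """True if score vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored):
    """Keep the (library, averages) pairs not dominated by any other candidate."""
    return [(lib, s) for lib, s in scored
            if not any(dominates(t, s) for _, t in scored)]


if __name__ == "__main__":
    scored = [(lib, library_averages(lib)) for lib in candidate_libraries(4)]
    for lib, (avg_struct, avg_seq) in pareto_front(scored):
        print(lib, round(avg_struct, 3), round(avg_seq, 3))
```

Because the toy scores are additive over positions, each library average here equals the sum of per-position means of the chosen residues. The number of candidate libraries, however, grows combinatorially with the number of positions and residue choices, so exhaustive enumeration like this is practical only for toy problems.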

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Cytochrome P-450 Enzyme System / genetics
  • Gene Library*
  • Green Fluorescent Proteins / metabolism
  • Models, Molecular
  • Mutagenesis*
  • Mutation
  • Oligonucleotides / genetics
  • Programming Languages
  • Protein Engineering / methods
  • Proteins / genetics
  • Software
  • beta-Lactamases / genetics

Substances

  • Oligonucleotides
  • Proteins
  • Green Fluorescent Proteins
  • Cytochrome P-450 Enzyme System
  • beta-Lactamases