Creating De Novo Overlapped Genes

Methods Mol Biol. 2023;2553:95-120. doi: 10.1007/978-1-0716-2617-7_6.

Abstract

Future applications of synthetic biology will rely on deploying engineered cells outside of laboratory environments for long periods of time. A significant roadblock to such deployment is the potential for inactivating mutations in engineered genes. Constraining Adaptive Mutations using Engineered Overlapping Sequences (CAMEOS) is a recently developed method for protecting engineered coding sequences from mutation. In this chapter, we provide a workflow for using CAMEOS to create synthetic overlaps between two genes, one essential (infA) and one non-essential (aroB), to protect the non-essential gene from mutation and loss of protein function. The workflow details how to collect large numbers of related protein sequences, produce multiple sequence alignments (MSAs), use the MSAs to generate hidden Markov models and Markov random field models, and finally generate a library of overlapping coding sequences with the CAMEOS scripts. To help practitioners with basic coding skills try the CAMEOS method, we have created a virtual machine with all required packages preinstalled that can be downloaded and run locally.
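The idea behind overlapping coding sequences can be illustrated with a minimal Python sketch. This is not the CAMEOS code and does not use the infA or aroB sequences: the 12-nucleotide DNA string, the `translate` helper, and the chosen point mutation are all hypothetical, picked only to show that one stretch of DNA read in two frames encodes two different peptides, and that a single substitution in the shared region can alter both.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna starting at offset `frame`; trailing partial codons are dropped."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Toy sequence (an assumption for illustration, not a real gene).
dna = "ATGGCCAAATGA"
print(translate(dna, frame=0))  # peptide read in frame 0
print(translate(dna, frame=1))  # different peptide read in frame +1

# A single substitution (C -> T at index 4) in the shared region.
mutant = dna[:4] + "T" + dna[5:]
print(translate(mutant, frame=0))
print(translate(mutant, frame=1))
```

In this toy case the substitution changes a residue in both reading frames, which is the kind of coupling an engineered overlap relies on: a mutation that damages the non-essential gene is also likely to damage the essential gene it overlaps.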

Keywords: Deep learning; Generative model; Genome compression; Machine learning; Markov random field; Multiple sequence alignments; Overlapping genes; Protein design; Synthetic biology; Synthetic genomes.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Open Reading Frames
  • Proteins*
  • Sequence Alignment

Substances

  • Proteins