Embedding permanent watermarks in synthetic genes

PLoS One. 2012;7(8):e42465. doi: 10.1371/journal.pone.0042465. Epub 2012 Aug 8.


As synthetic biology advances, labeling of genes or organisms, like that of other high-value products, will become important not only to pinpoint their identity, origin, or spread, but also for intellectual property, classification, biosecurity, or legal reasons. Ideally, the information should be inseparably interlaced into expressed genes. We describe a method for embedding messages within open reading frames of synthetic genes by adapting steganographic algorithms typically used for watermarking digital media files. Text messages are first translated into a binary string and then represented in the reading frame by synonymous codon choice. To achieve good expression of the labeled gene in its host while retaining a high degree of codon assignment flexibility for gene optimization, codon usage tables of the target organism are taken into account. Amino acids with 4 or 6 synonymous codons are preferred for encoding the binary digits. Several different messages were embedded into open reading frames of T7 RNA polymerase, GFP, human EMG1 and HIV gag, variously optimized for bacterial, yeast, mammalian or plant expression, without affecting their protein expression or function. We also introduced Vigenère polyalphabetic substitution to encipher text messages, and developed an identifier as a key to deciphering codon usage ranking stored for a specific organism within a sequence of 35 nucleotides.
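
The pipeline summarized in the abstract (text to binary string, binary string to synonymous codon choice, optional Vigenère enciphering) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the names `FOURFOLD`, `embed`, `extract`, and `vigenere` are hypothetical; only five amino acids with four synonymous codons are used as carriers; each carrier codon stores two bits; and the codons are ranked alphabetically rather than by a host-specific codon usage table as the abstract describes.

```python
# Assumed carrier table: amino acids with four synonymous codons.
# The list position (0-3) of a codon encodes two bits. The alphabetical
# order is illustrative only; the abstract ranks codons by host codon usage.
FOURFOLD = {
    "A": ["GCA", "GCC", "GCG", "GCT"],
    "G": ["GGA", "GGC", "GGG", "GGT"],
    "P": ["CCA", "CCC", "CCG", "CCT"],
    "T": ["ACA", "ACC", "ACG", "ACT"],
    "V": ["GTA", "GTC", "GTG", "GTT"],
}
CODON_TO_AA = {c: aa for aa, codons in FOURFOLD.items() for c in codons}


def text_to_bits(text):
    """Translate an ASCII message into a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


def bits_to_text(bits):
    """Translate a bit string back into ASCII, ignoring a trailing partial byte."""
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8)).decode("ascii")


def vigenere(text, key, decrypt=False):
    """Vigenere substitution; assumes text and key contain only A-Z."""
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - ord("A")
        out.append(chr((ord(ch) - ord("A") + sign * shift) % 26 + ord("A")))
    return "".join(out)


def embed(orf, message):
    """Rewrite carrier codons of an in-frame ORF so they spell out the message.

    Only synonymous codons are swapped, so the encoded protein is unchanged.
    """
    bits = text_to_bits(message)
    out, pos = [], 0
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is not None and pos < len(bits):
            out.append(FOURFOLD[aa][int(bits[pos:pos + 2], 2)])
            pos += 2
        else:
            out.append(codon)  # non-carrier codon, or message already written
    if pos < len(bits):
        raise ValueError("ORF has too few carrier codons for this message")
    return "".join(out)


def extract(orf, n_chars):
    """Read n_chars ASCII characters back from the carrier codons of an ORF."""
    bits = ""
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is not None and len(bits) < 8 * n_chars:
            bits += f"{FOURFOLD[aa].index(codon):02b}"
    return bits_to_text(bits)


if __name__ == "__main__":
    toy_orf = "ATGGCTGGTCCTACTGTTGCTGGTCCTTAA"  # made-up ORF: M-A-G-P-T-V-A-G-P-stop
    secret = vigenere("HI", "KEY")              # encipher before embedding
    marked = embed(toy_orf, secret)
    print(marked, vigenere(extract(marked, 2), "KEY", decrypt=True))
```

In this sketch the reader must know the message length and the codon ranking in order to decode. The abstract addresses the second point with an identifier that serves as a key to the organism-specific codon usage ranking.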

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Codon
  • Computational Biology / methods
  • Contig Mapping
  • DNA-Directed RNA Polymerases / genetics
  • Gene Expression Regulation
  • Genes, Synthetic*
  • Genetic Techniques*
  • Green Fluorescent Proteins / genetics
  • HEK293 Cells
  • Humans
  • Methyltransferases / genetics
  • Models, Genetic
  • Nuclear Proteins / genetics
  • Nucleotides / genetics
  • Open Reading Frames
  • Saccharomyces cerevisiae / metabolism
  • Viral Proteins / genetics
  • gag Gene Products, Human Immunodeficiency Virus / genetics


Substances

  • Codon
  • Nuclear Proteins
  • Nucleotides
  • Viral Proteins
  • gag Gene Products, Human Immunodeficiency Virus
  • Green Fluorescent Proteins
  • EMG1 protein, human
  • Methyltransferases
  • bacteriophage T7 RNA polymerase
  • DNA-Directed RNA Polymerases

Grants and funding

Assessment of HIV-1 infectivity and long-term replication kinetics was funded by the Bundesministerium fuer Bildung und Forschung, grant 0313687. Tobacco leaf transfection and analysis were funded by the Deutsche Forschungsgemeinschaft, grant HA3468/4-1. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.