High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may affect both the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology we developed a rDNA sequence analysis tool to process the multi-parallel sequencing of ∼40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kB plasmid construct encoding GFP we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%) and that there was no positional bias in mutation across the plasmid sequence. There was no discernable difference between the mutation frequencies of coding and non-coding DNA. The putative ratio of non-synonymous and synonymous changes within the open reading frames (ORFs) in the plasmid sequence indicates that natural selection does not impact upon the prevalence of these mutations. Here we have demonstrated the abundance of mutations that fall outside of the reported range of detection of next generation sequencing (NGS) and second generation sequencing (SGS) platforms, providing a methodology capable of being utilized in cell line development platforms to identify the fidelity of recombinant genes throughout the production process.
Keywords: CHO cells; next-generation sequencing; sequence variants; single-molecule real-time sequencing.
© 2018 Wiley Periodicals, Inc.