Nat Biotechnol. 2011 Jul 31;29(8):735-41.
doi: 10.1038/nbt.1932.

The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 Cell Line

Free PMC article

Xun Xu et al. Nat Biotechnol.


Chinese hamster ovary (CHO)-derived cell lines are the preferred host cells for the production of therapeutic proteins. Here we present a draft genomic sequence of the CHO-K1 ancestral cell line. The assembly comprises 2.45 Gb of genomic sequence, with 24,383 predicted genes. We associate most of the assembled scaffolds with 21 chromosomes isolated by microfluidics to identify chromosomal locations of genes. Furthermore, we investigate genes involved in glycosylation, which affect therapeutic protein quality, and viral susceptibility genes, which are relevant to cell engineering and regulatory concerns. Homologs of most human glycosylation-associated genes are present in the CHO-K1 genome, although 141 of these homologs are not expressed under exponential growth conditions. Many important viral entry genes are also present in the genome but not expressed, which may explain the unusual viral resistance property of CHO cell lines. We discuss how the availability of this genome sequence may facilitate genome-scale science for the optimization of biopharmaceutical protein production.


Figure 1. Chromosomal assignment to scaffolds
(a) Chromosomal preparations from CHO-K1 were sequenced and the reads were aligned to the scaffolds. For each of the N50 scaffolds, a vector was used to represent the read alignments in the 22 preparations. Using this metric, a correlation matrix was generated between all the N50 scaffolds. Upon clustering the matrix, 21 clusters of highly correlated scaffolds emerged, suggesting that the scaffolds are associated with 21 chromosomes in CHO-K1. (b) Classical karyotyping of CHO-K1 reveals 21 chromosomes.
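The scaffold-to-chromosome assignment in panel (a) can be illustrated with a minimal sketch. It assumes each scaffold is summarized by a vector of read counts, one entry per chromosomal preparation, and that scaffolds are grouped when their vectors are highly correlated. The caption does not name the clustering method, so the threshold-based single-linkage grouping below is a stand-in, and the function names, the 0.9 threshold, and the toy counts are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(vx * vy)


def cluster_scaffolds(profiles, threshold=0.9):
    """Group scaffolds whose read-count profiles correlate at or above threshold.

    profiles maps a scaffold name to its read counts per preparation.
    Grouping is single-linkage via union-find (an assumption, not the
    paper's stated method).
    """
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Hypothetical toy data: four scaffolds, four preparations.
toy_profiles = {
    "scaffold_A": [90, 5, 3, 2],
    "scaffold_B": [80, 4, 6, 1],
    "scaffold_C": [2, 3, 95, 4],
    "scaffold_D": [1, 6, 85, 3],
}
print(cluster_scaffolds(toy_profiles))
```

With real data, each resulting cluster would correspond to one candidate chromosome. The number of clusters depends on the chosen threshold and linkage rule.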
Figure 2. Comparative analysis of functional categories and gene content
For each GOslim biological process category, the fraction of all GO terms in that category is shown for human, mouse, rat, and CHO genomes. GOslim classes that are significantly enriched and show the highest and lowest coverage of human and mouse genes in the CHO genome are highlighted in red (*) and green (**), respectively.
Figure 3. A global view of the expression of CHO-K1 glycosylation genes
(a) While homologs were identified for 99% of the human glycosylation-associated transcripts, only 53% had detectable expression. Glycosylation gene classes enriched in expressed genes (denoted with **) include hyaluronoglucosaminidases, sugar-nucleotide synthesis, mannosyltransferases, and lysosomal enzymes. Significantly depleted classes in expressed genes (denoted with *) include the sulfotransferases, fucosyltransferases, and GalNAc transferases. (b) A selection of CHO N-linked glycosylation pathways is detailed to demonstrate the effects of CHO glycosylation gene expression on the possible glycoforms. (i) A difference between human and CHO glycosylation is seen in the lack of expression of MGAT3, which is responsible for the bisecting β(1,4) GlcNAc that occurs on ~10% of human antibodies. (ii) The only N-glycan-modifying fucosyltransferase expressed in CHO-K1 is FUT8, which adds fucose to the core glycan by an α(1,6) linkage. (iii) Sialylation of a terminal galactose can occur via α(2,3) or α(2,6) linkages in humans. However, CHO ST6Gal genes are not expressed, so CHO glycans primarily have α(2,3) linkages. (iv) The two most abundant sialic acids are Neu5Ac and Neu5Gc. Neu5Gc is immunogenic in humans. Thus, the lack of CMAH expression in CHO minimizes this response by limiting the conversion of Neu5Ac to Neu5Gc. Pathways are adapted loosely from . Abbreviations are defined in Supplementary Table 18.
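The enrichment and depletion calls in panel (a) can be illustrated with a hypergeometric test. The caption does not name the statistical test used, so the choice of test here is an assumption. The sketch asks how surprising it is to see k expressed genes in a class of n genes, when K of the N genes overall are expressed. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
from math import comb


def enrichment_p(k, K, n, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: all genes considered, K: expressed genes among them,
    n: genes in the class, k: expressed genes in the class.
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def depletion_p(k, K, n, N):
    """Lower-tail hypergeometric p-value, P(X <= k), same arguments as above."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))
    return hits / total


# Hypothetical toy counts: 10 genes, 5 expressed, a class of 4 genes.
p_enriched = enrichment_p(k=4, K=5, n=4, N=10)  # all 4 class members expressed
p_depleted = depletion_p(k=0, K=5, n=4, N=10)   # no class member expressed
print(p_enriched, p_depleted)
```

A small upper-tail value suggests a class is over-represented among expressed genes, and a small lower-tail value suggests it is under-represented. Testing many classes at once would also call for a multiple-testing correction, which this sketch omits.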
Figure 4. An assessment of the expression state of viral susceptibility genes in CHO-K1
(a) A global view of viral susceptibility genes in CHO-K1 demonstrates no measurable expression for 158 of these genes. The enriched GO cell compartment terms among the non-expressed susceptibility genes show that membrane proteins and DNA binding proteins are primarily not expressed. The expression state of all members of the “external side of plasma membrane” GO class is shown (blue and red for expressed and not expressed, respectively). (b) A schematic of entry mechanisms used by HSV-1 is provided. Viral entry receptors that are not expressed in CHO are shown by their gene names in red, and missing receptors are shown with a dashed outline. WT HSV = wild type HSV-1, Mut HSV = mutant HSV-1, Bov HSV = bovine HSV.
Figure 5. The CHO-K1 genome will aid in cell line engineering, generate hypotheses for biological discovery, and serve as a context to facilitate sequencing efforts and sequence analysis for additional cell lines
While significant advances in CHO biology have occurred over the past decades, the accessibility of the CHO-K1 genome will impact at least three major areas. (a) The CHO genome will aid cell line engineering by facilitating the application of experimental and computational sequence-based tools for genetic manipulation and genome analysis. For example, BLAST can be used to identify the CHO sequence of a desired gene, while siRNA and site-directed mutagenesis methods can be used to directly modulate gene expression levels and protein activities. Moreover, the genome sequence can be used to reconstruct models of CHO-K1 metabolism, which allow the assessment of how genetic manipulations affect other pathways and can predict non-intuitive genetic changes to improve product yield or quality. (b) The biomolecular mechanisms underlying many phenotypic properties of CHO are poorly characterized (e.g., viral susceptibility). The components underlying these phenotypes can be identified through the comparison of CHO gene content and gene expression with other organisms or cell lines. (c) While large genomic changes can occur in immortalized and engineered cell lines such as CHO, the CHO-K1 genome can serve as a context for the assembly and analysis of genome sequences from additional CHO cell lines.

Comment in

  • First CHO genome.
    Wurm FM, Hacker D. Nat Biotechnol. 2011 Aug 5;29(8):718-20. doi: 10.1038/nbt.1943. PMID: 21822249.
