Nucleic Acids Res. 2019 May 7;47(8):3846-3861.
doi: 10.1093/nar/gkz169.

Haplotype-resolved and Integrated Genome Analysis of the Cancer Cell Line HepG2

Free PMC article

Bo Zhou et al. Nucleic Acids Res.

Abstract

HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.

Figures

Figure 1.
Comprehensive Overview of the HepG2 Genome. Circos visualization of HepG2 genome variants with the following tracks in concentric order, starting with the outermost ‘ring’: human genome reference track (hg19); large CN changes (colors correspond to different CN, see legend panel); merged SV density in 1.5 Mb windows (deletions, duplications, inversions) called using BreakDancer, BreakSeq, PINDEL, LUMPY and Long Ranger; phased haplotype blocks (demarcated with four colors for clearer visualization); SNV density in 1 Mb windows; Indel density in 1 Mb windows; dominant zygosity (heterozygous or homozygous > 50%) in 1 Mb windows; regions with loss of heterozygosity; allele-specific expression; CpG islands exhibiting allele-specific DNA methylation; non-reference LINE1 and Alu insertions; allele-specific CRISPR target sites; large-scale SVs resolved using Long Ranger (peach: intrachromosomal; dark maroon: interchromosomal) and using GROC-SVs (light purple: intrachromosomal; dark purple: interchromosomal).
Figure 2.
HepG2 Karyogram and Callset Overview. (A) Representative karyogram of HepG2 cells by GTW banding that shows multiple numerical and structural abnormalities, including a translocation between the short arms of chromosomes 1 and 21, trisomies of chromosomes 2, 16 and 17, tetrasomy of chromosome 20, uncharacterized rearrangements of chromosomes 16 and 17, and two marker chromosomes. ISCN 2013 description: 49∼52,XY,t(1;21)(p22;p11),+2,+16,add(16)(p13),?+17,?add(17)(p11.2),+20,+20,+1∼3mar[cp15]/101∼106,idemx2[cp5]. (B) CNs (by percentage) across the HepG2 genome. (C) Percentage of HepG2 SNVs and Indels that are novel and known (in dbSNP). (D) Violin plot with overlaid boxplot of phased haplotype block sizes on a log-scaled Y-axis, with N50 represented as a dashed line (N50 = 6 792 324 bp). (E) X-axis: chromosome coordinate (Mb). Y-axis: difference in unique linked-read barcode counts between major and minor haplotypes, normalized by SNV density. Haplotype blocks from the normal control sample (NA12878) are in blue and those from HepG2 in dark gray. Density plots on the right reflect the distribution of the differences in haplotype-specific barcode counts for the control sample and HepG2. A significant difference (one-sided t-test, P < 0.001) in haplotype-specific barcode counts indicates aneuploidy and haplotype imbalance. Haplotype blocks (with ≥100 phased SNVs) generated from Long Ranger (Dataset 2) for the major and minor haplotypes were then ‘stitched’ into mega-haplotypes encompassing the entire triploid chromosome arms of 2p and 2q.
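The N50 reported for the phased haplotype blocks in panel (D) can be illustrated with a minimal sketch. It assumes the usual definition of N50 (the block size at which blocks of that size or larger cover at least half of the total phased length); the function name `n50` and the toy block sizes are hypothetical and are not taken from the paper.

```python
def n50(block_sizes):
    """Return the N50 of a collection of block sizes (in bp).

    Assumes the common definition: sort sizes in descending order and
    return the size at which the running total first reaches half of
    the summed length of all blocks.
    """
    sizes = sorted(block_sizes, reverse=True)
    if not sizes:
        raise ValueError("at least one block size is required")
    half_total = sum(sizes) / 2
    running = 0
    for size in sizes:
        running += size
        if running >= half_total:
            return size


# Toy haplotype block sizes (hypothetical values, not HepG2 data).
example_blocks = [2, 3, 4, 5, 6]
example_n50 = n50(example_blocks)
```

Because N50 weights blocks by length, a few very long blocks can dominate it, which is why the caption shows it alongside the full size distribution.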
Figure 3.
Large SVs in HepG2 Resolved from Linked-Read Sequencing using Long Ranger. HepG2 SVs were resolved by identifying identical linked-read barcodes in distant genomic regions with non-expected barcode overlap, using Long Ranger (32,33). (A) Disruption of FRK by translocation between chromosomes 6 and 16. (B) 2.47 Mb intra-chromosomal rearrangement between MALRD1 and MLLT10 on chromosome 10. (C) 127 kb duplication on chromosome 7 resulting in partial duplications of USP42 and PMS2. (D) 395 kb duplication within PRKG1 on chromosome 10. (E) 31.3 kb inversion within GUSBP1 on chromosome 5. (F) 60.4 kb inversion that disrupts PPL and SEC14L5.
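The idea of barcode overlap between distant regions can be sketched as follows. This is only an illustration of the principle, not Long Ranger's actual statistical model, which is not described here; the function name `shared_barcode_stats` and the toy barcodes are hypothetical.

```python
def shared_barcode_stats(barcodes_a, barcodes_b):
    """Count linked-read barcodes shared by two genomic regions.

    Returns (number of shared barcodes, Jaccard index). Distant regions
    are expected to share few barcodes by chance, so an unexpectedly
    high overlap hints that the regions are adjacent in the sample.
    """
    a, b = set(barcodes_a), set(barcodes_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Hypothetical barcodes observed in two distant windows.
region_1 = ["AAC", "GGT", "TTA"]
region_2 = ["GGT", "TTA", "CCG"]
n_shared, jaccard = shared_barcode_stats(region_1, region_2)
```

In practice the observed overlap would need to be compared with a background expectation (for example, overlap between randomly chosen distant windows) before calling an SV.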
Figure 4.
HepG2 SVs Reconstructed and Assembled Using GROC-SVs. (A-D) Each line depicts a fragment inferred from 10X-Genomics data based on clustering of reads with identical barcodes (Y-axis) identified from GROC-SVs (34). Abrupt ending (dashed vertical line) of fragments indicates the location of an SV breakpoint. All breakpoints depicted are validated by 3 kb-mate-pair sequencing data. Fragments phased locally with respect to surrounding SNVs (haplotype-specific) are in orange; fragments are in black when no informative SNVs are found nearby. Gray lines indicate portions of fragments that do not support the current breakpoint. (A) Translocation between chromosomes 1 and 4. Linked-read fragments containing overlapping barcodes that map to chromosome 1 end abruptly near 248.60 Mb, indicating a breakpoint, and then continue simultaneously near 168.75 Mb on chromosome 4. (B) Translocation between chromosomes 6 and 17. Linked-read fragments containing overlapping barcodes that map to chromosome 17 end abruptly near 36.17 Mb, indicating a breakpoint, and then continue simultaneously near 113.52 Mb on chromosome 6. (C) Large (335 kb) heterozygous deletion within NEDD4L on chromosome 18. (D) Large (1.3 Mb) intra-chromosomal rearrangement that deletes large portions of RBFOX1 and RP11420N32 on chromosome 16.
Figure 5.
Large and complex haplotype-resolved SVs using gemtools. Each SV is identified from linked-reads clustered by identical barcodes (i.e. SV-specific barcodes, Y-axis) indicative of single HMW DNA molecules (depicted by each row) that span SV breakpoints. Haplotype-specific SVs are represented in blue and red. X-axis: hg19 genomic coordinate. (Top) Complex SV on chromosome 8 involving a 4585 bp deletion downstream of ADAM2. This deletion is within a tandem duplication leading to the amplification of IDO1 and the first half of IDO2. The presence of HMW molecules sharing the same linked-read barcodes spanning both breakpoints indicates a cis orientation and occurrence on only one allele of this locus. A schematic diagram of the rearranged structures is drawn above the plot. (Middle) Two haplotype-resolved deletions, 700 kb (blue) and 200 kb (red) respectively, occurring on two separate alleles within PDE4D on chromosome 5. The spanning HMW molecules for each deletion do not share SV-specific barcodes, indicating that these deletions are in trans. (Bottom) Two haplotype-resolved deletions, 290 kb (red) and 160 kb (blue) respectively, within AUTS2 on chromosome 7. The reference allele of AUTS2 without the deletion (Haplotype 2) is also detected and resolved by linked-reads (blue, bottom panel). The 160 kb deletion on Haplotype 2 occurs sub-clonally.
Figure 6.
Genomic Sequence and Structural Context Provides Insight into Regulatory Complexity in HepG2. (A) Chr5:57,755,334-57,756,803 locus containing the serine/threonine-protein kinase gene PLK2 and CpG island (CGI) 6693 (1463 bp) where phased Haplotype 1 and Haplotype 2. Allele-specific transcription of PLK2 from Haplotype 2 only. CpGs in CGI 6693 are mostly unmethylated in Haplotype 2 (expressed) and highly methylated in Haplotype 1 (repressed). (B) Chr17:59,473,060-59,483,266 locus (triploid in HepG2) containing T-box transcription factor gene TBX2 and CGI 22251 (10 206 bp) where phased Haplotype 2 has two copies and Haplotype 1 has one copy. Allele-specific transcription of TBX2 from Haplotype 2 only. CpGs in CGI 22251 are unmethylated in Haplotype 1 (repressed) and methylated in Haplotype 2 (expressed). Allele-specific CRISPR targeting site 1937 bp inside the 5′ region of TBX2 for both Haplotypes. (C) Number of allele-specific RNA-Seq reads in Haplotypes 1 and 2 for PLK2 and TBX2 where both genes exhibit allele-specific RNA expression (P = 4.66E-10 and P = 0.0179, respectively). (D) Number of methylated and unmethylated phased whole-genome bisulfite-sequencing reads for Haplotypes 1 and 2 in CGI 6693 and CGI 22251 where both CGIs exhibit allele-specific DNA methylation (P = 1.51E-66 and P = 1.55E-32, respectively).
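The allele-specific read counts in panel (C) invite a simple worked example. The caption does not state which test produced the reported P values, so the sketch below assumes an exact two-sided binomial test, one common choice for allele-specific expression. The function name `binomial_two_sided_p`, the read counts, and the copy-number-adjusted expected fraction are all hypothetical illustrations.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Suitable for small read counts only: very large n
    can overflow or underflow in this naive implementation.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


# Hypothetical phased RNA-Seq read counts (not values from the paper).
hap1_reads, hap2_reads = 3, 27
total = hap1_reads + hap2_reads

# Null of equal expression from both haplotypes (diploid locus).
p_equal = binomial_two_sided_p(hap1_reads, total, p=0.5)

# Null adjusted for copy number, assuming Haplotype 1 has one of three
# copies, as at a triploid locus with a 1:2 haplotype ratio.
p_cn_adjusted = binomial_two_sided_p(hap1_reads, total, p=1 / 3)
```

Setting the expected fraction from haplotype copy number matters in an aneuploid genome: an imbalance that looks significant against 0.5 may be weaker once a 1:2 haplotype ratio is taken into account.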

References

    1. Negrini S., Gorgoulis V.G., Halazonetis T.D. Genomic instability–an evolving hallmark of cancer. Nat. Rev. Mol. Cell Biol. 2010; 11:220–228.
    2. Hanahan D., Weinberg R.A. Hallmarks of Cancer: the next generation. Cell. 2011; 144:646–674.
    3. Adey A., Burton J.N., Kitzman J.O., Hiatt J.B., Lewis A.P., Martin B.K., Qiu R., Lee C., Shendure J. The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line. Nature. 2013; 500:207–211.
    4. Aden D.P., Fogel A., Plotkin S., Damjanov I., Knowles B.B. Controlled synthesis of HBsAg in a differentiated human liver carcinoma-derived cell line. Nature. 1979; 282:615–616.
    5. López-Terrada D., Cheung S.W., Finegold M.J., Knowles B.B. Hep G2 is a hepatoblastoma-derived cell line. Hum. Pathol. 2009; 40:1512–1515.
