Comparative Study
Science. 2018 Jun 8;360(6393):eaar6343.
doi: 10.1126/science.aar6343.

High-resolution Comparative Analysis of Great Ape Genomes

Free PMC article

Zev N Kronenberg et al. Science. 2018.

Abstract

Genetic studies of human evolution require high-quality, contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single-base-pair to megabase-pair-sized variants. We identified ~17,000 fixed human-specific structural variants, which point to genic and putative regulatory changes that have emerged in humans since their divergence from nonhuman apes. These variants are enriched near genes that are down-regulated in human compared with chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.

Conflict of interest statement

Competing interests: E.E.E. is on the scientific advisory board (SAB) of DNAnexus, Inc.; A.R.H., A.W.C.P., J.L., E.T.L. and H.C. are employees of Bionano Genomics, Inc.; J.G.U. is an employee of Pacific Biosciences, Inc.

Figures

Fig. 1. Assembly and annotation of great ape genomes.
a) Comparison of genome sequence contiguity. Chromosome 3 contiguity is compared among the great ape genome assemblies by alignment to GRCh38. Contigs larger (blue) and smaller (green) than 3 Mbp are compared with the positions of segmental duplications (SDs >50 kbp, orange) shown in the reference ideogram. b) Scatterplot of syntenic-alignment block length against GRCh38 (x-axis) vs. contig N50 (y-axis) for the great ape assemblies. The SMRT assemblies are Clint_PTRv1, Susie_PABv1, GSMRT3.2, CHM13_HSAv1, and YRI_HSAv1. The previous reference genomes are ponAbe2 (GCF_000001545.3), gorGor4 (GCA_000151905.3), panTro2 (GCF_000001515.2), panTro3 (GCA_000001515.3), panTro4 (GCA_000001515.4), and panTro5 (GCA_000001515.5). c) Full-length assembled transcripts mapped to Clint_PTRv1 and panTro3. Each point shows the number of bases per transcript that align to each of the two assemblies; gray shading indicates repeat content. While most transcripts map well to both assemblies (Pearson’s correlation = 0.95), the subset of differentially mapped transcripts (12,724; 60% of 21,118) aligns better to Clint_PTRv1 (dots above the blue dashed line). The histogram inset shows the per-transcript effect, with a total of 4.8 Mbp more bases aligned to Clint_PTRv1. d) The Comparative Annotation Toolkit (CAT) was used to project transcripts from GRCh38 to Clint_PTRv1, panTro3, Susie_PABv1, and ponAbe2. Alignment coverage and identity were compared for orthologous transcripts found in each assembly pair. The boxplots (left) summarize TransMap differences between the short-read and SMRT assemblies in terms of coverage and identity. The shaded portion of the bar plots (right) represents alignments that had identical coverage or identity in both assemblies.
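Contig N50, the contiguity statistic on the y-axis of panel b, is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The sketch below illustrates only this standard definition; it is not the authors' pipeline, and the function name `n50` and the toy contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 (bp) of a collection of contig lengths (bp).

    Contigs are walked from longest to shortest; the first contig at which
    the running total reaches half of the assembly size is the N50.
    """
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid float division.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (not data from the study).
print(n50([100, 80, 50, 30, 20, 10]))  # 80
```

Here the assembly totals 290 bp; the two longest contigs (100 + 80 = 180 bp) are the first to reach half of that, so N50 is 80 bp.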
Fig. 2. Ape genetic diversity and lineage sorting.
a) Single-nucleotide variant (SNV) divergence between each primate assembly and GRCh38 was calculated in 1 Mbp non-overlapping windows across all autosomes and chromosome X (excluding X-Y homologous regions). Mean autosomal divergence is 1.27 ± 0.20% (human-chimpanzee), 1.61 ± 0.21% (human-gorilla), and 3.12 ± 0.33% (human-orangutan). The African genome (YRI_HSAv1) shows a 17% increase in SNV diversity. b) Proportion of phylogenetic trees supporting the standard species topology and incomplete lineage sorting (ILS). The mean and 95% confidence intervals are based on 100 genome-wide permutations. c) A phylogenetic tree (maximum clade credibility consensus tree) comparing genic regions (~9,000 consensus CDS (CCDS) with 1,000 bp of flanking sequence [orange]) with a randomly shuffled genomic set matched to CDS lengths (green). The analysis excludes SDs, SVs, and large tandem repeats. Branch lengths (above the lines) and the proportion of trees supporting each bifurcation (internal nodes) are shown. Violin plots summarize the distribution and mean divergence (substitutions/bp) for the subset of trees consistent with the species tree. YRI_HSAv1 is the representative human in the violin plots. d) A comparison of expanded STR sequences (n = 16,138 loci) between the human (African) and chimpanzee ab initio genome assemblies shows little to no species bias (0.02 bp). e) A multiple sequence alignment (MSA) of ape genomes (gorilla BAC CH277-16N20, chimpanzee CH251-550G17) identifies an orthologous 379 bp PtERV1 element, nested within another LTR, that is shared between gorilla and chimpanzee. A maximum likelihood phylogenetic tree (GTR+Gamma) built from 12,108 bp supports ILS. Single-nucleotide polymorphisms supporting chimpanzee-gorilla sorting (CG_HO) are shown as blue lines; red lines show single-nucleotide polymorphisms supporting the species tree (CH_GO). Branch lengths (substitutions per site) are shown above the lineages, and internal nodes are labeled with bootstrap support (proportion of 1,000 replicates supporting the split).
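The windowed divergence in panel a can be sketched as follows. This assumes, since the legend does not say, that divergence in a window is the SNV count divided by the number of aligned, comparable bases in that window, expressed as a percentage. The function name, window indexing, and toy inputs are hypothetical, and a real analysis would also mask X-Y homologous regions as described above.

```python
from collections import Counter
from statistics import mean, stdev

WINDOW = 1_000_000  # 1 Mbp non-overlapping windows, as in the legend


def window_divergence(snv_positions, aligned_bases, window=WINDOW):
    """Percent SNV divergence per non-overlapping window on one chromosome.

    snv_positions: 0-based coordinates of SNVs against the reference.
    aligned_bases: dict mapping window index -> number of aligned bases in
        that window (assumed denominator); windows with 0 bases are skipped.
    """
    snvs_per_window = Counter(pos // window for pos in snv_positions)
    return {
        w: 100.0 * snvs_per_window.get(w, 0) / n
        for w, n in aligned_bases.items()
        if n > 0
    }


# Hypothetical toy data: 10,000 SNVs in each of two windows.
snvs = list(range(0, 1_000_000, 100)) + list(range(1_000_000, 1_500_000, 50))
aligned = {0: 1_000_000, 1: 500_000}
div = window_divergence(snvs, aligned)
print(div)  # {0: 1.0, 1: 2.0}
values = list(div.values())
print(f"{mean(values):.2f} ± {stdev(values):.2f}%")  # 1.50 ± 0.71%
```

Summarizing per-window values as mean ± standard deviation mirrors how the legend reports autosomal divergence, but the exact summary statistic used in the study is an assumption here.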
Fig. 3. Fixed structural variation and regulatory mutation.
a) The great ape cladogram, with fixed structural variation assigned to lineages on the basis of assembly comparison, genotyping, and stratification (except for inversions). The total amount of sequence is shown on the left side of each branch and the number of SVs on the right, for deletions (blue), insertions (red), and inversions (magenta). Inversions were assigned to branches on the basis of the comparison of our five assemblies because genotyping was less reliable. The cladogram was rooted against Susie_PABv1, so the assignment of SVs to the orangutan lineage versus the common ancestor of human, chimpanzee, and gorilla is arbitrary. b) A map of fixed human-specific structural variants (fhSVs). Color denotes the number of fhSV bases (kbp) within a 1 Mbp sliding window (0.5 Mbp step). Each chromosome is labeled on the y-axis, and key regions are annotated with genes. c) Cell specificity of a mouse enhancer element (mm652, represented as a yellow box) that has an ortholog in chimpanzee. In human, an AluY element has been inserted directly into the mm652 enhancer. d) A human-specific STR interrupts a mouse heart-specific enhancer shared with chimpanzee (yellow box). The STR lies within a CFAP20 intron. e) Dotplots of the human-specific STR expansion. The two human assemblies, CHM13_HSAv1 and YRI_HSAv1, show additional STR expansion relative to GRCh38, suggesting that the reference is collapsed at this locus. f) A comparison of the hCONDEL set reported by McLean et al. (5) (V1) with the hCONDELs reported here (V2). The current hCONDELs are defined by conservation (25 bp MSA windows) among chimpanzee, macaque, and mouse. The dashed gray area shows the overlap between all fixed human deletions and all V1 hCONDELs. g) A Miropeats diagram of the gorilla complex SV (inversion and deletion) upstream of the AR locus; the human reference genome is shown at the bottom.
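The fhSV density map in panel b sums fhSV bases in 1 Mbp windows that advance in 0.5 Mbp steps, so each base falls into up to two windows. A minimal sketch of that computation follows, assuming fhSVs are non-overlapping, half-open (start, end) intervals on a single chromosome and that a truncated window at the chromosome end is kept; these choices and the function name are hypothetical, not taken from the study.

```python
def sv_bases_per_window(svs, chrom_length, window=1_000_000, step=500_000):
    """Sum SV bases (reported in kbp) overlapping each sliding window.

    svs: list of half-open (start, end) intervals, assumed non-overlapping
        so that no base is counted twice within a window.
    Returns a list of (window_start, kbp) tuples; the last window may be
    shorter than `window` if it reaches the chromosome end.
    """
    result = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        # Length of the intersection of each SV with [start, end).
        covered = sum(max(0, min(e, end) - max(s, start)) for s, e in svs)
        result.append((start, covered / 1000))
        start += step
    return result


# Hypothetical toy chromosome with two SVs (not study data).
toy_svs = [(100_000, 150_000), (900_000, 1_100_000)]
print(sv_bases_per_window(toy_svs, chrom_length=2_000_000))
# [(0, 150.0), (500000, 200.0), (1000000, 100.0), (1500000, 0.0)]
```

The second SV straddles the 1 Mbp boundary, so it contributes 100 kbp to the first window, all 200 kbp to the overlapping window that starts at 0.5 Mbp, and 100 kbp to the window that starts at 1 Mbp.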
Fig. 4. Examples of intragenic human-specific structural variation.
Shown are annotated MSAs between the human reference (GRCh38) and nonhuman primates (NHPs), generated with MAFFT or visualized with Miropeats against sequenced large-insert primate clones. Single-cell expression of selected genes is highlighted across 4,261 cells from the developing human telencephalon, plotted using t-distributed stochastic neighbor embedding (tSNE) (67). a) A 66.2 kbp intragenic deletion of CARD8 removes 13 putative coding exons in human. Iso-Seq data from chimpanzee and human iPSCs identify isoforms with and without the deleted exons, respectively. b) A 62.5 kbp intergenic deletion of FADS2 is found in humans, along with an altered isoform ratio: the relative abundance of the long isoforms is increased in humans relative to chimpanzee, as seen in the counts of junction-spanning short reads specific to each isoform. In addition, a novel, rare (<5%) 75 bp exon is observed in chimpanzee and gorilla but is absent in human, likely as the result of a human-specific splice-site mutation. c) A 107 bp deletion in the 3’ UTR of WEE1 reduces AU-rich sequence content in the mRNA. The tSNE plot shows that WEE1 is highly expressed in cortical radial glia (RG), intermediate progenitor cells (IPCs), and medial ganglionic eminence progenitors (MGE RG) but has limited expression in newborn and maturing inhibitory and excitatory neurons (nIN, mIN, nEN, mEN), microglia, endothelial cells (ECs), and glia. d) A 1,920 bp deletion in the cell cycle regulator CDC25C removes a 99 bp constitutive exon conserved in mouse, resulting in a 33-amino-acid deletion and a shorter N-terminal regulatory domain in humans. The tSNE plot shows that CDC25C expression is restricted to telencephalon progenitors in the G2/M phase of the cell cycle. Human and chimpanzee RNA-seq data were aligned directly to the exonic regions of CDC25C.
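The link in panel d between a 99 bp exon and a 33-amino-acid deletion follows from codon arithmetic: losing a coding exon whose length is a multiple of 3 keeps the downstream reading frame intact and removes length / 3 codons. The sketch below encodes only that rule under simplifying assumptions (a fully coding internal exon, clean joining of the flanking exons); the function name is hypothetical. If the exon does not begin in codon phase 0, the codon spanning the new junction may also change.

```python
def exon_loss_effect(exon_length_bp):
    """Predict the reading-frame consequence of losing one internal coding exon.

    Returns ("in-frame", codons_removed) when the exon length is a multiple
    of 3, otherwise ("frameshift", None). Assumes the whole exon is coding.
    """
    if exon_length_bp <= 0:
        raise ValueError("exon length must be positive")
    if exon_length_bp % 3 == 0:
        return "in-frame", exon_length_bp // 3
    return "frameshift", None


print(exon_loss_effect(99))   # ('in-frame', 33)
print(exon_loss_effect(100))  # ('frameshift', None), hypothetical length
```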
Fig. 5. Complex structural variation.
Large-scale inversions between human and chimpanzee are depicted. The human reference genome sequence (GRCh38) with gene annotation is compared with large-insert clone-based assemblies from the chimpanzee BAC library CH251 using Miropeats. Connecting lines identify homologous regions of high sequence identity. SD organization is depicted as colored arrows, as defined by whole-genome shotgun sequence detection (WSSD) and DupMasker. The heatmap indicates copy number (CN) estimated from the read depth of ape genome sequence data. a) A ~265 kbp inversion on chromosome 13q14.3 detected by optical mapping in chimpanzee (annotated blue lines). The inverted region is flanked by large (~180 kbp) inverted SD blocks whose copy number varies among great apes. b) A 2.7 Mbp inversion on chromosome 2q12-13 detected by BAC end sequencing in chimpanzee (annotated green lines). The inverted region is flanked by duplication blocks containing lineage-specific expansions of the interleukins, an inverted duplication of REV1, and an additional copy of the RGPD4 core duplicon. c) A ~1.1 Mbp inversion at chr13q14.13, identified by optical mapping in chimpanzee, that encompasses 15 genes. On the telomeric side of the inversion lies a ~60 kbp duplication block that shows lineage-specific duplications in great apes. d) Chromosome inversions originally detected by optical mapping and BAC end sequencing were confirmed by metaphase analysis and interphase FISH experiments. A human-specific inversion of the chromosome 16q22.1 region was confirmed with orangutan clones CH276-89P20 (red) and CH276-192M7 (green) (upper row), and the 15q25.2 inversion was confirmed with chimpanzee clones CH251-321P13 (red), CH251-511D5 (green), and CH251-66E11 (blue) (lower row).
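The copy-number heatmap rests on the idea that read depth scales with the number of copies of a sequence. A deliberately simplified sketch of that idea follows: each window's depth is scaled against the median depth, which is assumed to represent the diploid state. This is not WSSD or the study's read-depth method, which would add corrections (for example, for GC content and mappability) that are omitted here; the function name and depths are hypothetical.

```python
from statistics import median


def read_depth_copy_number(window_depths, ploidy=2):
    """Estimate integer copy number per window from read depth.

    Assumes most windows are present at `ploidy` copies, so the median
    depth serves as the baseline for that copy number.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [round(ploidy * depth / baseline) for depth in window_depths]


# Hypothetical depths: five typical windows, one duplicated, one deleted.
print(read_depth_copy_number([30, 31, 29, 30, 60, 92, 15]))
# [2, 2, 2, 2, 4, 6, 1]
```

Using the median rather than the mean keeps a few highly duplicated windows (such as SD blocks) from inflating the baseline.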
Fig. 6. Structural variation and neural progenitor expression differences between human and chimpanzee.
a) Volcano plots of chimpanzee–human gene expression differences in excitatory neuron (left) and radial glia (right) organoid single-cell data. Each point represents a gene with sufficient data to assess significance between human and chimpanzee organoid cells. Genes with fhSVs within 50 kbp are denoted with a triangle, and points are shaded by significance. b) Spatial permutation test for overlap between fhSVs and differentially expressed genes. Each violin shows the null distribution of fhSV overlap (±50 kbp of transcript start/end) with genes that are significantly down- or upregulated in human relative to chimpanzee. The horizontal bars and observed counts are overlaid on the null distribution. c) Heatmap of percentile gene expression for differentially expressed genes near fhSVs (rows) across single cells (columns), including genes near the start or end of inversions (circle) and in duplicated regions (WSSD) (triangle). Cells include 333 excitatory neurons (97 chimpanzee organoid, 53 human organoid, 183 human primary) and 373 radial glia (113 chimpanzee organoid, 123 human organoid, 137 human primary) (56, 57). Expression patterns include concerted changes between chimpanzee and human cells across radial glia and excitatory neurons (chimpanzee RG and EN; human RG and EN), cell-type-specific changes (human EN; human RG), and conserved radial glia expression (pan-RG).
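The spatial permutation test in panel b compares the observed number of differentially expressed genes with an fhSV within ±50 kbp against counts obtained after randomly relocating the SVs. The sketch below is a minimal version under stated assumptions: a single chromosome, uniform random placement that preserves each SV's length, and a one-sided empirical p-value. The study's actual shuffling constraints (for example, avoiding assembly gaps) are not described in the legend, and all names and coordinates here are hypothetical.

```python
import random


def count_genes_near_svs(genes, svs, flank=50_000):
    """Count genes with at least one SV within +/- flank of the transcript."""
    count = 0
    for g_start, g_end in genes:
        lo, hi = g_start - flank, g_end + flank
        # Half-open overlap test between SV [s, e) and extended gene [lo, hi).
        if any(s < hi and e > lo for s, e in svs):
            count += 1
    return count


def spatial_permutation_test(genes, svs, chrom_length, n_perm=1000,
                             flank=50_000, seed=0):
    """Return (observed, null_counts, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = count_genes_near_svs(genes, svs, flank)
    null = []
    for _ in range(n_perm):
        shuffled = []
        for s, e in svs:
            length = e - s
            # Place the SV uniformly at random, keeping its length.
            new_start = rng.randrange(0, chrom_length - length)
            shuffled.append((new_start, new_start + length))
        null.append(count_genes_near_svs(genes, shuffled, flank))
    # Add-one correction so the p-value is never exactly zero.
    p_value = (1 + sum(n >= observed for n in null)) / (n_perm + 1)
    return observed, null, p_value


genes = [(100_000, 120_000), (500_000, 510_000), (900_000, 950_000)]
svs = [(160_000, 161_000), (940_000, 945_000)]
obs, null, p = spatial_permutation_test(genes, svs, 2_000_000, n_perm=200)
print(obs)  # 2
```

Testing for depletion (down-regulated vs. up-regulated sets are tested separately in the legend) would simply count null values less than or equal to the observed count instead.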
