Genome Res. 2019 Jun;29(6):1023-1035.
doi: 10.1101/gr.246082.118. Epub 2019 May 23.

Long-read sequencing reveals intra-species tolerance of substantial structural variations and new subtelomere formation in C. elegans


Chuna Kim et al. Genome Res. 2019 Jun.

Abstract

Long-read sequencing technologies have contributed greatly to comparative genomics among species and can also be applied to study genomic variation within a species. In this study, to determine how substantial genomic changes are generated and tolerated within a species, we sequenced the C. elegans strain CB4856, one of the strains most genetically divergent from the N2 reference strain. For this comparison, we used the Pacific Biosciences (PacBio) RSII platform (80× coverage, N50 read length 11.8 kb) and generated a de novo genome assembly at the level of pseudochromosomes comprising 76 contigs (contig N50 = 2.8 Mb). We identified structural variations that affected as many as 2694 genes, most of which are located on chromosome arms. Subtelomeric regions contained the most extensive genomic rearrangements, which in some cases even created new subtelomeres. The subtelomere structure of Chromosome VR implies that ancestral telomere damage was repaired by alternative lengthening of telomeres even in the presence of a functional telomerase gene, and that a new subtelomere was formed by break-induced replication. Our study demonstrates that substantial genomic changes, including structural variations and new subtelomeres, can be tolerated within a species and that these changes may contribute to the accumulation of genetic diversity within a species.
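N50, quoted above for both read length and contig length, is the length L such that sequences of length L or longer together contain at least half of all bases. The following is a minimal Python sketch of that definition only, not the tool the authors used; the function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (reads or contigs)."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downward until at least half
    # of all bases have been accumulated; that length is the N50.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the three longest sequences (10 + 9 + 8 = 27 bases) reach half of the 54-base total.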


Figures

Figure 1.
CB4856 genome assembly and comparison with the N2 genome at the chromosome level. (A) Schematic representation of CB4856 contig lengths mapped to N2 WBcel235 chromosomes. (B) PacBio raw read coverage mapped on CB4856 chromosomes (100-kb bins). Reads were distributed at an average coverage of 60×. (C) Schematic of large chromosomal rearrangements between the N2 and CB4856 genomes identified using progressiveMauve. Blue boxes and lines indicate inversions; red boxes and lines, translocations; and white boxes, unaligned blocks. Chr VR has several small rearrangements and unaligned blocks. Chr II: 3,896,126–3,900,949 in N2 was inverted in CB4856 (Chr II: 4,045,653–4,040,823), Chr V: 17,616,880–17,623,484 in N2 was inverted in CB4856 (Chr V: 17,734,209–17,728,873), and Chr V: 19,258,912–19,289,935 in N2 was located at Chr V: 21,193,104–21,237,336 in CB4856. (D) Schematic representation of CB4856 HiSeq reads mapped on the CB4856 genome (blue) or the N2 genome (yellow). Each dot shows the heterozygous base count per 100-kb interval from Chr I to Chr X.
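Panels B and D summarize per-position data in fixed 100-kb bins. A minimal sketch of binned mean depth follows, assuming alignment spans have already been extracted as 0-based, half-open `(start, end)` pairs for one chromosome (for example from a BAM file); it ignores deletions and clipping within alignments and is not the authors' pipeline. The function name `binned_coverage` is hypothetical.

```python
def binned_coverage(intervals, chrom_length, bin_size=100_000):
    """Mean read depth in each fixed-size bin along one chromosome.

    intervals: iterable of (start, end) alignment spans, 0-based, half-open.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    covered = [0] * n_bins  # aligned bases falling in each bin
    for start, end in intervals:
        start, end = max(start, 0), min(end, chrom_length)
        if start >= end:
            continue
        first, last = start // bin_size, (end - 1) // bin_size
        # Split the span across every bin it overlaps.
        for b in range(first, last + 1):
            lo = max(start, b * bin_size)
            hi = min(end, (b + 1) * bin_size)
            covered[b] += hi - lo
    depths = []
    for b in range(n_bins):
        # The last bin may be shorter than bin_size.
        width = min(bin_size, chrom_length - b * bin_size)
        depths.append(covered[b] / width)
    return depths
```

For example, `binned_coverage([(0, 10), (5, 15)], 20, bin_size=10)` returns `[1.5, 0.5]`. Counting heterozygous sites per bin, as in panel D, follows the same binning with a count in place of a depth.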
Figure 2.
Structural variations (SVs) between the CB4856 and N2 genomes and their effects on chromosomal contents. (A) SVs between the N2 genome and the previously reported short-read-based CB4856 genome (left), and between the N2 genome and the long-read-based CB4856 genome (right). Repeat-expansion, tandem-expansion, and insertion SVs are detected more often with the long-read-based genome than with the previous short-read-based genome. (B) Tracks representing density at 100-kb intervals; from outside to inside: 1, genomic positions (in Mb) of the six chromosomes based on the N2 genome; 2, density of local recombination rate in CB4856/N2 introgression lines; 3–9, types of SVs identified using Assemblytics: 3, size of SVs; 4, density of repeat-contraction SVs; 5, density of repeat-expansion SVs; 6, density of tandem-contraction SVs; 7, density of tandem-expansion SVs; 8, density of deletion SVs; 9, density of insertion SVs. (C) Tracks representing density at 100-kb intervals; from outside to inside: 1, genomic positions (in Mb) of the six chromosomes based on the N2 genome; 2–4, density of SVs estimated by SnpEff: 2, high-impact SVs; 3, low-impact SVs; 4, modifier SVs. (D) Annotation of SVs. SV effects were categorized using SnpEff based on their position in the annotated N2 genome. ‘N2-specific genes’ indicates the number of genes that are completely deleted in CB4856. ‘Genic’ indicates the number of genes whose function is predicted to be affected by the SVs. ‘Intergenic’ indicates the number of SVs in intergenic regions. ‘Upstream’ indicates the number of SVs located within 5 kb upstream of a gene. ‘Downstream’ indicates the number of SVs located within 5 kb downstream of a gene.
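The ‘Upstream’ and ‘Downstream’ categories in panel D depend on gene orientation: a region 5′ of a plus-strand gene lies at lower coordinates, whereas for a minus-strand gene it lies at higher coordinates. The sketch below illustrates that positional rule against a single gene with a 5-kb flank. It is a simplified, hypothetical illustration (the `Gene` class and `classify_sv` function are not part of SnpEff), and SnpEff itself additionally considers all overlapping transcripts and assigns impact levels, which this does not do.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify_sv(sv_start, sv_end, gene, flank=5_000):
    """Place an SV relative to one gene: genic, upstream, downstream or intergenic."""
    if sv_start <= gene.end and sv_end >= gene.start:
        return "genic"  # the SV overlaps the gene body
    if sv_end < gene.start:
        # SV lies at lower reference coordinates than the gene.
        distance = gene.start - sv_end
        side = "upstream" if gene.strand == "+" else "downstream"
    else:
        # SV lies at higher reference coordinates than the gene.
        distance = sv_start - gene.end
        side = "downstream" if gene.strand == "+" else "upstream"
    return side if distance <= flank else "intergenic"
```

With `Gene(10_000, 20_000, "+")`, an SV at 6,000–7,000 is classified as upstream; the same SV relative to a minus-strand gene at those coordinates is downstream.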
Figure 3.
New subtelomere formation in CB4856 Chr VR by an alternative lengthening of telomeres (ALT) mechanism. (A) Schematic representation of subtelomere differences between N2 and CB4856 chromosomes. Yellow and blue bars at the ends of chromosomes indicate the ratio of unaligned bases in subtelomeres of the N2 and CB4856 genomes, respectively. (B) Dot plot of the alignment between the internal segment (V: 19,377,978–19,606,221) and the duplicated segment (V: 21,171,521–21,389,866) of CB4856 Chr VR; 63% of the two regions are aligned, and 91% of the aligned bases are identical. Red: forward-strand matches; blue: reverse-strand matches. (C) Telomere length of all chromosomes deduced from the long-read CB4856 genome. ‘HiSeq’ data are mean telomere lengths normalized by the telseq software (Ding et al. 2014). The red bar represents the position of the N2 chromosome end (Chr VR internal) within CB4856 Chr VR; only a small portion of the N2 telomere remains in CB4856, followed by a new subtelomere. ‘Chr V terminal’ is from the actual end of Chr VR. (D) Schematic representation of the Chr V subtelomere in CB4856. Five copies of the template for ALT (TALT) (red) are connected to the duplicated segment from the internal segment close to the internal TALT (V: 19,366,148–19,367,611). The bottom panel shows PacBio raw reads on the tandemly repeated TALT region; four raw reads almost fully cover this region.
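Panel C reads telomere lengths off the assembled chromosome ends. One simple way to do this is to count the uninterrupted run of telomeric repeat units at each end; C. elegans telomeres consist of TTAGGC repeats, read as TTAGGC at a right end and as its reverse complement GCCTAA at a left end. The sketch below assumes exact, in-phase repeat units with no sequencing errors, which real assembled ends will not always satisfy, and it is not the method used in the paper; the function names are hypothetical.

```python
def trailing_repeat_length(seq, motif="TTAGGC"):
    """Bases in the uninterrupted run of exact motif copies at the 3' end of seq."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    end = len(seq)
    # Strip whole repeat units from the right while they match exactly.
    while end >= k and seq[end - k:end] == motif:
        end -= k
    return len(seq) - end


def leading_repeat_length(seq, motif="GCCTAA"):
    """Bases in the uninterrupted run of exact motif copies at the 5' end of seq."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    start = 0
    # Strip whole repeat units from the left while they match exactly.
    while start + k <= len(seq) and seq[start:start + k] == motif:
        start += k
    return start
```

A single mismatch or a partial unit at the very end stops the count, so a tolerant version would need approximate matching.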
Figure 4.
New subtelomere formation in wild isolates. (A) Internal genes were duplicated into the Chr VR subtelomere. The figure shows a putative gene model of the Chr VR subtelomere. Upper panel: internal gene model; lower panel: subtelomeric gene model. (B) TALT copy numbers among wild isolates (Supplemental Table S3). (C) Normalized coverage over the duplicated segment in wild isolates with high (red) and low (blue) TALT copy numbers. (D) Haplotype blocks on Chr V of seven strains that have high TALT copy numbers. (E) Phylogenetic tree of the N2 reference and 151 wild strains whose genomes have been fully sequenced. Strains marked in red contain several copies of TALT.
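Copy numbers such as those in panel B are commonly estimated from short-read data by dividing the read depth over the repeat by the depth over single-copy sequence; because C. elegans wild isolates are largely homozygous, a ratio near 5 would suggest about five copies per haploid genome. The sketch below assumes that read-depth approach with a median single-copy baseline; the paper's exact normalization is not described here, and `estimate_copy_number` is a hypothetical name.

```python
from statistics import median


def estimate_copy_number(target_depths, background_depths):
    """Copy number of a target region as its mean depth over a single-copy baseline.

    target_depths: per-base (or per-bin) depths across the repeat, e.g. TALT.
    background_depths: depths across regions assumed to be single-copy.
    """
    if not target_depths or not background_depths:
        raise ValueError("need depth values for both target and background")
    # The median is less sensitive than the mean to outlier bins in the baseline.
    baseline = median(background_depths)
    if baseline == 0:
        raise ValueError("background depth is zero")
    return sum(target_depths) / len(target_depths) / baseline
```

For example, a repeat averaging 50× against a 10× single-copy baseline gives an estimate of 5.0.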
Figure 5.
A model of Chr VR subtelomere formation in CB4856. The CB4856 ancestor underwent telomere crisis, and two sequential telomere-damage repair events, one using ALT and the other using break-induced replication (BIR), formed new subtelomeres. Finally, the end of the duplicated block was repaired by telomerase, leaving telomeric repeats at least 3 kb long.


