PLoS Biol, 11 (1), e1001473

The Oxytricha Trifallax Macronuclear Genome: A Complex Eukaryotic Genome With 16,000 Tiny Chromosomes

Estienne C Swart et al. PLoS Biol.

Abstract

The macronuclear genome of the ciliate Oxytricha trifallax displays an extreme and unique eukaryotic genome architecture with extensive genomic variation. During sexual genome development, the expressed, somatic macronuclear genome is whittled down to the genic portion of a small fraction (∼5%) of its precursor "silent" germline micronuclear genome by a process of "unscrambling" and fragmentation. The tiny macronuclear "nanochromosomes" typically encode single, protein-coding genes (a small portion, 10%, encode 2-8 genes), have minimal noncoding regions, and are differentially amplified to an average of ∼2,000 copies. We report the high-quality genome assembly of ∼16,000 complete nanochromosomes (∼50 Mb haploid genome size) that vary from 469 bp to 66 kb long (mean ∼3.2 kb) and encode ∼18,500 genes. Alternative DNA fragmentation processes ∼10% of the nanochromosomes into multiple isoforms that usually encode complete genes. Nucleotide diversity in the macronucleus is very high (SNP heterozygosity is ∼4.0%), suggesting that Oxytricha trifallax may have one of the largest known effective population sizes of eukaryotes. Comparison to other ciliates with nonscrambled genomes and long macronuclear chromosomes (on the order of 100 kb) suggests several candidate proteins that could be involved in genome rearrangement, including domesticated MULE and IS1595-like DDE transposases. The assembly of the highly fragmented Oxytricha macronuclear genome is the first completed genome with such an unusual architecture. This genome sequence provides tantalizing glimpses into novel molecular biology and evolution. For example, Oxytricha maintains tens of millions of telomeres per cell and has also evolved an intriguing expansion of telomere end-binding proteins. In conjunction with the micronuclear genome in progress, the O. trifallax macronuclear genome will provide an invaluable resource for investigating programmed genome rearrangements, complementing studies of rearrangements arising during evolution and disease.
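
The headline figures in the abstract can be cross-checked with simple arithmetic. The Python sketch below multiplies them out; it assumes, as a simplification, that every nanochromosome has the mean length and the average copy number, and the function and constant names are illustrative rather than taken from the paper.

```python
# Back-of-the-envelope check using only figures quoted in the abstract.
# Assumption: every nanochromosome has the mean length and the average
# copy number; the real distributions are broad.

N_NANOCHROMOSOMES = 16_000   # ~16,000 complete nanochromosomes
MEAN_LENGTH_BP = 3_200       # mean length ~3.2 kb
MEAN_COPY_NUMBER = 2_000     # average amplification of ~2,000 copies
ENDS_PER_CHROMOSOME = 2      # each linear nanochromosome has two telomeres


def haploid_size_mb(n_chromosomes: int, mean_length_bp: int) -> float:
    """Total haploid sequence length in megabases."""
    return n_chromosomes * mean_length_bp / 1e6


def telomere_count(n_chromosomes: int, copy_number: int,
                   ends: int = ENDS_PER_CHROMOSOME) -> int:
    """Number of telomeres implied by the chromosome and copy numbers."""
    return n_chromosomes * copy_number * ends


if __name__ == "__main__":
    print(haploid_size_mb(N_NANOCHROMOSOMES, MEAN_LENGTH_BP))
    print(telomere_count(N_NANOCHROMOSOMES, MEAN_COPY_NUMBER))
```

Under these assumptions the products are 51.2 Mb and 64 million telomeres, in line with the "∼50 Mb haploid genome size" and "tens of millions of telomeres" quoted in the abstract.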

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Development of the Oxytricha macronuclear genome from the micronuclear genome.
During conjugation of Oxytricha cells, segments of the micronuclear genome (MDSs) are excised and stitched together to form the nanochromosomes of the new macronuclear genome, and the remainder of the micronuclear genome is eliminated (including the IESs interspersed between MDSs). The old macronuclear genome is also degraded during development. The segments that are stitched together may be either in order (e.g., forming nanochromosome 1, on the left) or out of order or inverted (e.g., forming the two forms of nanochromosome 2), in which case they need to be “unscrambled.” Two rounds of DNA amplification produce nanochromosomes at an average copy number of ∼1,900. Alternative fragmentation of DNA during nanochromosome development may also occur, irrespective of unscrambling, giving rise to longer (2a) and shorter (2b) nanochromosome isoforms. The mature nanochromosomes are capped on both ends with telomeres.
Figure 2. Comparison of key ciliate macronuclear genomes.
The phylogeny represents the bootstrap consensus of 100 replicates from PhyML (with the HKY85 substitution model) based on a MUSCLE multiple sequence alignment of 18S rRNA genes from seven ciliates (Oxytricha trifallax—FJ545743; Stylonychia lemnae—AJJRB310497; Euplotes crassus—AJJRB310492; Nyctotherus ovalis—AJ222678; Tetrahymena thermophila—M10932; Ichthyophthirius multifiliis—IMU17354; and Paramecium tetraurelia—AB252009) rooted with two other alveolates (Perkinsus marinus—X75762 and Plasmodium falciparum—NC_004325). All bootstrap values are ≥80, except for the node between Nyctotherus and Oxytricha/Stylonychia/Euplotes, which has a bootstrap value of 60. Euplotes and Nyctotherus both have nanochromosomes, like Oxytricha. Other than the genome statistics for Oxytricha trifallax, which were determined in this study, table statistics were obtained from the sources marked a to m in the table (for f, the number of chromosomes is an estimate; for j, the statistics are for a single stage of the Ichthyophthirius life cycle). Table statistics for Perkinsus marinus are for the current assembly deposited in GenBank (GCA_000006405.1).
Figure 3. Key features of Oxytricha protein-coding nanochromosomes.
Representative nanochromosome features are not drawn to scale, but their lengths are indicated. UTR, untranslated region; UTS, untranscribed region. 3′ UTRs and the subtelomeric signal overlap. The subtelomeric base composition bias signal found on either end of the nanochromosome is shown above the nanochromosome diagram.
Figure 4. Nanochromosomal SNP heterozygosity.
The green histogram (left) corresponds to SNP heterozygosity estimated from mapped reads (see Materials and Methods) for “matchless” nanochromosomes (which have no non-self contig matches to the genome assembly) and includes homozygous nanochromosomes. The red histogram (right) corresponds to SNP heterozygosity estimated from mapped reads for “matched” nanochromosomes. The orange histogram (center) corresponds to SNP diversity assessed from pairwise alignments of matched nanochromosomes. The smallest bin is 0–0.005 (0%–0.5%) heterozygosity.
Figure 5. Nanochromosomal variant frequencies.
(A) Normalized to form a probability density (cumulative frequency of 1) and (B) unnormalized median nanochromosomal variant frequencies for six increasing ranges of mean SNP heterozygosity. Variant frequencies were determined for nanochromosomes with no non-self matches to the genome assembly (the same nanochromosomes underlying the SNP heterozygosity histogram for “matchless” nanochromosomes in Figure 4), with variant positions called at the same minimum variant frequency as that used to determine potentially heterozygous sites (5% for sites with ≥20× read coverage). To exclude potentially paralogous mapped reads, we only analyzed nanochromosomes with ≤4 reads mapped to other contigs (using all nanochromosomes does not substantially change the form of the distributions). Variant frequency bins are labeled by their lower bounds. Variant frequencies ≥40 bp from either nanochromosome end were counted to avoid possible incorrect variant calling resulting from telomeric bases that were not masked (due to sequencing errors).
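
The site filters described in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Site` record, the definition of variant frequency as non-reference reads divided by coverage, and the choice to skip sites below 20× coverage are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple


class Site(NamedTuple):
    """Hypothetical per-site read summary for one nanochromosome."""
    position: int   # 0-based offset from the nanochromosome start
    coverage: int   # number of reads covering the site
    alt_reads: int  # reads supporting a non-reference base (assumed definition)


def variant_frequencies(sites: Iterable[Site], length: int,
                        min_freq: float = 0.05,   # 5% minimum variant frequency
                        min_coverage: int = 20,   # >=20x read coverage
                        end_margin: int = 40      # >=40 bp from either end
                        ) -> List[float]:
    """Return variant frequencies for sites passing the caption's filters."""
    kept = []
    for site in sites:
        # Skip positions near the ends, where unmasked telomeric bases
        # could produce spurious variant calls.
        if site.position < end_margin or site.position >= length - end_margin:
            continue
        # Assumption: sites below the coverage threshold are not called.
        if site.coverage < min_coverage:
            continue
        freq = site.alt_reads / site.coverage
        if freq >= min_freq:
            kept.append(freq)
    return kept
```

For example, on a 1,000 bp nanochromosome a site at position 10 is dropped regardless of its reads, while a site at position 300 with 2 variant reads out of 40 is kept at the 5% threshold.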
Figure 6. Extreme nanochromosomal fragmentation.
Contig14329.0 is shown with coordinates in bp. Predicted genes, coding sequences, and introns are indicated by horizontal green, yellow, and white arrows, respectively. 5′ and 3′ fragmentation sites predicted from telomeric read pairs are indicated by red and navy arrows, respectively, with upward pointing solid arrows for sites predicted by 454 telomeric reads and downward pointing dashed arrows for sites predicted by Illumina telomeric reads. Numbers above/below the arrows indicate the number of telomeric reads found at each site for the two telomeric read sources. Alternative nanochromosome isoforms predicted from the 454 telomeric reads (isoforms B–O) are shown below the main locus, with the number of supporting read pairs next to each form. One additional isoform missed by our prediction method but documented in the 454 telomeric read pairs is indicated in pale green. Since the two fragmentation positions at 3,762 and 3,806 bp are in close proximity to each other, they were treated as a single point during alternative isoform prediction. Additional nanochromosome isoforms that were not detected by 454-telomeric reads, including the full eight-gene nanochromosome, but were detected by Southern blotting are indicated by stars (isoforms A, P, Q, and R). Sequence coverage, indicated by the cyan graph, shows the cumulative DNA amplification for all the nanochromosome isoforms. Sequence coverage is calculated from both Illumina telomereless and telomeric reads; telomeric read pairs appear as twin peaks ∼300 bp apart.
Figure 7. Nanochromosome copy number variation.
(A) Relative nanochromosome copy number distribution (number of telomere-less reads/bp of nonsubtelomeric nanochromosome; see Materials and Methods) for homozygous matchless, heterozygous matchless, and heterozygous matching nanochromosomes. The mean copy number of the combined homozygous matchless and heterozygous matchless nanochromosomes is indicated by a dashed line at 0.94, with dotted lines corresponding to an interval of ∼1.3σ (∼0.12 to 1.76) either side of the mean, which includes ∼90% of all nanochromosomes. (B) Relative nanochromosome copy number of nonalternatively fragmented (with a single, directional fragmentation site per nanochromosome) versus alternatively fragmented nanochromosomes measured by the number of telomeric reads per nanochromosome. (C) Relative nanochromosome copy number of nonalternatively fragmented chromosomes versus nonalternatively fragmented chromosomes encoding ribosomal proteins and tRNAs.
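
The interval drawn in panel (A) is the mean plus or minus ∼1.3σ. A rough Python sketch of that calculation is given below. The function names are illustrative, and using the population standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def copy_number_interval(values: Sequence[float],
                         k: float = 1.3) -> Tuple[float, float]:
    """Return (mean - k*sigma, mean + k*sigma) for relative copy numbers.

    Assumption: sigma is the population standard deviation.
    """
    m = mean(values)
    s = pstdev(values)
    return m - k * s, m + k * s


def fraction_within(values: Sequence[float], low: float, high: float) -> float:
    """Fraction of values inside the closed interval [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)
```

The quoted bounds (∼0.12 to 1.76 around a mean of 0.94) correspond to a half-width of 0.82, which implies σ ≈ 0.63 in the caption's units of reads per bp.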
Figure 8. Length distributions of alternatively and nonalternatively fragmented nanochromosomes.
The shortest nanochromosome isoforms produced from single (directional) alternative fragmentation sites are labeled as “Short isoform.” The histograms show normalized frequencies for 1,587 alternatively fragmented nanochromosomes and 15,219 nonalternatively fragmented nanochromosomes. Alternatively fragmented nanochromosomes have at least one strongly supported (≥10 Illumina reads) alternative fragmentation site >250 bp from either end of the nanochromosome (these nanochromosomes are >500 bp long).
Figure 9. Transposase-like domains of proteins found in Oxytricha but neither Paramecium nor Tetrahymena.
Proteins are shown with black lines with a scale in amino acids indicated above the longest protein. Protein names are to the left of the protein diagrams. Domain coordinates are the Pfam domain envelope coordinates. Representative domains are given their Pfam names, with transposase-like domains shown in bold. Gene expression levels are log2[10,000×normalized RNA-seq counts (see Text S1; Supporting Materials and Methods) divided by CDS length (in bp)] before (“fed”) and during conjugation (0–60 h).
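
The expression measure defined in this caption, log2 of 10,000 times the normalized RNA-seq count divided by CDS length, can be written as a short Python function. The function name is illustrative, and rejecting non-positive inputs is an assumption, because the caption does not say how zero counts were handled.

```python
import math


def expression_level(normalized_counts: float, cds_length_bp: int) -> float:
    """log2(10,000 x normalized RNA-seq counts / CDS length in bp)."""
    # Assumption: zero or negative inputs are rejected rather than
    # handled with a pseudocount, since log2 is undefined for them.
    if normalized_counts <= 0 or cds_length_bp <= 0:
        raise ValueError("counts and CDS length must be positive")
    return math.log2(10_000 * normalized_counts / cds_length_bp)
```

With this definition, a gene with a normalized count of 2 and a 5,000 bp CDS has an expression level of log2(4) = 2.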
Figure 10. Telomere end-binding protein-α paralogs in ciliates.
The phylogeny is an ML tree generated by PhyML with a single substitution rate category and the JTT substitution model, optimized for tree topology and branch length. Bootstrap percentages for 1,000 replicates are indicated at the tree nodes. The multiple sequence alignments underlying the phylogeny were produced with MAFFT (v 6.418b [124]) (default parameters; BLOSUM 62 substitution matrix) and were trimmed with trimal1.2 with the “-automated1” parameter to remove excess gaps and poorly aligned regions. GenBank accessions are provided for the taxa unless otherwise indicated. Euplotes crassus is indicated in blue (Q06184 and Q06183), and an additional match from our preliminary Euplotes genome assembly is EUP_contig393834_f1_1. Perkinsus marinus is purple (EER00428) and Oxytricha nova is light green (P29549). Tetrahymena thermophila (salmon color) accessions are from the Tetrahymena genome database (TTHERM_00378980 and TTHERM_00378990); Paramecium tetraurelia's TeBP-α protein (pink) is from ParameciumDB (GSPATP00001065001). All the nodes beginning with “Contig” are Oxytricha trifallax TeBP-α paralogs (dark green) and Contig22209.0.g66 is TeBP-α1, the original TeBP-α. The tree is rooted at the midpoint of the branch between Arabidopsis thaliana (Pot1a—AAX78213 and Pot1b—AAS99712) and Homo sapiens (Pot1—EAW83616; black) and the rest of the phylogeny. Gene expression levels (normalized RNA-seq counts; see Text S1; Supporting Materials and Methods) before (“fed”) and during conjugation (0–60 h) are shown for the Oxytricha trifallax TeBP-α paralogs; coding sequence lengths are also indicated (in bp) for each of these paralogs.


References

    1. Zoller SD, Hammersmith RL, Swart EC, Higgins BP, Doak TG, et al. (2012) Characterization and taxonomic validity of the ciliate Oxytricha trifallax (class spirotrichea) based on multiple gene sequences: limitations in identifying genera solely by morphology. Protist 163: 643–657.
    2. Prescott DM (1994) The DNA of ciliated protozoa. Microbiol Rev 58: 233–267.
    3. Ammermann D, Steinbruck G, von Berger L, Hennig W (1974) The development of the macronucleus in the ciliated protozoan Stylonychia mytilus. Chromosoma 45: 401–429.
    4. Coyne RS, Stover NA, Miao W (2012) Whole genome studies of Tetrahymena. Methods in cell biology 109: 53–81.
    5. Arnaiz O, Mathy N, Baudry C, Malinsky S, Aury JM, et al. (2012) The Paramecium germline genome provides a niche for intragenic parasitic DNA: evolutionary dynamics of internal eliminated sequences. PLoS Genet 8: e1002984 doi:10.1371/journal.pgen.1002984.
