eLife. 2019 Nov 11;8:e52542.
doi: 10.7554/eLife.52542.

The DNA-binding protein HTa from Thermoplasma acidophilum is an archaeal histone analog

Antoine Hocher et al. eLife.

Abstract

Histones are a principal constituent of chromatin in eukaryotes and fundamental to our understanding of eukaryotic gene regulation. In archaea, histones are widespread but not universal: several lineages have lost histone genes. What prompted or facilitated these losses and how archaea without histones organize their chromatin remains largely unknown. Here, we elucidate primary chromatin architecture in an archaeon without histones, Thermoplasma acidophilum, which harbors a HU family protein (HTa) that protects part of the genome from micrococcal nuclease digestion. Charting HTa-based chromatin architecture in vitro, in vivo and in an HTa-expressing E. coli strain, we present evidence that HTa is an archaeal histone analog. HTa preferentially binds to GC-rich sequences, exhibits invariant positioning throughout the growth cycle, and shows archaeal histone-like oligomerization behavior. Our results suggest that HTa, a DNA-binding protein of bacterial origin, has converged onto an architectural role filled by histones in other archaea.

Keywords: Methanothermus fervidus; Thermoplasma acidophilum; archaea; chromatin; chromosomes; convergent evolution; evolutionary biology; gene expression; histones.

Conflict of interest statement

AH, MR, JS, AE, TW: No competing interests declared.

Figures

Figure 1. Predicted structure and measured abundance of HTa.
(a) Predicted secondary structures of HTa (T. acidophilum), the bacterial HU protein HupA (E. coli), and the archaeal histone protein HmfA (M. fervidus). (b) Predicted quaternary structure of the (HTa)2 homodimer compared to the crystal structure of (HupA)2 (PDB: 1p51) bound to DNA. Color gradients represent charge densities mapped onto the solvent accessible surface area of (HTa)2 and (HupA)2. Note the extended patches of stronger positive charge for (HTa)2 compared to (HupA)2, particularly in the stalk region. (c) Correlation of transcript and protein abundances for T. acidophilum and E. coli. HTa and HU are highlighted along with some additional chromatin-associated proteins. Data sources: T. acidophilum protein abundance: Sun et al. (2010); E. coli protein abundance: Lu et al. (2007). E. coli transcript abundance is an average across three previous studies as reported by Lu et al. (2007).
Figure 2. Phylogenetic relationships of HU family proteins from bacteria, eukaryotes, and archaea.
(a) Protein-level phylogenetic tree of HU proteins including HTa (see Materials and methods for details on phylogenetic reconstruction). The tree is midpoint-rooted. Reported domain-level membership (Bacteria, Archaea, etc.) of different proteins is color-coded in the outer circle and on the dotted lines that point to individual branches. See main text and Materials and methods for a critical evaluation of domain assignments and likely assembly contaminants. Bootstrap support values (%) for individual branches, visually encoded as node diameters, illustrate poorly resolved relationships at deeper nodes. (b) Excerpt of the phylogeny shown above, highlighting good support (84%) for a monophyletic origin of HU proteins in the Thermoplasmatales/DHVE2 clade and their uncertain affiliation to other HU family members.
Figure 2—figure supplement 1. Phylogenetic placement of HU proteins attributed to halophilic archaea.
The phylogenetic tree shown is an excerpt of the protein-level HU family tree shown in Figure 2, focussing on sequences from halophilic archaea (orange), which cluster mainly with sequences of bacteria from the phylum Bacteroidetes (gray). As is true for the majority of the HU protein tree, deeper ancestral relationships are poorly resolved.
Figure 3. HTa-mediated primary chromatin architecture in T. acidophilum mapped by MNase-Seq.
(a) Growth curve of T. acidophilum as determined using optical density (OD600). Time points used for downstream experiments are indicated (mean ± SEM across four biological replicates). (b) Agarose gel of MNase digestion products from T. acidophilum sampled across the growth cycle. Growth phases are given as days after inoculation, digestion time in minutes. (c) Agarose gel of MNase digestion products from T. acidophilum (day 2) along with digestion products of E. coli ectopically expressing HTa, HupA or YFP (see Materials and methods). (d) Distribution of the lengths of fragments mapped to the T. acidophilum genome (pooled across all four replicates from day 2), highlighting fragment size ranges that correspond to small (blue) and large (red) fragments, as defined in the main text. (e) Correlation matrix comparing genome-wide MNase-Seq coverage signal, computed at base-pair resolution, between reads of defined sizes (pooled replicates, day 2). (f) Genome-wide MNase-Seq signal prior to and after normalization with sonicated DNA input (see Materials and methods), along with GC content profile along the T. acidophilum chromosome, computed using a 51 bp moving window. (g) Example of coverage and called peaks across a 10 kb region of the T. acidophilum chromosome. (h) Overlap of detected narrow and broad peaks across the growth cycle. Note that different sections/overlaps are only qualitatively but not quantitatively proportional to absolute peak numbers.
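The GC content profile in panel (f) is described as computed over a 51 bp moving window. A minimal sketch of such a computation is given below; the function name `gc_moving_window` is hypothetical, and the sketch treats the sequence as linear (windows that would run past either end are skipped), which is an assumption rather than the authors' stated procedure.

```python
def gc_moving_window(seq, window=51):
    """Return the GC fraction of every full window of `window` bp.

    Assumption: the sequence is treated as linear, so the result has
    len(seq) - window + 1 entries (none if the sequence is too short).
    """
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        return []
    # 1 where the base is G or C, 0 otherwise
    is_gc = [1 if base in "GC" else 0 for base in seq]
    count = sum(is_gc[:window])
    profile = [count / window]
    # Slide the window one base at a time, updating the running count
    for i in range(window, len(seq)):
        count += is_gc[i] - is_gc[i - window]
        profile.append(count / window)
    return profile
```

For example, `gc_moving_window("GGCCAATT", window=4)` yields one value per full 4 bp window, falling from 1.0 to 0.0 as the window moves from the GC-rich to the AT-rich end.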
Figure 3—figure supplement 1. Agarose gel (3%) of MNase digestion products from T. acidophilum (day 2) along with digestion products of E. coli ectopically expressing either HTa, HupA, YFP, HupA (E38K,V42L), HU from T. composti or HU from L. floricola, from the same plasmid backbone.
HupA (E38K,V42L) is a mutant that had previously been shown to induce extreme compaction of the E. coli nucleoid (Kar et al., 2005).
Figure 3—figure supplement 2. Distribution of the lengths of fragments mapped to the T. acidophilum genome for all replicates across the growth cycle.
Figure 3—figure supplement 3. Heat maps indicating MNase-seq coverage by fragment length relative to the center of broad peaks in T. acidophilum, for the same sample (day 1, replicate 3), digested for either 15 or 30 min.
Figure 3—figure supplement 4. Multiscale analysis of MNase signal.
(a) Chromosome-wide MNase-Seq coverage along the T. acidophilum chromosome (day 2, replicate 2), normalized using sonicated DNA to remove replication-associated coverage bias. (b) Multiscale analysis of MNase signal enrichment (see Materials and methods). Significantly enriched or depleted (p-value < 1e-15) segments are color-coded red and blue, respectively. Scales correspond to increasing window sizes over which enrichment is computed. (c) Enrichment signal of significantly MNase-signal-enriched or -depleted genomic domains at scale 30 as a function of GC content. (d) Normalized transcript levels for pooled depleted or enriched domains at scale 30 and (e) corresponding log2-fold changes in transcript levels.
Figure 4. Asymmetric coverage signals around peaks in T. acidophilum and M. fervidus that track underlying nucleotide content.
(a) Empirical example and (b) schematic describing our approach to re-orienting coverage signals at broad peaks based on the coverage of small fragments around the dyad axis. (c, d) Heat maps illustrating MNase-seq coverage by fragment length relative to the center of narrow and broad peaks in T. acidophilum. Coverage around broad peaks is oriented as explained in (b). (e) Analogous heat map illustrating coverage by fragment length relative to the center of large peaks (corresponding to the binding footprints of octameric histone oligomers) in M. fervidus. (f, g, h) Normalized coverage for T. acidophilum small (40–65 bp) and large (70–100 bp) fragments and M. fervidus fragment ranges corresponding to the expected footprint sizes of histone tetramers, hexamers, and octamers. (i, j, k) Proportion of SS (=CC|CG|GC|GG) and WW (=AA|AT|TA|TT) dinucleotides at the same relative positions as (c, d, e). Dotted lines indicate the proportion of SS or WW dinucleotides expected by chance, estimated via random sampling of 25000 regions of equal size in each genome.
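The SS and WW dinucleotide proportions in panels (i, j, k), and the chance expectation from random sampling, can be illustrated with a short sketch. The function names, the `flank` and `size` parameters, and the handling of regions near sequence ends are all assumptions made for illustration; only the dinucleotide sets and the default of 25000 sampled regions come from the caption.

```python
import random

SS = {"CC", "CG", "GC", "GG"}  # strong/strong dinucleotides, as defined in the caption
WW = {"AA", "AT", "TA", "TT"}  # weak/weak dinucleotides, as defined in the caption

def dinucleotide_profile(genome, centers, flank, dinucs):
    """Proportion of regions with a dinucleotide from `dinucs` at each offset.

    Offsets run over the 2 * flank positions starting at (center - flank).
    Regions that would extend past the sequence ends are skipped (assumption).
    """
    genome = genome.upper()
    width = 2 * flank
    hits = [0] * width
    used = 0
    for center in centers:
        start = center - flank
        if start < 0 or start + width + 1 > len(genome):
            continue
        used += 1
        for k in range(width):
            if genome[start + k:start + k + 2] in dinucs:
                hits[k] += 1
    return [h / used for h in hits] if used else []

def chance_proportion(genome, dinucs, n_regions=25000, size=100, seed=0):
    """Expected proportion of `dinucs`, estimated from randomly placed regions."""
    genome = genome.upper()
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n_regions):
        start = rng.randrange(0, len(genome) - size + 1)
        region = genome[start:start + size]
        for k in range(size - 1):
            total += 1
            hits += region[k:k + 2] in dinucs
    return hits / total
```

Plotting `dinucleotide_profile(...)` against the flat baseline from `chance_proportion(...)` gives the kind of comparison the dotted lines in the figure represent.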
Figure 4—figure supplement 1. Weblogos of bitscores and nucleotide occurrence probabilities at (a) narrow and (b) broad peaks detected during exponential phase in T. acidophilum.
Information content is so low that the bitscore plots appear empty when using the common 0–2 bit visualization range. Logos are only visible when zooming in on the 0–0.02 range.
Figure 4—figure supplement 2. Normalized MNase-Seq coverage relative to the center of narrow peaks oriented according to the abundance of (a) 87–97 bp fragments in M. fervidus and (b) 70–100 bp fragments in T. acidophilum.
Middle and right panels are focused on peaks where 87–97 bp (70–100 bp) fragments are common or rare, respectively. Lower panels display the proportion of SS (=CC|CG|GC|GG) and WW (=AA|AT|TA|TT) dinucleotides at locations matching the upper panels. Dotted lines indicate the proportion of SS or WW dinucleotides expected by chance, estimated by randomly sampling 25000 regions per genome. (c) AT content in the flanks of narrow peaks (defined across the two windows 25–50 bp either side of the center of the peak) is higher at peaks where large fragments are rare (t-test, ***p < 2.2×10^-16).
Figure 4—figure supplement 3. As in Figure 4—figure supplement 2 but for 87–97 bp peaks scored according to 117–127 bp fragments and oriented according to 60–70 bp fragments.
Note the increase in WW content flanking the smaller-sized peaks that do not get extended further.
Figure 5. Comparison and predictive power of nucleotide enrichment patterns associated with HTa and archaeal histones.
(a) Proportion of SS (=CC|CG|GC|GG) dinucleotides, (b) A|T mononucleotides, and (c) RR (=purine/purine)|YY (=pyrimidine/pyrimidine) dinucleotides relative to the centers of reads of defined length in different archaeal species (see Materials and methods for read filtering). (d) Density plot comparing observed (day 2, replicate 2) and predicted MNase-Seq coverage across the part of the T. acidophilum chromosome not used for training. (e) Correlation between MNase-seq coverage and individual DNA k-mers with particularly high positive or negative correlation coefficients, as observed in the training data. Overall correlations between measured MNase-Seq coverage and coverage predicted by the LASSO model, for both trained and untrained data, are shown on the right-hand side. (f) Proportion of SS dinucleotides relative to the centers of 50 bp reads from digests of T. acidophilum genomic DNA, E. coli expressing HTa, and E. coli genomic DNA. (g) Genome-wide correlation of normalized occupancy between T. acidophilum genomic DNA and native chromatin digests.
Figure 5—figure supplement 1. Proportion of SS (=CC|CG|GC|GG) dinucleotides relative to the centers of reads of defined length (41–53 bp) in T. acidophilum.
Figure 5—figure supplement 2. Predicting in vivo HTa occupancy.
(a) In vivo occupancy in T. acidophilum is poorly predicted by a Lasso model trained on a T. acidophilum naked DNA digest (rho = 0.07, p < 2.2×10^-16). (b) In contrast, in vivo occupancy in T. acidophilum is well predicted by a Lasso model trained on digestion fragments from HTa-expressing E. coli (rho = 0.54, p < 2.2×10^-16). All correlations/predictions are for short fragments.
Figure 6. In vitro experiments to assess HTa binding preferences.
(a) Occupancy of small fragments across the T. acidophilum genome in vivo (day 2) correlates with occupancy following in vitro reconstitution and with (b) occupancy predicted by a Lasso model trained on the in vitro data. (c) EMSAs on libraries of sequence-variable dsDNA oligomers (see main text) in the presence of increasing amounts of HTa. (d) Independent reactions at a HTa:DNA ratio of 0.2 yield highly reproducible band shift patterns. (e) Pslow varies as a function of oligo G+C content and (f) GpC dinucleotide content. Point sizes are scaled according to the relative abundance of reads of a given G+C (GpC) content across the sequenced bands. The absolute number of reads analyzed is given in the panel above. Correlation coefficients (r) are from Pearson correlations between G+C (GpC) content and Pslow weighted by the number of reads at each G+C (GpC) content.
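Panels (e) and (f) report Pearson correlations weighted by the number of reads at each G+C (GpC) content. A minimal sketch of a weighted Pearson correlation follows; the function name `weighted_pearson` and the argument layout are hypothetical, and this is the standard textbook weighted formula rather than a confirmed reproduction of the authors' code.

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative weights w.

    Here x could be G+C (or GpC) content, y the corresponding Pslow,
    and w the number of reads observed at that content (assumption).
    """
    total = sum(w)
    # Weighted means
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances (normalization cancels in the ratio)
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)
```

Weighting in this way means that content bins supported by many reads dominate the coefficient, while sparsely sampled bins contribute little.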
Figure 6—figure supplement 1. In vitro reconstitution of HTa:DNA nucleoprotein complexes.
(a) 16% TTS protein gel (Biorad) showing different concentrations of BSA (Biorad) and purified untagged HTa. (b) Bioanalyzer trace of in vitro chromatin reconstitution. Two replicates are superimposed. Major peaks are evident at 50 bp and ~90 bp in both replicates. (c) Distribution of the lengths of fragments from digested in vitro reconstitutions mapped to the T. acidophilum genome. Note that smaller fragments are much rarer than in (b). We believe this is likely the consequence of preferential amplification of larger fragments during sequencing library preparation. As we sequence to sufficient depth, however, we retain ample read coverage at smaller fragment sizes.
Figure 6—figure supplement 2. EMSA backbone sequences.
(a) In vivo occupancy (day 2) at five 100 bp regions detailed in (b) is correlated with in vitro occupancy. Randomized dinucleotides are highlighted in green. (c) The proportion Pslow of diversified oligos associated with a given oligo backbone in (b) recovered from the HTa-bound slow band (see Figure 6e). Values in (a) and (c) are based on pooled read data from two independent EMSA experiments. Trends visualized here are also observed for both replicates individually.
Figure 6—figure supplement 3. The relationship between GC content of an oligo and Pslow.
Only oligos represented by at least 200 sequenced reads are considered. This analysis shows that results in Figure 6e are not driven by a few highly abundant oligos but represent the cumulative effect of different oligos acting in the same direction.
Figure 6—figure supplement 4. The relationship between GpC dinucleotide content of an oligo and Pslow.
Only oligos represented by at least 200 sequenced reads are considered. This analysis shows that results in Figure 6f are not driven by a few highly abundant oligos but represent the cumulative effect of different oligos acting in the same direction.
Figure 7. Broad peaks are associated with heterogeneous GC content in exponential but not stationary phase.
(a) Average GC content at broad peaks (day 2), separated into deciles based on the relative abundance of small fragments and (b) the corresponding relative coverage for large and small fragments during exponential and stationary phase. For all graphs, decile decomposition is based on small fragment occupancy during exponential phase (day 2).
Figure 7—figure supplement 1. Small fragment abundance at narrow peaks.
(a) Average GC content at narrow peaks (day 2), separated into deciles based on the relative abundance of small fragments. (b) Corresponding relative coverage for large and small fragments during exponential and stationary phase. (c) Percentage of overlap between narrow peaks and intergenic regions. For all graphs, decile decomposition is based on small fragment occupancy during exponential phase (day 2).
Figure 8. MNase-Seq coverage around transcriptional start sites in T. acidophilum and histone-encoding archaea in the context of dynamic transcription.
(a) Broad peaks associated with low abundance of small fragments are enriched in intergenic regions. (b) Left and central panel: Heat maps indicating MNase-seq coverage by fragment length relative to transcriptional start sites in exponential (day 2) and stationary phase (day 3.5). Right panel: median normalized MNase-seq coverage (considering all fragment sizes) as a function of distance from the transcriptional start site (TSS). (c) As in (b) but for M. fervidus and using the coding start (ATG) rather than the TSS as a reference point. To ensure that the coding start constitutes a reasonable proxy for the TSS, only genes with a divergently oriented neighboring gene are considered, thus eliminating genes internal to operons. (d, e) Median of normalized MNase-seq coverage (considering all fragment sizes) as a function of distance from the TSS in T. kodakarensis and Haloferax volcanii. (f) Changes in normalized MNase-seq coverage for small and large fragments around transcriptional start sites in T. acidophilum as a function of growth phase and whether genes are upregulated, downregulated or remain unchanged relative to mRNA abundance on day 1. Genes are grouped according to differential expression (or lack thereof) on day 2 compared to day 1. Genes with a log2-fold change > 1 were considered significantly upregulated, those with a log2-fold change < −1 significantly downregulated (FDR < 0.01). The rightmost panels indicate that a majority of genes up-/downregulated on day 2 remain up-/downregulated on days 3 and 3.5.
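The grouping rule in panel (f) can be written out as a small sketch. The function name `classify_gene` and its return labels are hypothetical; the cutoffs (log2-fold change above 1 or below −1, FDR below 0.01) are taken from the caption, and the sketch assumes the FDR threshold applies to both the up- and downregulated groups.

```python
def classify_gene(log2_fc, fdr, lfc_cutoff=1.0, fdr_cutoff=0.01):
    """Label a gene by its day 2 versus day 1 expression change.

    Assumption: a gene must pass the FDR cutoff and exceed the
    log2-fold change cutoff in either direction to count as changed.
    """
    if fdr < fdr_cutoff:
        if log2_fc > lfc_cutoff:
            return "up"
        if log2_fc < -lfc_cutoff:
            return "down"
    return "unchanged"
```

Under these assumptions, a gene with a log2-fold change of 1.5 and an FDR of 0.001 would be labeled "up", whereas the same fold change with an FDR of 0.05 would be labeled "unchanged".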
Figure 8—figure supplement 1. HTa and histone occupancy around transcription end sites.
(a) Median normalized MNase-seq coverage across fragment sizes relative to the distance from TESs or stop codons in different species. To ensure that the stop codons constitute a reasonable proxy for the TES, only genes with a convergently oriented downstream neighboring gene are considered, thus eliminating genes internal to operons. (b) Heat maps displaying normalized MNase-seq coverage at divergent genes relative to the distance from the start codon (ATG) or TSS in different species. Intergenic regions are sorted according to their width.
Author response image 1. Nucleotide periodicities in the T. acidophilum genome.
