Nat Commun. 2020 Jul 3;11(1):3320.
doi: 10.1038/s41467-020-17191-8.

The Seminavis robusta genome provides insights into the evolutionary adaptations of benthic diatoms

Cristina Maria Osuna-Cruz et al. Nat Commun. 2020.


Abstract

Benthic diatoms are the main primary producers in shallow freshwater and coastal environments, fulfilling important ecological functions such as nutrient cycling and sediment stabilization. However, little is known about their evolutionary adaptations to these highly structured but heterogeneous environments. Here, we report a reference genome for the marine biofilm-forming diatom Seminavis robusta, showing that gene family expansions are responsible for a quarter of all 36,254 protein-coding genes. Tandem duplications play a key role in extending the repertoire of specific gene functions, including light and oxygen sensing, which are probably central for its adaptation to benthic habitats. Genes differentially expressed during interactions with bacteria are strongly conserved in other benthic diatoms while many species-specific genes are strongly upregulated during sexual reproduction. Combined with re-sequencing data from 48 strains, our results offer insights into the genetic diversity and gene functions in benthic diatoms.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Genome properties for S. robusta and comparison with other sequenced diatoms.
a Summary of the S. robusta genome assembly and gene annotation statistics. b Scatter plot showing genome assembly contiguity and gene family completeness score in sequenced diatom genomes. Every dot represents a diatom genome assembly. The x axis displays the genome size in Mb, whereas the y axis represents the number of protein-coding genes. Genome assemblies are colored according to the gene family completeness score in a rainbow scale from blue to red. The size of the circle indicates the number of scaffolds in the genome assembly. c Comparative genomics analysis among diatoms and other eukaryote species. Left side of the barplot represents the age of the genes inferred through phylostratification, whereas the right side represents the duplication information. The phylogenetic relationship between diatom species is shown in a cladogram.
Fig. 2. Species-specific and shared gene family expansions in diatoms.
a UpSet plot showing the intersection of gene family expansions in diatoms. Each row represents a diatom species, with the total number of expanded gene families reported in parentheses. Black circles and vertical lines between the rows represent the intersection of expanded families between species. The barplot indicates the total gene family count in each intersection, displaying only intersections that contain at least ten gene families. Diatoms with a genome size > 90 Mb are highlighted in bold. b Examples of species-specific and shared gene family expansions in S. robusta. Each column represents a diatom species, and each row a gene family showing expansion in S. robusta, with the total number of S. robusta genes indicated in parentheses and the font color matching the intersection subset in panel a. The size of the circles is proportional to the number of genes falling under the given gene family per species, whereas the color of the circles indicates whether the gene family is significantly tandem-enriched. Source data are provided as a Source Data file. Numbers in superscript refer to families annotated in this Source Data file.
Fig. 3. Expression analysis for S. robusta multi-copy gene families.
a Expression divergence trend for multi-copy S. robusta families (n = 4444 families). The y axis denotes the percentage of nodes showing expression divergence in the phylogenetic tree of the family, while the x axis represents the number of S. robusta gene copies in the family. Average expression divergence percentages are indicated by red dots. Median expression divergence values significantly higher than the median of all nodes are highlighted with a star (P-value < 0.05, Wilcoxon rank-sum test, two-sided). b Heatmap showing pleiotropic families significantly enriched in upregulated genes for more than seven different conditions. The x axis represents the different conditions/experiments, whereas the y axis reports the families. The significance of the upregulation in a certain condition for a family is shown in –log10(q-value) scale highlighted by a color gradient from gray to dark purple. Expansion and tandem enrichment of each family are highlighted in different colors on the right side of the heatmap. c Barplot showing family counts with significant condition-specific expression. The x axis represents the different conditions/experiments, whereas the y axis represents the number of families having significant expression bias for that condition. The color of the bars denotes the family age distribution. d Network showing families with significant specific expression in the three reproduction stages available. Families are represented with circles, while conditions are represented with diamonds. The color of the circles denotes the family age following the same color code as panel c. The edge’s width denotes the fraction of genes in the family that shows upregulation for the given condition, while the edge’s color represents the significance of the enrichment, following the same color code as panel b from gray to dark purple. Expansion and tandem enrichment of each family are indicated by the squares next to the gene family circles, also following the same color code as panel b. Source data are provided as a Source Data file. Numbers in superscript refer to families annotated in this Source Data file.
Fig. 4. S. robusta within-species variability using a gene-based pan-genome analysis.
a Representation of reference, core and pan gene size. The size of the pan genome increases with each added strain up to 37,803 protein-coding genes, whereas the size of the core genome diminishes to 28,120 protein-coding genes. Clade category color code refers to the population groups described in ref. . b Number of core and dispensable genes per S. robusta strain. The pie chart shows the total gene count, where core genes are present in all strains and dispensable genes are present in only a subset of strains. c Percentage of gene length covered by short reads for all pan genes in each strain. The x axis represents the S. robusta strains, whereas the y axis represents all protein-coding pan genes. The percentage of horizontal gene coverage is highlighted by a color gradient from white (0%) to dark purple (100%). Gene categories are labeled on the right side of the y axis following the color code of panel b, whereas clade categories are labeled on the upper part of the x axis following the color code of panel a. d Set of gene families that are significantly enriched in core genes. The x axis represents the percentage of protein-coding pan genes that are core or dispensable, following the color code of panel b, while the y axis represents gene families, with the total number of pan genes belonging to each gene family (reference and de novo genes) given in parentheses. Expansion, tandem enrichment, and age of each family are highlighted in different colors on the right side of the y axis. Numbers in superscript refer to families annotated in the Source Data file from Fig. 3. Source data underlying Fig. 4a, b are provided as a Source Data file.
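The core and dispensable definitions in Fig. 4 can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the strain labels (`ref`, `s1`, `s2`) and the gene identifiers are hypothetical, and each strain is assumed to be already reduced to a set of genes called present. It only shows why the pan genome can grow and the core genome can shrink as strains are added.

```python
from typing import Dict, List, Set, Tuple


def pan_core_curve(
    strain_genes: Dict[str, Set[str]], order: List[str]
) -> List[Tuple[str, int, int]]:
    """Return (strain, pan size, core size) after adding each strain in `order`."""
    pan: Set[str] = set()
    core: Set[str] = set()
    curve: List[Tuple[str, int, int]] = []
    for i, strain in enumerate(order):
        genes = strain_genes[strain]
        pan |= genes  # pan genome: union of genes seen so far
        # core genome: genes present in every strain seen so far
        core = set(genes) if i == 0 else core & genes
        curve.append((strain, len(pan), len(core)))
    return curve


def classify_genes(strain_genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split pan genes into core (all strains) and dispensable (subset of strains).

    Assumes at least one strain is supplied.
    """
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, pan - core


if __name__ == "__main__":
    # Hypothetical toy presence calls, not data from the paper.
    toy = {
        "ref": {"g1", "g2", "g3", "g4"},
        "s1": {"g1", "g2", "g3", "g5"},
        "s2": {"g1", "g2", "g4", "g5", "g6"},
    }
    for strain, pan_size, core_size in pan_core_curve(toy, ["ref", "s1", "s2"]):
        print(strain, pan_size, core_size)
```

In practice the presence call per gene would come from a coverage threshold on the short-read data (panel c); that threshold is not stated here, so the sketch takes presence as given.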
Fig. 5. Selection of signature genes showing strong clade-specific conservation.
Significant enrichment for differential expression in the S. robusta transcriptome of genes showing the specified signature is highlighted with downward (for downregulation) or upward (for upregulation) arrows colored by experiment. Each row is a protein domain; the number of genes that show the signature and contain that protein domain is indicated in parentheses. If any of the genes containing one of the highlighted protein domains belong to an expanded and/or tandem-enriched family, this is encoded by the size and color of the circles. The fill of the circles indicates whether all or some genes with a given protein domain are upregulated during bacterial interaction experiments. The average pennate/raphid/benthic signature per protein domain is highlighted by a color gradient from dark gray (−6) to dark green (6). A selection of genes with a high pennate signature is shown in panel (a), with a high raphid signature in panel (b), and with a high benthic signature in panel (c). Source data are provided as a Source Data file.

References

    1. Stockdale A, Davison W, Zhang H. Micro-scale biogeochemical heterogeneity in sediments: a review of available technology and observed evidence. Earth-Sci. Rev. 2009;29:81–97.
    2. Admiraal W. The ecology of estuarine sediment inhabiting diatoms. Prog. Phycological Res. 1984;3:269–322.
    3. Stal LJ, Bolhuis H, Cretoiu MS. Phototrophic marine benthic microbiomes: the ecophysiology of these biological entities. Environ. Microbiol. 2019;21:1529–1551.
    4. Malviya S, et al. Insights into global diatom distribution and diversity in the world’s ocean. Proc. Natl Acad. Sci. USA. 2016;113:E1516–E1525.
    5. Round FE, Crawford RM, Mann DG. The Diatoms: Biology and Morphology of the Genera. Cambridge University Press; 1990.
