The Genome of the Crustacean Parhyale hawaiensis, a Model for Animal Development, Regeneration, Immunity and Lignocellulose Digestion

Damian Kao et al. Elife.

Abstract

The amphipod crustacean Parhyale hawaiensis is a blossoming model system for studies of developmental mechanisms and, more recently, regeneration. We have sequenced the genome, allowing annotation of all key signaling pathways, transcription factors, and non-coding RNAs that will enhance ongoing functional studies. Parhyale is a member of the Malacostraca clade, which includes crustacean food crop species. We analysed the immunity-related genes of Parhyale as an important comparative system for these species, where immunity-related aquaculture problems have increased as farming has intensified. We also find that Parhyale and other species within Multicrustacea contain the enzyme sets necessary to perform lignocellulose digestion ('wood eating'), suggesting this ability may predate the diversification of this lineage. Our data provide an essential resource for further development of Parhyale as an experimental model. The first malacostracan genome will underpin ongoing comparative work in food crop species and research investigating lignocellulose as an energy source.

Keywords: Parhyale; crustacean; developmental biology; epigenetics; evolutionary biology; genome; genomics; immunity; lignocellulose; stem cells.

Conflict of interest statement

The authors declare that no competing interests exist.

Figures

Figure 1. Introduction.
(A) Phylogenetic relationship of Arthropods showing the Chelicerata as an outgroup to Mandibulata and the Pancrustacea clade which includes crustaceans and insects. Species listed for each clade have ongoing or complete genomes. Species include Crustacea: Parhyale hawaiensis, D. pulex; Hexapoda: Drosophila melanogaster, Apis mellifera, Bombyx mori, Aedes aegypti, Tribolium castaneum; Myriapoda: Strigamia maritima, Trigoniulus corallinus; Chelicerata: Ixodes scapularis, Tetranychus urticae, Mesobuthus martensii, Stegodyphus mimosarum. (B) One of the unresolved issues concerns the placement of the Branchiopoda either together with the Cephalocarida, Remipedia and Hexapoda (Allotriocarida hypothesis A) or with the Copepoda, Thecostraca and Malacostraca (Vericrustacea hypothesis B). (C) Life cycle of Parhyale that takes about two months at 26°C. Parhyale is a direct developer and a sexually dimorphic species. The fertilized egg undergoes stereotyped total cleavages and each blastomere becomes committed to a particular germ layer already at the 8-cell stage depicted in (D). The three macromeres Er, El, and Ep give rise to the anterior right, anterior left, and posterior ectoderm, respectively, while the fourth macromere Mav gives rise to the visceral mesoderm and anterior head somatic mesoderm. Among the 4 micromeres, the mr and ml micromeres give rise to the right and left somatic trunk mesoderm, en gives rise to the endoderm, and g gives rise to the germline. DOI: http://dx.doi.org/10.7554/eLife.20062.003
Figure 2. Parhyale karyotype.
(A) Frequency of the number of chromosomes observed in 42 mitotic spreads. Forty-six chromosomes were observed in more than half of all preparations. (B) Representative image of Hoechst-stained chromosomes. DOI: http://dx.doi.org/10.7554/eLife.20062.005
Figure 3. Parhyale genome assembly metrics.
(A) K-mer frequency spectra of all reads for k-lengths ranging from 20 to 50. (B) K-mer branching analysis showing the frequency of k-mer branches classified as variants compared to Homo sapiens (human), Crassostrea gigas (oyster), and Saccharomyces cerevisiae (yeast). (C) K-mer branching analysis showing the frequency of k-mer branches classified as repetitive compared to H. sapiens, C. gigas and S. cerevisiae. (D) Histogram of read coverages of assembled contigs. (E) The number of contigs with an identity ranging from 70–95% to another contig in the set of assembled contigs. (F) Collapsed contigs (green) are contigs with at least 95% identity with a longer primary contig (red). These contigs were removed prior to scaffolding and added back as potential heterozygous contigs after scaffolding. DOI: http://dx.doi.org/10.7554/eLife.20062.006
Figure 4. Workflows of assembly, annotation, and proteome generation.
(A) Flowchart of the genome assembly. Two shotgun libraries and four mate-pair libraries with the indicated average sizes were prepared from a single male animal and sequenced to a predicted depth of 115x coverage after read filtering, based on a predicted size of 3.6 Gbp. Contigs were assembled at two different k-lengths with Abyss and the two assemblies were merged with GAM-NGS. Filtered contigs were scaffolded with SSPACE. (B) The final scaffolded assembly was annotated with a combination of Evidence Modeler to generate 847 high quality gene models and Augustus for the final set of 28,155 predictions. These protein-coding gene models were generated based on a Parhyale transcriptome consolidated from multiple developmental stages and conditions, their homology to the species indicated, and ab initio predictions with GeneMark and SNAP. (C) The Parhyale proteome contains 28,666 entries based on the consolidated transcriptome and gene predictions. The transcriptome contains 292,924 coding and non-coding RNAs, 96% of which could be mapped to the assembled genome. DOI: http://dx.doi.org/10.7554/eLife.20062.007
Figure 4—figure supplement 1. CEGMA assessment of Parhyale transcriptome and genome.
(A) CEGMA genes present in the transcriptome assembly scored by BLAST identity (y axis) and proportion of coverage (relative length, x axis). (B) CEGMA genes present in the genome assembly scored by BLAST identity (y axis) and proportion of coverage (relative length, x axis). In this analysis coverage reduced. DOI: http://dx.doi.org/10.7554/eLife.20062.010
Figure 5. Parhyale genome comparisons.
(A) Box plots comparing gene sizes between Parhyale and humans (H. sapiens), water fleas (D. pulex), flies (D. melanogaster) and nematodes (C. elegans). Ratios were calculated by dividing the size of the top BLAST hit in each species by the corresponding Parhyale gene size. (B) Box plots showing the distribution of intron sizes in the same species used in A. (C) Comparison between Parhyale and representative proteomes from the indicated animal taxa. Colored bars indicate the number of BLAST hits recovered across various thresholds of E-values. The top hit value represents the number of proteins with a top hit corresponding to the respective species. (D) Cladogram showing the number of shared orthologous protein groups at various taxonomic levels, as well as the number of clade-specific groups. A total of 123,341 orthogroups were identified with OrthoFinder across the 16 genomes used in this analysis. Within Pancrustacea, 37 orthogroups were shared between Branchiopoda and Hexapoda (supporting the Allotriocarida hypothesis) and 49 orthogroups were shared between Branchiopoda and Amphipoda (supporting the Vericrustacea hypothesis). DOI: http://dx.doi.org/10.7554/eLife.20062.012
Figure 5—figure supplement 1. Expanded gene families in Parhyale.
Histograms showing the number of paralogs in each listed species for gene families overrepresented in Parhyale: (A) sidestep, (B) lachesin, (C) neurotrimin/DPR, (D) APN and (E) cathepsin genes. DOI: http://dx.doi.org/10.7554/eLife.20062.016
Figure 6. Variation analyses of predicted genes.
(A) A read coverage histogram of predicted genes. Reads were first mapped to the genome, then coverage was calculated for transcribed regions of each defined locus. (B) A coverage distribution plot showing that genes in the lower coverage region (<105x coverage, peak at 75x) have a higher level of heterozygosity than genes in the higher coverage region (>105x and <250x coverage, peak at approximately 150x). (C) Distribution plot indicating that mean level of population variance is similar for genes in the higher and lower coverage regions. DOI: http://dx.doi.org/10.7554/eLife.20062.017
Figure 6—figure supplement 1. Confirmation of polymorphisms in the wider laboratory population of Parhyale.
(A) An example of laboratory population polymorphism in exon 1 of the gene aristalless. As well as heterozygosity in the single Chicago-F male sequenced (pink and purple bases), there is additional polymorphism detectable in the transcriptome (green bases). (B) Further examples of polymorphism in the laboratory population in 5 developmental genes. DOI: http://dx.doi.org/10.7554/eLife.20062.019
Figure 7. Variation observed in contiguous BAC sequences.
(A) Schematic diagram of the contiguous BAC clones tiling across the HOX cluster and their percent sequence identities. 'Overlap length' refers to the lengths (bp) of the overlapping regions between two BAC clones. 'BAC supported single nucleotide polymorphisms (SNPs)' refers to the number of SNPs found in the overlapping regions by pairwise alignment. 'Genomic reads supported SNPs' refers to the number of SNPs identified in the overlapping regions by mapping all reads to the BAC clones and performing variant calling with GATK. 'BAC + Genomic reads supported SNPs' refers to the number of SNPs identified from the overlapping regions by pairwise alignment that are supported by reads. 'Third allele' refers to the presence of an additional polymorphism not detected by genomic reads. 'Number of INDELs' refers to the number of all insertions or deletions found in the contiguous region. 'Number of INDELs >100' refers to the number of insertions or deletions greater than or equal to 100. (B) Position versus indel lengths across each overlapping BAC region. DOI: http://dx.doi.org/10.7554/eLife.20062.021
Figure 8. Comparison of Wnt family members across Metazoa.
Comparison of Wnt genes across Metazoa. Tree on the left illustrates the phylogenetic relationships of species used. Dotted lines in the phylogenetic tree illustrate the alternative hypothesis of Branchiopoda + Hexapoda versus Branchiopoda + Multicrustacea. Colour boxes indicate the presence of certain Wnt subfamily members (wnt1 to wnt11, wnt16 and wntA) in each species. Empty boxes indicate the loss of particular Wnt genes. Two overlapping colour boxes represent duplicated Wnt genes. DOI: http://dx.doi.org/10.7554/eLife.20062.022
Figure 8—figure supplement 1. Phylogenetic tree of FGF and FGFR molecules.
(A) Phylogenetic tree of arthropod and vertebrate FGFs, including two FGFs from Parhyale. (B) Phylogenetic tree of arthropod and vertebrate FGFRs, including a single FGFR in Parhyale. DOI: http://dx.doi.org/10.7554/eLife.20062.026
Figure 8—figure supplement 2. Phylogenetic tree of CERS homeobox family genes.
A phylogenetic tree highlighting an expansion of CERS homeobox family genes in Parhyale. DOI: http://dx.doi.org/10.7554/eLife.20062.027
Figure 9. Homeodomain protein family tree.
The overview of homeodomain radiation and phylogenetic relationships among homeodomain proteins from Arthropoda (P. hawaiensis, D. melanogaster and A. mellifera), Chordata (H. sapiens and B. floridae), and Cnidaria (N. vectensis). Six major homeodomain classes are illustrated (SINE, TALE, POU, LIM, ANTP and PRD) with histograms indicating the number of genes in each species belonging to a given class. DOI: http://dx.doi.org/10.7554/eLife.20062.028
Figure 10. Evidence for an intact Hox cluster in Parhyale.
(A–F’’) Double fluorescent in situ hybridizations (FISH) for nascent transcripts of genes. (A–A’’) Deformed (Dfd) and Sex combs reduced (Scr), (B–B’’) engrailed 1 (en1) and Ultrabithorax (Ubx), (C–C’’) en1 and abdominal-A (abd-A), (D–D’’) labial (lab) and Dfd, (E–E’’) Ubx and abd-A, and (F–F’’) Abdominal-B (Abd-B) and abd-A. Cell nuclei are stained with DAPI (blue) in panels A–F and outlined with white dotted lines in panels A'–F' and A''. Co-localization of nascent transcript dots in A, D, E and F suggests the proximity of the corresponding Hox genes in the genomic DNA. As negative controls, the en1 nascent transcripts in B and C do not co-localize with those of Hox genes Ubx or abd-A. (G) Schematic representation of the predicted configuration of the Hox cluster in Parhyale. Previously identified genomic linkages are indicated with solid black lines, whereas linkages established by FISH are shown with dotted gray lines. The arcs connecting the green and red dots represent the linkages identified in D, E and F, respectively. The position of the Hox3 gene is still uncertain. Scale bars are 5 µm. DOI: http://dx.doi.org/10.7554/eLife.20062.029
Figure 11. Lignocellulose digestion overview.
(A) Simplified drawing of lignocellulose structure. The main component of lignocellulose is cellulose, which is a β-1,4-linked chain of glucose monosaccharides. Cellulose and lignin are organized in structures called microfibrils, which in turn form macrofibrils. (B) Summary of cellulolytic enzymes and reactions involved in the breakdown of cellulose into glucose. β-1,4-endoglucanases of the GH9 family catalyze the hydrolysis of crystalline cellulose into cellulose chains. β-1,4-exoglucanases of the GH7 family break down cellulose chains into cellobiose (glucose disaccharide) that can be converted to glucose by β-glucosidases. (C) Adult Parhyale feeding on a slice of carrot. DOI: http://dx.doi.org/10.7554/eLife.20062.030
Figure 12. Phylogenetic analysis of GH7 and GH9 family proteins.
(A) Phylogenetic tree showing the relationship between GH7 family proteins of Parhyale, other crustaceans (Malacostraca, Branchiopoda, Copepoda), fungi and symbiotic protists (root). UniProt and GenBank accessions are listed next to the species names. (B) Phylogenetic tree showing the relationship between GH9 family proteins of Parhyale, crustaceans, insects, molluscs, echinoderms, amoeba, bacteria and plants (root). UniProt and GenBank accessions are listed next to the species names. Both trees were constructed with RAxML using the WAG+G model from multiple alignments of protein sequences created with MUSCLE. DOI: http://dx.doi.org/10.7554/eLife.20062.031
Figure 12—figure supplement 1. Alignment of GH7 family genes.
Alignment of GH7 family genes in Parhyale with those from Chelura terebans and Limnoria quadripunctata. DOI: http://dx.doi.org/10.7554/eLife.20062.033
Figure 13. Comparison of innate immunity genes.
(A) Phylogenetic tree of peptidoglycan recognition proteins (PGRPs). With the exception of Remipedes, PGRPs were not found in Crustaceans. PGRPs have been found in Arthropods, including insects, Myriapods and Chelicerates. (B) Phylogenetic tree of Toll-like receptors (TLRs) generated from five Crustaceans, three Hexapods, two Chelicerates, one Myriapod and one vertebrate species. (C) Genomic organization of the Parhyale Dscam locus showing the individual exons and exon arrays encoding the immunoglobulin (IG) and fibronectin (FN) domains of the protein. (D) Structure of the Parhyale Dscam locus and comparison with the (E) Dscam loci from Daphnia pulex, Daphnia magna and Drosophila melanogaster. The white boxes represent the number of predicted exons in each species encoding the signal peptide (red), the IGs (blue), the FNs and transmembrane (yellow) domains of the protein. The number of alternatively spliced exons in the arrays encoding the hypervariable regions IG2 (exon 4 in all species), IG3 (exon 6 in all species) and IG7 (exon 14 in Parhyale, 11 in D. pulex and 9 in Drosophila) are indicated under each species schematic in the purple, green and magenta boxes, respectively. Abbreviations of species used: Parhyale hawaiensis (Phaw), Bombyx mori (Bmor), Aedes aegypti (Aaeg), Drosophila melanogaster (Dmel), Apis mellifera (Amel), Speleonectes tulumensis (Stul), Strigamia maritima (Smar), Stegodyphus mimosarum (Smim), Ixodes scapularis (Isca), Amblyomma americanum (Aame), Nephila pilipes (Npil), Rhipicephalus microplus (Rmic), Ixodes ricinus (Iric), Amblyomma cajennense (Acaj), Anopheles gambiae (Agam), Daphnia pulex (Dpul), Tribolium castaneum (Tcas), Litopenaeus vannamei (Lvan), Lepeophtheirus salmonis (Lsal), Eucyclops serrulatus (Eser), Homo sapiens (H.sap). Both trees were constructed with RAxML using the WAG+G model from multiple alignments of protein sequences created with MUSCLE. DOI: http://dx.doi.org/10.7554/eLife.20062.034
Figure 13—figure supplement 1. Overview of Parhyale Dscam structure and hypervariable regions.
(A) Overview of domain structure of Parhyale Dscam protein and position of primers used to assess use of exons in 3 hypervariable regions. (B) Sequence alignments of cloned hypervariable regions in IG2 and (C) IG3 and (D) IG7. (E) Alignment of crustacean Dscam proteins. DOI: http://dx.doi.org/10.7554/eLife.20062.036
Figure 14. Evolution of miRNA families in Eumetazoans.
Phylogenetic tree showing the gains (in green) and losses (in red) of miRNA families at various taxonomic levels of the Eumetazoan tree leading to Parhyale. miRNAs marked with plain characters were identified by MirPara with small RNA sequencing read support. miRNAs marked with bold characters were identified by Rfam and MirPara with small RNA sequencing read support. DOI: http://dx.doi.org/10.7554/eLife.20062.038
Figure 14—figure supplement 1. Phylogenetic trees of Dicer and PIWI/AGO genes.
(A) Phylogenetic tree of Dicer family genes, including two Dicer genes from Parhyale. (B) Phylogenetic tree of PIWI/AGO genes, including several Parhyale genes. DOI: http://dx.doi.org/10.7554/eLife.20062.040
Figure 14—figure supplement 2. Examples of miRNAs in the Parhyale genome.
(A) Parhyale mir-100 and let-7 are clustered together in the intron of a putative lncRNA. (B) A Parhyale mir-71/mir-2 family cluster. (C) Parhyale mir-10 is in a conserved position in the genome between the Dfd and Scr Hox genes. (D) Alignment of the predicted mir-10 precursor with mir-10 precursors from other species. DOI: http://dx.doi.org/10.7554/eLife.20062.041
Figure 15. Analysis of Parhyale genome methylation.
(A) Phylogenetic tree showing the families and numbers of DNA methyltransferases (DNMTs) present in the genomes of indicated species. Parhyale has one copy from each DNMT family. (B) Amounts of methylation detected in the Parhyale genome. Amount of methylation is presented as percentage of reads showing methylation in bisulfite sequencing data. DNA methylation was analyzed in all sequence contexts (CG shown in dark, CHG in blue and CHH in red) and was detected preferentially in CpG sites. (C) Histograms showing mean percentages of methylation in different fractions of the genome: DNA transposons (DNA), long terminal repeat transposable elements (LTR), rolling circle transposable elements (RC), long interspersed elements (LINE), coding sequences (cds), introns, promoters, and the rest of the genome. DOI: http://dx.doi.org/10.7554/eLife.20062.042
Figure 16. CRISPR/Cas9-based genome editing in Parhyale.
(A) Wild-type morphology. (B) Mutant Parhyale with truncated limbs after CRISPR-mediated knock-out (DllKO) of the limb patterning gene Distal-less (PhDll-e). Panels show ventral views of juveniles stained for cuticle and color-coded by depth with anterior to the left. (C) Fluorescent tagging of PhDll-e expressed in most limbs (shown in cyan) by CRISPR-mediated knock-in (DllKI) using the non-homologous-end-joining repair mechanism. Panel shows a lateral view with anterior to the left and dorsal to the top of a live embryo (stage S22) with merged bright-field and fluorescence channels. Yolk autofluorescence produces a dorsal crescent of fluorescence in the gut. Scale bars are 100 μm. DOI: http://dx.doi.org/10.7554/eLife.20062.044
Figure 16—figure supplement 1. CRISPR experiments targeting the Distal-less locus.
CRISPR/Cas-based targeted genome editing in Parhyale. (A) Summary of gene knock-out experiments. (B) Illustration of the targeted PhDll-e (Dll) cDNA showing the 5’ and 3’ untranslated regions (UTRs), the coding sequence with the homeodomain (black box) and the positions targeted by the two sgRNAs Dll1 and Dll2. (C) Genotyping of a mosaic mutant embryo (F0 generation) with truncated appendages that was injected with Cas9 protein and the Dll1 sgRNA (Dll1+PAM sequence in red). This animal carried multiple Dll alleles with deletions (in yellow) or insertions (in cyan) in the region targeted by Dll1 downstream of the start codon (in green). Most of these alleles likely encoded truncated non-functional proteins, while a few alleles likely encoded functional proteins missing a few amino acids at the targeted region (putative number of amino acids shown on the right). (D) Genotyping of wild-type and mutant embryos (F1 generation) from two separate crosses (top and bottom black boxes) of F0 animals injected with Cas9 protein and the Dll2 sgRNA (Dll2+PAM sequence in red). Each mutant F1 carried two non-functional Dll alleles encoding truncated proteins, while their wild-type siblings carried one functional allele and one non-functional allele (putative number of amino acids shown on the right). (E) Summary of targeted gene knock-in based on the non-homologous end joining repair mechanism. (F) Schematic representation of the endogenous Dll locus with the non-coding sequences shown in blue and the coding sequences in cyan (left), and of the tagging plasmid carrying a copy of the Dll coding sequence (in green), the T2A self-cleaving peptide (in purple), a fusion of the Parhyale histone H2B with the Ruby 2 monomeric red fluorescent protein (in magenta) and the Dll 3’UTR (in dark green). The Dll2+PAM sequences (underlined) and flanking sequences in the Dll locus and plasmid are shown in cyan and green, respectively. A single nucleotide substitution (A>T shown in magenta) right after the PAM sequence was introduced on purpose in the plasmid to discriminate the tagged sequence from the original one. The left and right junctions between the endogenous and inserted sequences were recovered by PCR from transgenic animals with fluorescent limbs using the indicated pairs of primers (magenta and green, respectively). The tagged Dll locus likely encodes a functional Dll protein (with a small 7-amino-acid deletion in the region targeted by Dll2 and a stretch of T2A amino acids in its C-terminus) and a nuclear fluorescent reporter (with the remaining T2A amino acids in its N-terminus). DOI: http://dx.doi.org/10.7554/eLife.20062.045
