Nature, 435 (7038), 43-57

The Genome of the Social Amoeba Dictyostelium discoideum


L Eichinger et al. Nature.


The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins, a high proportion of which have long, repetitive amino acid tracts. There are many genes for polyketide synthases and ABC transporters, suggesting an extensive secondary metabolism for producing and exporting small molecules. The genome is rich in complex repeats, one class of which is clustered and may serve as centromeres. Partial copies of the extrachromosomal ribosomal DNA (rDNA) element are found at the ends of each chromosome, suggesting a novel telomere structure and the use of a common mechanism to maintain both the rDNA and chromosomal termini. A proteome-based phylogeny shows that the amoebozoa diverged from the animal-fungal lineage after the plant-animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.

Conflict of interest statement

Competing interests statement

The authors declare that they have no competing financial interests.


Figure 1. Chromosomal assemblies compared against HAPPY map data
The locations of markers as found in the sequence (vertical axis) are plotted against their location in HAPPY maps (horizontal axis) for chromosomes 1–6. Markers mapped to one chromosome but found in the assembled sequence of another are indicated by diamonds on the horizontal axis. The dashed box indicates a large inverted duplication on Chr2: markers in this region are shown at one of their two possible map locations but are found at two points in the sequence.
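The comparison in this figure can be sketched in a few lines. The example below is only an illustration, assuming each marker carries a map chromosome, a map position, a sequence chromosome and a sequence position; the `Marker` type, its field names and `partition_markers` are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Marker(NamedTuple):
    """Hypothetical record for one HAPPY marker (field names are assumptions)."""
    name: str
    map_chr: int     # chromosome assigned by the HAPPY map
    map_pos: float   # position on the HAPPY map
    seq_chr: int     # chromosome on which the marker is found in the assembly
    seq_pos: int     # position in the assembled sequence


def partition_markers(markers, chromosome):
    """Split the markers mapped to `chromosome` into two groups.

    Returns (concordant, discordant):
      concordant: (map_pos, seq_pos) pairs for the scatter plot
      discordant: map positions of markers whose sequence lies on another
                  chromosome (the diamonds on the horizontal axis)
    """
    concordant, discordant = [], []
    for m in markers:
        if m.map_chr != chromosome:
            continue  # marker belongs to a different chromosome's plot
        if m.seq_chr == chromosome:
            concordant.append((m.map_pos, m.seq_pos))
        else:
            discordant.append(m.map_pos)
    return concordant, discordant
```

A concordant assembly gives points lying close to a monotonic line; the discordant list flags markers worth checking for misassembly or mapping error.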
Figure 2. (pullout section) The genome of Dictyostelium discoideum
On each of the six chromosomal assemblies (left) the diameter of the tube represents coding density (proportion of coding bases summed over both strands; centre-weighted sliding window of 100 kb; scale on right); coloured bands on the chromosomes represent tRNAs (red), complex repeats (blue), gaps (black) and ribosomal DNA sequences (yellow). G+C content is plotted above each chromosome (centre-weighted sliding window of 100 kb; scale on left). The locations of HAPPY markers are indicated by short green ticks immediately below the distance scale. Immediately beneath each chromosome are indicated (short vertical ticks) the locations of genes known to be up-regulated (red), down-regulated (blue) or whose level of expression does not change significantly (grey) in the transition from solitary to aggregative existence (expression data from reference 91); heavy coloured bars below this indicate significant clusters of genes which are preferentially expressed in germinating spores (red), in dedifferentiating cells (green), in prespore cells (blue) or in prestalk cells (yellow). The translucent hourglass on chromosome 2 is centred on a large inverted duplication. The translucent cylinder on chromosome 3 indicates a typical 300 kb region which is shown in expanded form in inset panel A (above) to illustrate the clustering of identical tRNA genes (red arrows indicate polarity of tRNA genes); a 50 kb section of this region is expanded further in inset panel B, revealing the close association of TRE elements (specific family named above) with tRNAs. The translucent yellow disc on chromosome 4 indicates the location of the presumed chromosomal master-copy of the rDNA element. In inset panel C (below), the structure of the palindromic extrachromosomal element is shown schematically (i; magenta bands = rDNA genes, green bands = G+C-rich regions, red end-caps = short repetitive telomere structures; the translucent hoop indicates the central region of asymmetry), together with (ii) two chromosomal sequence contigs, each carrying an rDNA-like sequence (green or yellow; dotted lines indicate corresponding part of element) flanked by complex repeats (blue). From these contigs, we infer the probable structure (iii) of the genomic master copy (grey = flanking sequence on chromosome 4). This structure suggests a mechanism for regenerating the extrachromosomal copies by transcription of a single strand (iv), hairpin formation and strand extension (v; broken line indicates synthesis of complementary strand), unfolding of the hairpin and synthesis of a fully complementary strand (vi; broken line indicates synthesis of second strand; telomeric caps added post-synthetically).
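The caption gives G+C content over a centre-weighted sliding window but does not define the weighting. The sketch below is one possible reading, assuming a triangular weight that peaks at the window centre and falls off linearly towards the edges; the function name `gc_sliding_window` and its parameters are hypothetical, and the authors' actual procedure may differ.

```python
def gc_sliding_window(seq, window, step):
    """Centre-weighted G+C fraction along a DNA sequence.

    ASSUMPTION: "centre-weighted" is modelled as a triangular weight
    (largest at the window centre, smallest at the edges). Bases other
    than A, C, G, T (for example N in gaps) are ignored.

    Returns a list of (centre_index, gc_fraction) tuples.
    """
    half = window // 2
    results = []
    for centre in range(half, len(seq) - half, step):
        weighted_gc = 0.0
        weighted_total = 0.0
        for offset in range(-half, half + 1):
            base = seq[centre + offset].upper()
            if base not in "ACGT":
                continue
            weight = half + 1 - abs(offset)  # triangular weighting
            weighted_total += weight
            if base in "GC":
                weighted_gc += weight
        fraction = weighted_gc / weighted_total if weighted_total else float("nan")
        results.append((centre, fraction))
    return results
```

For a real chromosome one would call something like `gc_sliding_window(sequence, 100_000, 10_000)`. This direct loop is quadratic in the window size and is meant for clarity rather than speed. The same windowing could be applied to a per-base coding mask to obtain coding density.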
Figure 3. DIRS repeat region of chromosome 1
Complete complex repeat units are represented by coloured triangles whose size corresponds to the sequence length of the repeat unit (key, upper); bottom-left and top-right corners of triangle represent 5’ and 3’ ends of repeat, respectively. The arrangement of complete and partial repeat units within the first 187kb of D. discoideum chromosome 1 is shown (lower) by corresponding portions of the triangles; the orientation of the triangles indicates the direction in which each repeat unit lies. Vertical scale (sizes of repeat units) is the same as the horizontal scale (chromosomal distances).
Figure 4. Phylogeny of gene family members compared to their physical order
The optimally parsimonious phylogenetic tree of 11 acetyl-CoA synthase genes, computed using the PHYLIP module 'Protpars', is shown to the left; dictyBase ID numbers are shown at the end of each branch. The graph (right) indicates the arrangement of these genes on chromosome 2 (solid black boxes; gaps indicate introns, pointed heads indicate direction of transcription; chromosomal distance scale at bottom; other unrelated genes in the same region indicated in grey above the X-axis). The correspondence between phylogeny and physical order implies that the cluster has arisen by a series of segmental tandem duplications and local inversions in parallel with sequence divergence.
Figure 5. Proteome-based eukaryotic phylogeny
The phylogenetic tree was reconstructed from a database of 5,279 orthologous protein clusters drawn from the proteomes of the 17 eukaryotes shown, and was rooted on 159 protein clusters that had representatives from six archaebacterial proteomes. Tree construction, the database of protein clusters and a model of protein divergence used for maximum likelihood estimation are described in Supplementary Information. The relative lengths of the branches are given in Darwins (1 Darwin = 1/2000 of the divergence between S. cerevisiae and humans). Species that are not specified are Plasmodium falciparum (malaria parasite), Chlamydomonas reinhardtii (green alga), Oryza sativa (rice), Zea mays (maize), Fugu rubripes (fish), and Anopheles gambiae (mosquito).
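The Darwin unit defined in this caption is a simple rescaling. The snippet below is a sketch of that arithmetic only; the function name `to_darwins` and the example values are hypothetical and do not come from the paper's data.

```python
def to_darwins(branch_length, yeast_human_divergence):
    """Rescale a branch length into Darwins.

    By the caption's definition, 1 Darwin = 1/2000 of the divergence
    between S. cerevisiae and humans, so that divergence equals 2000 Darwins.
    Both arguments must be in the same (arbitrary) tree units.
    """
    if yeast_human_divergence <= 0:
        raise ValueError("yeast_human_divergence must be positive")
    return 2000.0 * branch_length / yeast_human_divergence
```

For instance, a branch half as long as the yeast-human divergence would come out as `to_darwins(0.5, 1.0)`, which is 1000 Darwins.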
Figure 6. Distribution of PFAM domains amongst eukaryotes
The number of eukaryote-specific Pfam domains present in each group of eukaryotic organisms is shown. The boxed numbers are the domains that are present in Dictyostelium and the other numbers are those domains that are absent from Dictyostelium. The animals are H. sapiens, F. rubripes, C. elegans and D. melanogaster; the fungi are N. crassa, A. nidulans, S. pombe and S. cerevisiae; and the plants are A. thaliana, O. sativa and C. reinhardtii. A complete listing of the domains can be found in the Supplementary Information.
Figure 7. Microfilament system proteins
Proteins with probable interactions with the actin cytoskeleton are tabulated by their documented or predicted functions. Coloured boxes indicate the presence of a protein related to the Dictyostelium (D) protein in metazoa (M), fungi (F) or plants (P). Dictyostelium-specific proteins have no recognizable relatives or differ from relatives due to extensions or unusual domain compositions. For details see Supplementary Information. Actin-binding modules: ADF, actin depolymerisation factor/cofilin-like domain; CH, calponin homology domain; EVH, Ena/VASP homology domain 2; FH2, formin homology 2 domain; GEL, gelsolin repeat domain; TRE, trefoil domain; KELCH, Kelch repeat domain; MYO, myosin motor domain; TAL, the I/LWEQ, actin-binding domain of talin and related proteins; VHP, villin head piece; WH2, Wiskott Aldrich syndrome homology region 2.
Figure 8. The G-protein coupled receptors
A CLUSTALX alignment of the sequences encompassing the seven transmembrane domains of all Dictyostelium GPCRs, and selected GPCRs from other organisms, was used to create an unrooted dendrogram with the TreeView program. A black circle marks the innermost node of each branch supported by >60% of bootstraps. # indicates that this gene model has to be split, and the asterisk indicates a putative pseudogene. dictyBase identifiers (DDB…) were used for the newly discovered Dictyostelium receptors and SwissProt identifiers for all other receptors. CAR/CRL: cAMP receptor/cAMP receptor-like. A.t.: A. thaliana; P.p.: Polysphondylium pallidum; C.e.: C. elegans; D.m.: D. melanogaster; B.t.: Bos taurus; X.l.: Xenopus laevis; G.c.: Geodia cydonium.
Figure 9. Putative adhesion/signalling proteins
Proteins containing repeated EGF/laminin and/or E-set SCOP Superfamily domains are classified into groups containing mannose-6-phosphate receptor, mainly EGF/laminin, mainly E-set, or combinations of domains. Most of these proteins have predicted transmembrane domains and so are expected to be cell surface proteins. ComC, LagC, and LagD are proteins that have been shown to have adhesion and/or signalling functions during multicellular development. Other domain abbreviations: M-6-P R, mannose-6-phosphate receptor; GFR, growth factor receptor; RNI, RNI-like; Fn 3, fibronectin type III; C2, calcium-dependent lipid binding; LDL, L domain-like leucine-rich repeat.
