, 33 (12), 3875-96

Origin and Evolution of the Archaeo-Eukaryotic Primase Superfamily and Related Palm-Domain Proteins: Structural Insights and New Members

Lakshminarayan M Iyer et al. Nucleic Acids Res.

Abstract

We report an in-depth computational study of the protein sequences and structures of the superfamily of archaeo-eukaryotic primases (AEPs). This analysis greatly expands the range of diversity of the AEPs and reveals the unique active site shared by all members of this superfamily. In particular, it is shown that eukaryotic nucleo-cytoplasmic large DNA viruses, including poxviruses, asfarviruses, iridoviruses, phycodnaviruses and the mimivirus, encode AEPs of a distinct family, which also includes the herpesvirus primases whose relationship to AEPs has not been recognized previously. Many eukaryotic genomes, including chordates and plants, encode previously uncharacterized homologs of these predicted viral primases, which might be involved in novel DNA repair pathways. At a deeper level of evolutionary connections, structural comparisons indicate that AEPs, the nucleases involved in the initiation of rolling circle replication in plasmids and viruses, and origin-binding domains of papilloma and polyoma viruses evolved from a common ancestral protein that might have been involved in a protein-priming mechanism of initiation of DNA replication. Contextual analysis of multidomain protein architectures and gene neighborhoods in prokaryotes and viruses reveals remarkable parallels between AEPs and the unrelated DnaG-type primases, in particular, tight associations with the same repertoire of helicases. These observations point to a functional equivalence of the two classes of primases, which seem to have repeatedly displaced each other in various extrachromosomal replicons.

Figures

Figure 1
Topology diagrams and structures of AEP-type primases and related proteins. Strands are shown as arrows with the arrowhead on the C-terminal side and numbered 1 through 4 or 6, respectively, according to the conventions used in the text. Helices are shown as green rectangles, non-conserved elements in faint gray. Purple arrowheads mark the protein's C-terminus. The location of catalytically important residues is indicated by colored circles (green—histidine, red—acidic, the yellow circle represents a tyrosine residue that becomes covalently attached to the 5′ phosphate of a cleaved DNA strand in RCRE). The topology diagram at the top of the figure is an idealization and is not derived from an actual structure. It shows the positions of the Zn-clusters found in various members of the AEP superfamily as discussed in the text. The N-terminal (αβ)2 units of primase and primpol (in gray box) pack against the beta sheet of the palm fold but they are drawn here as a ‘slide-out’ for clarity of presentation. The structures of selected AEP-type primases are shown in the right hand panel. They are in the same orientation as the topology diagrams with the flange strand running above the plane of the beta sheet. The bottom panel shows topology diagrams of palm domain proteins from outside the AEP primase group for comparison purposes. The structural and topology diagrams were derived from the following PDB IDs: 1V33, 1G71 (8), 1RNI (25), 1M55 (59), 1F08 (54), 1QUV (92), 1MML (93), 1TGO (94), 1TAQ (95) and 1FX2 (96).
Figure 2
Multiple sequence alignment of the AEP superfamily. Proteins are designated by their gene names, species abbreviations and GenBank IDs separated by underscores. Columns of amino acids are colored based on their side chain properties and conservation in the alignment; 70% conservation was used to calculate the consensus. Poorly conserved, large inserts are replaced by the corresponding number of residues. The secondary structure shown above the alignment was derived from the crystal structures of the archaeal primase (PDB ID: 1g71) and the primpol protein (PDB ID: 1rni). Strands and helices are denoted above the alignment by E and H, respectively. The coloring scheme and consensus abbreviations are as follows: h, hydrophobic residues (ACFILMVWY), shaded yellow; b, big residues (LIYERFQKMW), shaded gray; s, small residues (AGSVCDN) and u, tiny residues (GAS), colored green; p, polar residues (STEDKRNQHC); +, basic residues (HRK) and -, acidic residues (DE), colored magenta. Species abbreviations are as follows: AMV: Amsacta moorei entomopoxvirus; APMV: Acanthamoeba polyphaga mimivirus; ASFV: African swine fever virus; Aamb: Acidianus ambivalens; AcNPV: Autographa californica nucleopolyhedrovirus; Aful: Archaeoglobus fulgidus; Amel: Apis mellifera; Ana: Nostoc sp.; Aper: Aeropyrum pernix; AsGV: Agrotis segetum granulovirus; Atha: Arabidopsis thaliana; Avar: Anabaena variabilis; BHV4: Bovine herpesvirus 4; BP315.5: Streptococcus pyogenes phage 315.5; BPA2: Lactobacillus casei bacteriophage A2; BPAPSE-1: Acyrthosiphon pisum bacteriophage APSE-1; BPAT3: Bacteriophage phi AT3; BPBCJA1c: Bacillus clarkii bacteriophage BCJA1c; BPBIP-1: Bordetella phage BIP-1; BPBcep1: Burkholderia cenocepacia phage Bcep1; BPBcepC6B: Burkholderia cepacia complex phage BcepC6B; BPBcepNazgul: Burkholderia cepacia phage BcepNazgul; BPN15: Bacteriophage N15; BPP4: Bacteriophage P4; BPSA: Bacteriophage PSA; BPSFi18: Streptococcus thermophilus bacteriophage SFi18; BPSfi11: Streptococcus thermophilus bacteriophage Sfi11; BPSfi21: Streptococcus thermophilus bacteriophage Sfi21; BPTM4: Mycobacteriophage TM4; BPVP16T: Vibrio parahaemolyticus phage VP16T; BPVP2: Vibriophage VP2; BPadh: Lactobacillus bacteriophage phi adh; BPmi7-9: Lactococcus phage mi7-9; BPphi-BT1: Bacteriophage phi-BT1; BPphi-R73: Bacteriophage phi-R73; BPphi105: Bacteriophage phi-105; BPphi31: Lactococcus bacteriophage phi31; BPphiHSIC: Listonella pelagia phage phiHSIC; BPphig1e: Bacteriophage phig1e; Bbac: Bdellovibrio bacteriovorus; Bbro: Bordetella bronchiseptica; Bcep: Burkholderia cepacia; Bcer: Bacillus cereus; Bfra: Bacteroides fragilis; Bjap: Bradyrhizobium japonicum; Blic: Bacillus licheniformis; Blin: Brevibacterium linens; Bpse: Burkholderia pseudomallei; Bthe: Bacteroides thetaiotaomicron; Bthu: Bacillus thuringiensis; CIV: Chilo iridescent virus; CaHV: Callitrichine herpesvirus 3; Ccol: Campylobacter coli; CeHV: Cercopithecine herpesvirus 9; Cele: Caenorhabditis elegans; Cfum: Choristoneura fumiferana; Cglu: Corynebacterium glutamicum; Cpar: Cryptosporidium parvum; Cthe: Clostridium thermocellum; Cwat: Crocosphaera watsonii; Ddes: Desulfovibrio desulfuricans; Ddis: Dictyostelium discoideum; Dmel: Drosophila melanogaster; Drad: Deinococcus radiodurans; Dvul: Desulfovibrio vulgaris; EHV1: Equid herpesvirus 1; ESV: Ectocarpus siliculosus virus; Ecoli: Escherichia coli; Ecun: Encephalitozoon cuniculi; Efae: Enterococcus faecalis; Efae: Enterococcus faecium; Ehis: Entamoeba histolytica; FPV: Fowlpox virus; FV3: Frog virus 3; Faci: Ferroplasma acidarmanus; FirV: Feldmannia irregularis virus a; GHV2: Gallid herpesvirus 2; Ggal: Gallus gallus; Gkau: Geobacillus kaustophilus; Glam: Giardia lamblia; HHV2: Human herpesvirus 2; HHV3: Human herpesvirus 3; HHV4: Human herpesvirus 4; HHV5: Human herpesvirus 5; HHV6: Human herpesvirus 6B; Hinf: Haemophilus influenzae; Hpyl: Helicobacter pylori; Hsal: Halobacterium salinarum; Hsap: Homo sapiens; IHV1: Ictalurid herpesvirus 1; IsknV: Infectious spleen and kidney necrosis virus; LdNPV: Lymantria dispar nucleopolyhedrovirus; LdV1: Lymphocystis disease virus 1; Ldel: Lactobacillus delbrueckii; Linn: Listeria innocua; Llac: Lactococcus lactis; Lmaj: Leishmania major; Lmon: Listeria monocytogenes; Lpla: Lactobacillus plantarum; MCV: Molluscum contagiosum virus subtype 1; Masp: Magnetococcus sp.; Mbur: Methanococcoides burtonii; McNPV: Mamestra configurata nucleopolyhedrovirus B; Mcap: Methylococcus capsulatus; Mjan: Methanocaldococcus jannaschii; Mkan: Methanopyrus kandleri; Mmag: Magnetospirillum magnetotacticum; Mmaz: Methanosarcina mazei; Mmus: Mus musculus; MsEV: Melanoplus sanguinipes entomopoxvirus; Msp.: Micrococcus sp.; Mtub: Mycobacterium tuberculosis; Ncra: Neurospora crassa; Nequ: Nanoarchaeum equitans; NsNPV: Neodiprion sertifer nucleopolyhedrovirus; OHV1: Ostreid herpesvirus 1; OsNPV: Orgyia pseudotsugata multicapsid nucleopolyhedrovirus; Osat: Oryza sativa; PBCV: Paramecium bursaria Chlorella virus 1; PHV1: Psittacid herpesvirus 1; Pfal: Plasmodium falciparum; Phor: Pyrococcus horikoshii; Psav: Pseudomonas savastanoi; Psp.: Polaromonas sp.; Pyae: Pyrobaculum aerophilum; Rbal: Rhodopirellula baltica; Rnor: Rattus norvegicus; Rsol: Ralstonia solanacearum; Rsp.: Rhodococcus sp.; Rsph: Rhodobacter sphaeroides; Saur: Staphylococcus aureus; Save: Streptomyces avermitilis; Scer: Saccharomyces cerevisiae; Scoe: Streptomyces coelicolor; SeNPV: Spodoptera exigua nucleopolyhedrovirus; Sglo: Streptomyces globisporus; Sisl: Sulfolobus islandicus; SlNPV: Spodoptera litura nucleopolyhedrovirus; Spom: Schizosaccharomyces pombe; Spyo: Streptococcus pyogenes; Ssp: Synechocystis sp.; Ssui: Streptococcus suis; Syn: Synechococcus sp.; Tbru: Trypanosoma brucei; Tcru: Trypanosoma cruzi; Telo: Thermosynechococcus elongatus; Tint: Thiobacillus intermedius; Tnig: Tetraodon nigroviridis; Tsp.: Thiobacillus sp.; Tthe: Thermus thermophilus; Tvol: Thermoplasma volcanium; VV: Vaccinia virus; Vcho: Vibrio cholerae; Vvul: Vibrio vulnificus; Xcam: Xanthomonas campestris; Xfas: Xylella fastidiosa; XnGV: Xestia c-nigrum granulovirus.
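The consensus scheme described in the Figure 2 caption can be illustrated with a minimal sketch. The residue classes and the 70% threshold are taken from the caption; the function names, the rule of preferring a single conserved residue and then the smallest qualifying class, and the treatment of gaps as non-matching positions are assumptions made for illustration and are not necessarily the procedure the authors used.

```python
from collections import Counter

# Residue classes exactly as listed in the Figure 2 caption.
CLASSES = {
    "h": set("ACFILMVWY"),   # hydrophobic
    "b": set("LIYERFQKMW"),  # big
    "s": set("AGSVCDN"),     # small
    "u": set("GAS"),         # tiny
    "p": set("STEDKRNQHC"),  # polar
    "+": set("HRK"),         # basic
    "-": set("DE"),          # acidic
}

GAP = "-"  # gap character in the input alignment (assumption)


def column_consensus(column, threshold=0.7):
    """Return a consensus symbol for one alignment column, or '.' if none."""
    residues = [r.upper() for r in column]
    n = len(residues)
    if n == 0:
        return "."
    # Assumption: a single residue reaching the threshold is reported as itself.
    residue, count = Counter(residues).most_common(1)[0]
    if residue != GAP and count / n >= threshold:
        return residue
    # Assumption: otherwise report the smallest class reaching the threshold.
    # Gaps belong to no class, so they count against conservation.
    best = None
    for symbol, members in CLASSES.items():
        fraction = sum(r in members for r in residues) / n
        if fraction >= threshold:
            if best is None or len(members) < len(CLASSES[best]):
                best = symbol
    return best if best is not None else "."


def consensus(alignment, threshold=0.7):
    """Build a consensus line for equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_consensus(col, threshold) for col in zip(*alignment))


if __name__ == "__main__":
    # Toy alignment, not taken from the paper.
    print(consensus(["ADKL", "GEKI", "SDRV", "AEKM"]))
```

Note that the acidic class symbol and the usual alignment gap character are both "-", so a consensus line produced this way should be read as class symbols only, never as gaps.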
Figure 3
Inferred evolutionary history of the AEP superfamily. The overall topology of the phylogram was derived using synapomorphies and clustering based on DALI Z-scores. Synapomorphies that unify a set of lineages are indicated next to the filled yellow circles. The ellipses indicate large assemblages within which individual lineages show a generic relationship. Broken lines indicate an uncertainty with respect to the exact point of origin of a lineage. Archaeal and eukaryotic (including viral) branches are colored blue, bacterial branches are colored green, branches that include predominantly proteins from plasmids, phages and mobile elements are colored red. Ancestral branches and branches outside the AEP superfamily are in black. The phyletic distribution is shown in brackets: B, Bacteria; A, Archaea; E, Eukaryotes; V, Viruses; > represents a proposed lateral transfer.
Figure 4
Ordered graph of domain architectures and genome contexts. Each vertex represents a domain and the edges represent a contextual association. Domain combinations are shown as black arrows, with the arrow pointing from the N-terminus to the C-terminus of the multi-domain protein. Gene neighborhood associations are shown as red arrows with the arrows pointing in the 5′–3′ direction of the coding sequence. The blue lines with boxed ends represent experimentally observed functional associations. Domain architectures and gene neighborhood organizations are shown around the ordered graph. Where possible, these are organized into clade- or family-specific groups enclosed in an orange box. Proteins or genes that are depicted as domain architectures or operon clusters are denoted by the standard notation as in Figure 2. The species abbreviations are as in Figure 2. Genes with conserved neighborhoods are shown as boxed arrows with the arrow pointing in the 5′–3′ direction of the coding sequence. The C-terminal tail motif of the SFI-ORF24 proteins is represented by an orange extension in the domain representation.
Figure 5
Multiple alignment of the zinc ribbon-like domain located C-terminal to the AEP domain in poxvirus and herpesvirus primases and the Eukprim2 family. The coloring scheme, consensus abbreviations, secondary structure representations and species abbreviations are as in Figure 2. The residues predicted to be involved in metal binding are shaded red. Poorly conserved short inserts seen in some sequences are shown with a reduced font size.
Figure 6
Multiple alignment of the D5N domain. The coloring scheme, consensus abbreviations, secondary structure representations and species abbreviations are as in Figure 2.
Figure 7
Multiple alignment of the (A) PriCT-1 and (B) PriCT-2 domains. The coloring scheme, consensus abbreviations, secondary structure representations and species abbreviations are as in Figure 2. Furthermore, alcohol side chain containing residues (ST) are colored blue and denoted by an ‘o’ and aliphatic residues (LIV) are shaded yellow. Equivalent helices in PriCT-1 and PriCT-2 have been aligned with each other. Poorly conserved short inserts seen in some sequences are shown with a reduced font size.
Figure 8
Multiple alignment of the Primase Large subunit. The coloring scheme, consensus abbreviations and secondary structure representation are as in Figure 2. Short inserts are shown with a reduced font size, whereas longer inserts are represented as numbers. The secondary structure was predicted using the JPred program. The ‘a’ in the consensus abbreviations represents aromatic residues (FWY) that are shaded yellow. The cysteine residues predicted to have a role in metal-binding are shaded red. The granuloviruses appear to have lost their C-terminal cysteine cluster. Species abbreviations are as in Figure 2.
Figure 9
Multiple alignment of the putative RAD52-like domains encoded in the same predicted operons with prim-pols. The coloring scheme, consensus abbreviations, secondary structure representation and species abbreviations are as in Figure 2. Short inserts are shown with a reduced font size, whereas longer inserts are represented by the corresponding numbers of residues.

References

    1. Lodish H., Berk A., Zipursky S.L., Matsudaira P., Baltimore D., Darnell J.E. Molecular Cell Biology. NY: W.H. Freeman & Co.; 1999.
    2. Kornberg A., Baker T.A. DNA Replication, 2nd edn. NY: W.H. Freeman & Company; 1991.
    3. Salas M. Protein-priming of DNA replication. Annu. Rev. Biochem. 1991;60:39–71.
    4. Noirot-Gros M.F., Ehrlich S.D. Change of a catalytic reaction carried out by a DNA replication protein. Science. 1996;274:777–780.
    5. Ilyina T.V., Koonin E.V. Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res. 1992;20:3279–3285.
