2019 Feb 1;11(2):531-545.
doi: 10.1093/gbe/evz008.

The Evolutionary Traceability of a Protein

Arpit Jain et al. Genome Biol Evol.

Abstract

Orthologs document the evolution of genes and metabolic capacities encoded in extant and ancient genomes. However, the similarity between orthologs decays with time, and ultimately it becomes insufficient to infer common ancestry. This leaves ancient gene set reconstructions incomplete and distorted to an unknown extent. Here we introduce the "evolutionary traceability" as a measure that quantifies, for each protein, the evolutionary distance beyond which the sensitivity of the ortholog search becomes limiting. Using yeast, we show that genes that were thought to date back to the last universal common ancestor are of high traceability. Their functions mostly involve catalysis, ion transport, and ribonucleoprotein complex assembly. In turn, the fraction of yeast genes whose traceability is not sufficient to infer their presence in the last universal common ancestor is enriched for regulatory functions. Computing the traceabilities of genes that have been experimentally characterized as being essential for a self-replicating cell reveals that many of the genes that lack orthologs outside bacteria have low traceability. This leaves open whether their orthologs in the eukaryotic and archaeal domains have been overlooked. Using the example of REC8, a protein essential for chromosome cohesion, we demonstrate how a traceability-informed adjustment of the search sensitivity identifies hitherto missed orthologs in the fast-evolving microsporidia. Taken together, the evolutionary traceability helps to differentiate between true absence and nondetection of orthologs, and thus improves our understanding of the evolutionary conservation of functional protein networks. "protTrace," a software tool for computing evolutionary traceability, is freely available at https://github.com/BIONF/protTrace.git; last accessed February 10, 2019.

Keywords: LUCA; metabolic pathway; ortholog search; phylogenetic profile; sequence evolution; twilight zone.
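
The distinction between true absence and nondetection of orthologs can be illustrated with a minimal Python sketch. It is not taken from protTrace: the names `ProfileEntry` and `interpret` are hypothetical, and the cutoff of 0.75 is an assumption borrowed from the Fig. 2 caption, where 95% of detected orthologs coincide with a traceability index of 0.75 or above. Treating that value as a decision threshold is a simplification made here for illustration only.

```python
from dataclasses import dataclass

# Assumption: 0.75 is used as a decision cutoff purely for illustration.
TI_CUTOFF = 0.75


@dataclass(frozen=True)
class ProfileEntry:
    """One cell of a phylogenetic profile (hypothetical structure)."""
    species: str
    ortholog_found: bool
    traceability_index: float  # defined on the interval [0, 1]


def interpret(entry: ProfileEntry, cutoff: float = TI_CUTOFF) -> str:
    """Label a profile cell using the ortholog search result and the index."""
    if not 0.0 <= entry.traceability_index <= 1.0:
        raise ValueError("traceability index must lie in [0, 1]")
    if entry.ortholog_found:
        return "present"
    if entry.traceability_index >= cutoff:
        # The search should have been sensitive enough to find an ortholog.
        return "likely absent"
    # Search sensitivity is limiting, so a missing hit is inconclusive.
    return "possibly undetected"


if __name__ == "__main__":
    # Illustrative values only; they are not results from the paper.
    for cell in (
        ProfileEntry("species A", True, 0.40),
        ProfileEntry("species B", False, 0.90),
        ProfileEntry("species C", False, 0.30),
    ):
        print(cell.species, "->", interpret(cell))
```

Under these assumptions, a missing ortholog is read as evidence of absence only when the index in the target species is high; otherwise the cell is flagged for a more sensitive search.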

Figures

Fig. 1. Workflow to assess the evolutionary traceability of a protein. We show as examples two yeast proteins, PHD1 (blue) and DIM1 (yellow). For each seed protein, we use a simulation-based approach to infer its traceability, TI(t), that is defined on the interval [0, 1]. From its traceability graph and the evolutionary distance to any target species, the traceability index of the seed in the target species can be extracted. Relating this information to 1) a species tree highlights taxa where the ortholog search sensitivity becomes limiting (red clades), 2) phylogenetic profiles identifies cases where orthologs might have been overlooked, and 3) the gene ontology identifies molecular functions that coincide with low traceability.
Fig. 2. The evolutionary traceability of yeast proteins. (A) Traceability indices for 6,352 yeast proteins in Ashbya gossypii (Fungi), Encephalitozoon cuniculi (Microsporidia), Homo sapiens (Metazoa), Arabidopsis thaliana (Viridiplantae), Methanocaldococcus jannaschii (Archaea), and Escherichia coli (Bacteria). Proteins are ordered according to their traceability index in E. cuniculi. The inlay shows a stacked bar plot providing, for each species, the fraction of proteins in each of the four traceability bins. The color code identifying the individual species is specified in the phylogenetic tree. (B) Cumulative distribution of the detected yeast orthologs relative to the protein's traceability index. Of the detected orthologs, 95% coincide with a traceability index of 0.75 or above in the respective species (hatched line). (C) Relation between results of the ortholog search and protein traceability. (D) Per-species results with the color code following (C).
Fig. 3. Missing information about domain constraints results in underestimated traceabilities: the yeast mitochondrial inner membrane Mg²⁺ transporter MRS2. (A) MRS2 displays no significant hit against any Pfam domain and contains as sole features a central coiled-coil domain and two transmembrane domains. (B) The phylogenetic profile of MRS2 reveals the existence of orthologs across all eukaryotic kingdoms despite a predicted low traceability. The presence of an ortholog in a given species is indicated by a dot. The cell color represents protein traceability. (C) Section of the MRS2 alignment considering orthologs from different representatives across the eukaryotic tree of life. The selected region is representative of the entire alignment and shows that MRS2 orthologs share conserved sequence motifs that most likely are associated with the functionality of this protein as an Mg²⁺ membrane transporter. As these conserved domains are not represented in a Pfam domain, protTrace cannot consider the corresponding evolutionary constraints during its simulation.
Fig. 4. Density plot of the TI(E. coli) for yeast proteins as a function of their subcellular localization. Water-soluble intracellular proteins tend to have higher traceability indices in E. coli than proteins with a predicted extracellular localization or proteins localized in the cell membrane.
Fig. 5. Phylogenetic distribution and traceability profile for the Syn3.0 minimal gene set. The background color indicates the traceability index. The categorization according to the functional annotation status of the individual proteins was adapted from Hutchison et al. (2016).
Fig. 6. (A) Phylogenetic profiles for the components of fungal key metabolic pathways across ten representative species from the tree of life. The background color indicates the traceability index, ranging from green (high traceability) to red (low traceability). (B) The four proteins of the yeast cohesin complex form a ring-like structure. Font color of the protein names indicates whether TI(t) in the microsporidium Encephalitozoon cuniculi is 0.75 or higher (green), or below 0.75 (red). (C) Maximum likelihood tree of REC8 and MCD1 (syn. SCC1) orthologs. The microsporidian REC8 candidates are colored in red. Branch labels represent percent bootstrap support. (D) Alternative phylogeny for the REC8/MCD1 (SCC1) protein family. It features monophyletic fungal REC8 and MCD1 (SCC1), respectively. The animal REC8 proteins are placed as sister to monophyletic fungal and microsporidian REC8 proteins. The branching orders in the fungal subtrees follow the accepted species phylogeny. With a ΔLogLikelihood of 25.7, the alternative tree is not significantly worse than the ML tree shown in (C) (Shimodaira–Hasegawa test: P > 0.05). The asterisk indicates a gene duplication on the microsporidian lineage that gave rise to the two paralogous microsporidian REC8 lineages.
