Review
Genomics Insights. 2016 Feb 25:9:17-28.
doi: 10.4137/GEI.S37925. eCollection 2016.

Inferring Orthologs: Open Questions and Perspectives

Fredj Tekaia. Genomics Insights.

Abstract

With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet the dynamic remodeling of genome content through gain, loss, and transfer of genes, and through segmental and whole-genome duplication, hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations make orthology harder to assess. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing, and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling, together with a combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detect speciation events, appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.

Keywords: HGT; evolutionary processes; genome annotation quality; genome trees; multidomains; phylogeny; synteny; taxon sampling.

Figures

Figure 1
Homologs—paralogs—orthologs. Notes: This figure illustrates speciation and duplication events and their resulting consequences on gene terminology. The figure shows:
  1. an intraspecies duplication of gene g giving rise to two genes g1 and g2 (note that g is no longer visible in species S);

  2. a speciation event giving rise to two species A and B with the same gene content as S; in particular, g1 and g2 are denoted as g1a and g2a in A and as g1b and g2b in B;

  3. a duplication of g2b in B, assumed here, giving rise to g2b1 and g2b2 (note that g2b is no longer visible in B).

In this scheme and considering solely the last speciation event:
  1. g1 and g2 are homologs because they descend from g. Similarly, g1a and g1b are homologs because they descend from g1;

  2. g1 and g2 are in-paralogs because they were duplicated in S;

  3. similarly, g2b1 and g2b2 are in-paralogs because they were duplicated in B;

  4. g1a and g2a are out-paralogs because their ancestors were duplicated in S;

  5. similarly, g1b and each of g2b1 and g2b2 are out-paralogs because their ancestors were duplicated in S;

  6. g1a and g1b are orthologs because they are in distinct species A and B, respectively, with a common ancestor g1;

  7. g2a and g2b1, as well as g2a and g2b2, are orthologs because they are in distinct species A and B, respectively, with the same ancestor g2. g2b1 and g2b2 are also called co-orthologs of g2a.

Dashed arrows with different colors highlight pairs of orthologs, out-paralogs, and in-paralogs.
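
The relationships listed for Figure 1 can be reproduced with a small sketch that labels each internal node of the gene tree as a duplication or a speciation and then classifies every pair of present-day genes by its last common ancestor. The dictionary layout and the function names (`ancestors`, `lca`, `classify`) are illustrative assumptions, not part of the reviewed methods; only the gene names and events come from the caption.

```python
from itertools import combinations

# Gene tree of Figure 1, written as child -> parent (hypothetical encoding).
PARENT = {
    "g1": "g", "g2": "g",          # duplication of g in ancestral species S
    "g1a": "g1", "g1b": "g1",      # speciation S -> A, B
    "g2a": "g2", "g2b": "g2",      # speciation S -> A, B
    "g2b1": "g2b", "g2b2": "g2b",  # duplication of g2b in species B
}

# Event at each internal node, and the species in which it occurred.
EVENT = {
    "g": ("duplication", "S"),
    "g1": ("speciation", "S"),
    "g2": ("speciation", "S"),
    "g2b": ("duplication", "B"),
}

# Genes visible today in species A and B.
LEAVES = ["g1a", "g1b", "g2a", "g2b1", "g2b2"]


def ancestors(node):
    """Return the path from a node up to the root, node included."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lca(x, y):
    """Last common ancestor of two genes in the gene tree."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("genes do not share an ancestor")


def classify(x, y):
    """Classify a gene pair relative to the last speciation event (S -> A, B)."""
    kind, species = EVENT[lca(x, y)]
    if kind == "speciation":
        return "orthologs"
    # A duplication in the ancestral species S predates the speciation.
    return "out-paralogs" if species == "S" else "in-paralogs"


if __name__ == "__main__":
    for x, y in combinations(LEAVES, 2):
        print(f"{x:5s} {y:5s} {classify(x, y)}")
```

The sketch only covers the present-day genes in A and B, so the caption's statement about g1 and g2 within S (in-paralogs relative to S itself) is outside its scope.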
Figure 2
Evolutionary processes. Notes: This figure illustrates some significant evolutionary processes as revealed by large-scale comparative analyses of predicted proteomes: phylogeny, expansion, exchange, and reduction. Phylogeny is the direct descent from an ancestral genome to a present-day genome. Expansion (in red) includes gene duplication, segmental and whole-genome duplication, and genesis. Exchange (in blue) mainly includes HGT and introgression. Reduction is represented by gene loss. Rearrangements include inversions, translocations, fusions, and fissions.
Figure 3
Example of a misleading situation in orthology inference. Notes: A species S is shown including a gene g that has been duplicated (Gd) into g1 and g2. A speciation event (Sp) gave rise to two species S1 and S2, followed by a duplication (Gd) solely in S2 of g1 (resulting in g1a and g1b) and of g2 (resulting in g2a and g2b). The neighboring genes g0 and g3 are conserved. If gene g1 in S1 and genes g2a and g2b in S2 are lost, most similarity and phylogenetic methods for orthology detection will erroneously assign orthology to g2, g1a, and g1b. These genes are not orthologous, because g2 on the one hand and g1a and g1b on the other do not descend from the same ancestral gene at the speciation event. Conservation of their neighboring genes and synteny may help to reveal the speciation and gene duplication events and therefore to conclude that these genes are not orthologous.
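
The pitfall described for Figure 3 can be sketched with a toy reciprocal-best-hit search. After the stated losses, S1 retains only g2 and S2 retains only the two copies derived from g1, so the best remaining hits pair genes that are in fact out-paralogs. All similarity scores below are invented for illustration, and the function names are hypothetical; this is not the procedure of any specific tool.

```python
# Hypothetical pairwise similarity scores (higher = more similar).
# Genes descending from the same duplicate (g1 or g2) score higher.
_RAW_SCORES = {
    ("g1", "g1a"): 0.92, ("g1", "g1b"): 0.90,
    ("g1", "g2a"): 0.60, ("g1", "g2b"): 0.58,
    ("g2", "g1a"): 0.61, ("g2", "g1b"): 0.59,
    ("g2", "g2a"): 0.93, ("g2", "g2b"): 0.91,
}
SCORE = {frozenset(pair): value for pair, value in _RAW_SCORES.items()}


def best_hit(query, targets, score):
    """Return the target gene most similar to the query gene."""
    return max(targets, key=lambda t: score[frozenset((query, t))])


def reciprocal_best_hits(genes_1, genes_2, score):
    """Return pairs (a, b) where a and b are each other's best hit."""
    pairs = []
    for a in genes_1:
        b = best_hit(a, genes_2, score)
        if best_hit(b, genes_1, score) == a:
            pairs.append((a, b))
    return pairs


# Before any loss: each S1 gene pairs with a copy from its own subfamily.
full = reciprocal_best_hits(["g1", "g2"], ["g1a", "g1b", "g2a", "g2b"], SCORE)

# After loss of g1 in S1 and of g2a, g2b in S2: g2 pairs with a g1 copy,
# although their last common ancestor is the duplication in S.
after_loss = reciprocal_best_hits(["g2"], ["g1a", "g1b"], SCORE)

if __name__ == "__main__":
    print("complete gene sets:", full)
    print("after gene losses: ", after_loss)
```

Similarity alone cannot tell the two cases apart, which is why the caption points to the conserved neighbors g0 and g3 and to synteny as additional evidence.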
Figure 4
Assessment of the members of an SPO cluster of orthologs by detecting motifs and their distribution. Notes: Motifs in SPOs are illustrated with the example of SPO29.1, from the considered 12 mycobacterial species. This SPO contains proteins corresponding to mapA and mapB (methionine aminopeptidase). Column headings are as follows: (a) SpecCode_ProtID: species code (see coding conventions below) followed by the protein identification; (b) Partition_RBH: partition of RBHs in pairwise proteome comparisons of the considered species, denoted Pl.r, where l is the number of proteins in the partition and r is an arbitrary index; (c) paralogs: paralogous class, where Pn.m is a partition of intraspecies RBHs and Cp.q is the cluster obtained by the mcl program (see Ref. for more details on the coding scheme of Pn.m.Cp.q classes); and (d) motifs: distributions of motifs as obtained with the meme/mast programs. The distributions highlight motifs shared by all proteins (ancestral motifs: 3, 6, 2, 4) and motifs shared by subsets of proteins. Checking the detailed description of paralogs allowed the last line (MYSM_MSMMEG5683) to be added, because only three members of the P10.11.C4.47 cluster were found by the RBH procedure. [Table: see text]
