Non-synonymous to synonymous substitutions suggest that orthologs tend to keep their functions, while paralogs are a source of functional novelty

PeerJ. 2022 Aug 31;10:e13843. doi: 10.7717/peerj.13843. eCollection 2022.

Abstract

Orthologs separate after lineages split from each other, and paralogs after gene duplications. Thus, orthologs are expected to remain more functionally coherent across lineages, while paralogs have been proposed as a source of new functions. Because protein functional divergence follows from non-synonymous substitutions, we performed an analysis based on the ratio of non-synonymous to synonymous substitutions (dN/dS) as a proxy for functional divergence. We used five working definitions of orthology, including reciprocal best hits (RBH), among other definitions based on network analyses and clustering. The results showed that orthologs, by all definitions tested, had dN/dS values noticeably lower than those of paralogs, suggesting that orthologs generally tend to be more functionally stable than paralogs. These differences in dN/dS ratios, and thus the suggested functional stability of orthologs, persisted after eliminating gene comparisons with potential problems, such as genes with high codon usage biases, low coverage of either of the aligned sequences, or sequences with very high similarities. Separation by percent identity of the encoded proteins showed that the differences between the dN/dS ratios of orthologs and paralogs were more evident at high sequence identity, and less so as identity dropped. These last results suggest that the differences between dN/dS ratios were partially related to differences in protein identity. However, they also suggest that paralogs undergo functional divergence relatively early after duplication. Our analyses indicate that choosing orthologs as probably functionally coherent remains the right approach in comparative genomics.
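
As an illustration of the reciprocal best hits (RBH) definition mentioned above, the following minimal Python sketch pairs genes from two genomes when each is the other's top-scoring hit. It is not the authors' pipeline: the function names, the gene identifiers, and the toy alignment scores are all hypothetical, and real analyses typically add coverage and E-value filters as well as explicit handling of tied scores.

```python
from typing import Dict, List, Tuple

# (query gene, subject gene, alignment score); all values below are made up.
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene.

    Ties are resolved by keeping the first hit seen (an assumption of
    this sketch, not a rule taken from the paper).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical search results: genome A against genome B, and the reverse.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 180.0),
          ("a2", "b2", 300.0), ("a3", "b2", 120.0)]
b_vs_a = [("b1", "a1", 248.0), ("b2", "a2", 305.0), ("b2", "a3", 119.0)]

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)
```

In this toy example, a3 finds b2 as its best hit, but b2's best hit is a2, so a3 is left without an RBH partner. Under this working definition such a gene would not be called an ortholog of b2.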

Keywords: Functional divergence; Nonsynonymous to synonymous substitutions; Orthologs; Paralogs; Positive selection; dN/dS.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Duplication
  • Genomics* / methods
  • Proteins*

Substances

  • Proteins

Grant support

This work was supported by a Discovery Grant to Gabriel Moreno-Hagelsieb from the Natural Sciences and Engineering Research Council of Canada (NSERC). This work was also supported by the Programa de Apoyo a Proyectos de Investigacion e Innovacion Tecnologica (PAPIIT-UNAM) (IN205918 and IN202421) to Julio A. Freyre-González. Juan M. Escorcia-Rodríguez is supported by PhD fellowship 959406 from Consejo Nacional de Ciencia y Tecnología (CONACyT-Mexico). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.