Genome Biol Evol. 2013;5(2):457-67.
doi: 10.1093/gbe/evt017.

Improving Genome-Wide Scans of Positive Selection by Using Protein Isoforms of Similar Length

Free PMC article

José Luis Villanueva-Cañas et al. Genome Biol Evol.

Abstract

Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, a single protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic: including nonhomologous regions, such as those derived from species-specific exons, increases the number of indels in the alignments and, with it, the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates when compared with PALO, which is most likely due to an excess of misaligned positions. The maximum-likelihood estimate of the fraction of genes that have experienced positive selection is very sensitive to the isoform selection method, both when alignments are constructed with MAFFT and when they are constructed with Prank(+F). Longest performs better than a random combination but still estimates up to three times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives.
We show that PALO can eliminate the majority of such false positives and is therefore a more appropriate approach for large-scale analyses than Longest. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
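The two selection strategies compared in the abstract can be sketched as follows. This is a minimal brute-force illustration under stated assumptions, not PALO itself: each species is assumed to contribute a list of isoform lengths, every one-isoform-per-species combination is enumerated with `itertools.product`, and "most similar in length" is scored as the range (longest minus shortest length) of the combination, with ties broken in favor of the larger total length. The scoring rule, the tie-break, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical input: species name -> lengths (in residues) of one gene's protein isoforms.
Family = dict[str, list[int]]


def select_longest(family: Family) -> dict[str, int]:
    """'Longest' strategy: pick the longest isoform in each species independently."""
    return {species: max(lengths) for species, lengths in family.items()}


def select_most_similar_length(family: Family) -> dict[str, int]:
    """Length-similarity strategy (assumed scoring, not necessarily PALO's).

    Enumerates every combination (one isoform per species) and keeps the one
    whose lengths span the smallest range; ties go to the combination with
    the larger total length, so shorter, possibly partial isoforms lose ties.
    """
    species = list(family)
    best = min(
        product(*(family[s] for s in species)),
        key=lambda combo: (max(combo) - min(combo), -sum(combo)),
    )
    return dict(zip(species, best))
```

For a hypothetical family `{"species_a": [300, 340], "species_b": [300, 280], "species_c": [301], "species_d": [299, 250]}`, `select_longest` keeps the 340-residue isoform of species_a, whereas `select_most_similar_length` picks its 300-residue isoform, so the chosen lengths span only 299 to 301 residues. Exhaustive enumeration scales with the product of the per-species isoform counts, which is one reason a heuristic is attractive for large data sets.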

Figures

Fig. 1.—
Number of protein isoform combinations in different data sets. See table 1 for a description of the data sets.
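Assuming one isoform is chosen per species, the number of combinations for a gene family is the product of the per-species isoform counts, which is why these counts can grow quickly. A small Python helper; the function name and the counts in the example call are hypothetical.

```python
from math import prod


def count_isoform_combinations(isoforms_per_species: dict[str, int]) -> int:
    """Number of ways to choose one protein isoform per species in a gene family."""
    return prod(isoforms_per_species.values())


# Hypothetical counts: 2 x 3 x 1 x 1 isoforms -> prints 6
print(count_isoform_combinations({"human": 2, "mouse": 3, "horse": 1, "cow": 1}))
```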
Fig. 2.—
Schematic representation of protein isoform combination selection by Longest and PALO. Hypothetical gene family with four possible protein isoform combinations.
Fig. 3.—
Example analysis of a gene family using Longest and PALO. The example shown is ENSEMBL gene ENSG00000003096 (brain Kelch-like protein 13), associated with 66 possible protein isoform combinations in 1:1 orthologs from human, mouse, horse, and cow. The first 70 positions of the alignments using the Longest (above) and PALO (below) methods are shown. Longest selects a human protein isoform that shows an extension at the N-terminus. Misalignment of this region results in inflated dN/dS values at the level of the whole protein and an artifactual signal of positive selection (PS) in the human branch.
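A positive-selection call of this kind typically comes from comparing nested codon models by likelihood. The sketch below assumes a likelihood ratio test in which twice the log-likelihood difference between a model allowing dN/dS > 1 on a foreground branch (here, human) and a null model is compared with a chi-square distribution with one degree of freedom; that reference distribution is a simplifying assumption, and the log-likelihood values, the 0.05 threshold, and the function names are hypothetical rather than output of any particular program. It also shows why misalignment matters at genome scale: every spuriously significant test raises the estimated fraction of positively selected genes.

```python
import math


def lrt_statistic(lnl_alternative: float, lnl_null: float) -> float:
    """Likelihood ratio statistic 2 * (lnL1 - lnL0), floored at 0 for numerical noise."""
    return max(0.0, 2.0 * (lnl_alternative - lnl_null))


def chi2_df1_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square with 1 degree of freedom.

    If X ~ chi2(1) then X = Z**2 with Z standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


def fraction_positively_selected(
    lnl_pairs: list[tuple[float, float]], alpha: float = 0.05
) -> float:
    """Fraction of genes significant at level alpha (no multiple-testing correction)."""
    significant = sum(
        chi2_df1_pvalue(lrt_statistic(alt, null)) < alpha for alt, null in lnl_pairs
    )
    return significant / len(lnl_pairs)
```

With hypothetical (alternative, null) log-likelihood pairs `(-1000.0, -1003.0)`, `(-500.0, -500.5)`, `(-800.0, -802.5)`, and `(-1200.0, -1200.0)`, the statistics are 6, 1, 5, and 0, and two of the four genes pass the 0.05 threshold. Because each gene is tested separately, a real genome-wide scan would also need a multiple-testing correction, which this sketch omits.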


