Nucleic Acids Res. 2015 Jul 1;43(W1):W7-14.
doi: 10.1093/nar/gkv318. Epub 2015 Apr 16.

GUIDANCE2: Accurate Detection of Unreliable Alignment Regions Accounting for the Uncertainty of Multiple Parameters

Free PMC article

Itamar Sela et al. Nucleic Acids Res.

Abstract

Inference of multiple sequence alignments (MSAs) is a critical part of phylogenetic and comparative genomics studies. However, from the same set of sequences different MSAs are often inferred, depending on the methodologies used and the assumed parameters. Much effort has recently been devoted to improving the ability to identify unreliable alignment regions. Detecting such unreliable regions was previously shown to be important for downstream analyses relying on MSAs, such as the detection of positive selection. Here we developed GUIDANCE2, a new integrative methodology that accounts for: (i) uncertainty in the process of indel formation, (ii) uncertainty in the assumed guide tree and (iii) co-optimal solutions in the pairwise alignments, used as building blocks in progressive alignment algorithms. We compared GUIDANCE2 with seven methodologies to detect unreliable MSA regions using extensive simulations and empirical benchmarks. We show that GUIDANCE2 outperforms all previously developed methodologies. Furthermore, GUIDANCE2 also provides a set of alternative MSAs which can be useful for downstream analyses. The novel algorithm is implemented as a web-server, available at: http://guidance.tau.ac.il.
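
As a rough illustration of how a set of alternative MSAs could be turned into per-column reliability scores, the sketch below scores each column of a base alignment by the fraction of its aligned residue pairs that reappear in the alternative alignments. This is a simplified, hypothetical scheme and not the GUIDANCE2 implementation: the function names, the pair-based score and the toy sequences are all assumptions made for the example, and the alternative MSAs are supplied by hand rather than generated by perturbing gap penalties, guide trees or co-optimal pairwise solutions.

```python
from itertools import combinations


def residue_pairs_by_column(msa):
    """Return, for each column, the set of residue pairs that column aligns.

    A residue is identified by (sequence index, position in the ungapped
    sequence), so pairs are comparable across different alignments of the
    same sequences. Hypothetical helper, not part of any GUIDANCE2 API.
    """
    counters = [0] * len(msa)  # next ungapped position for each sequence
    columns = []
    for col in range(len(msa[0])):
        residues = []
        for seq_idx, row in enumerate(msa):
            if row[col] != "-":
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        columns.append(set(combinations(residues, 2)))
    return columns


def column_scores(base_msa, alternative_msas):
    """Score each base column by the support of its residue pairs.

    The score is the mean fraction of the column's residue pairs that are
    also aligned in each alternative MSA. Columns with fewer than two
    residues have no pairs and get None.
    """
    if not alternative_msas:
        raise ValueError("at least one alternative MSA is required")
    alt_pair_sets = [
        set().union(*residue_pairs_by_column(alt)) for alt in alternative_msas
    ]
    scores = []
    for pairs in residue_pairs_by_column(base_msa):
        if not pairs:
            scores.append(None)
            continue
        supported = sum(len(pairs & alt_pairs) for alt_pairs in alt_pair_sets)
        scores.append(supported / (len(pairs) * len(alt_pair_sets)))
    return scores


# Toy example: two sequences, one base MSA and two alternative MSAs.
base = ["AC-G", "ACTG"]
alternatives = [["AC-G", "ACTG"], ["ACG-", "ACTG"]]
scores = column_scores(base, alternatives)
print(scores)
```

In this toy case the last column is supported by only one of the two alternative alignments, so it receives a lower score than the first two columns; a threshold on such scores is one way low-confidence columns might be flagged before downstream analyses.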

Figures

Figure 1.
Quantitative comparison of all MSA reliability algorithms for different data sets. (A) AUC-ROC and (B) AUC-PR. Performance curves of the five leading methodologies over the BAliBASE data set. (C) ROC and (D) precision–recall.
Figure 2.
AUC ROC for columns as a function of gap percentage. (A) HOMSTRAD and (B) OrthoMaM simulations. MSAs were aligned using MAFFT.
Figure 3.
ROC curves showing the performance of each GUIDANCE2 component (gap opening penalty variation is denoted as gap penalty) in detecting unreliably aligned regions for (A) BAliBASE, (B) OrthoMaM simulations and (C) the ZORRO simulated data set (simulations from the ZORRO paper). AUC-ROC for each component is indicated in parentheses.
