Protein Sci. 2018 Jan;27(1):135-145.
doi: 10.1002/pro.3290. Epub 2017 Oct 30.

Clustal Omega for making accurate alignments of many protein sequences

Fabian Sievers et al. Protein Sci. 2018 Jan.

Abstract

Clustal Omega is a widely used package for carrying out multiple sequence alignment. Here, we describe some recent additions to the package and benchmark some alternative ways of making alignments. These benchmarks are based on protein structure comparisons or predictions and include a recently described method based on secondary structure prediction. In general, Clustal Omega is fast enough to make very large alignments, and the accuracy of its protein alignments is high when compared to alternative packages. The package is freely available as executables or source code from www.clustal.org, or it can be run online from a variety of sites, notably the EBI (www.ebi.ac.uk).

Keywords: benchmarking; clustal omega; multiple sequence alignment; protein structure.

Figures

Figure 1
Performance measures for different aligners/command‐lines for BAliBASE3. Left‐hand panels show sum‐of‐pairs (SP) score (top) or total column (TC) score (bottom) against execution time. Right‐hand panels show SP versus TC score. Clustal Omega data points are shown in red: default with a solid bullet, single linkage guide‐tree with a cross, maximum likelihood guide‐tree with a star, and various iteration schemes with circles (itr1 and itr2 are single and double iterations). The remaining Clustal Omega data points correspond to options where guide‐tree and HMM iterations are performed a different number of times; for example, t2h1 performs two guide‐tree iterations and one HMM iteration. MAFFT data points are shown in blue (default mode with a solid bullet, L‐INS‐i mode with a triangle). The MUSCLE data point is in green. The bottom‐right panel contains the same data points as the top‐right panel, with two extra data points (ClustalW2 and HMM over‐training) added.
Figure 2
Performance measures for different aligners/command‐lines for Prefab, showing sum‐of‐pairs score versus execution time in seconds. The data point colors and symbols are the same as in Figure 1 (Clustal Omega red, MAFFT blue, MUSCLE green, etc.). The main panel shows options without external HMM information. The small inset shows the same points as the main panel, with the one data point for external HMM added.
Figure 3
Execution times for different aligners/options as the number of input sequences is varied. The color scheme is the same as in Figures 1 and 2 (Clustal Omega red, MAFFT blue, MUSCLE green, ClustalW2 orange). Solid bullets are used for default options; the other symbols are a blue triangle for MAFFT L‐INS‐i, a blue box for MAFFT PartTree, a green box for the fast MUSCLE option, and orange circles for ClustalW2. Error bars indicate times for short (bottom), medium (middle) and long (top) protein domains. Solid lines connecting the middle points are used to guide the eye.
Figure 4
Performance measures for different aligners/command‐lines for QuanTest. Top‐left panel shows sum‐of‐pairs (SP) score versus secondary structure prediction accuracy (SSPA). Bottom‐left panel shows total column (TC) score versus SSPA score. Top‐right panel shows SP score versus TC score. Bottom‐right panel shows TC score versus execution time. Color scheme and symbol shapes are the same as in Figures 1 and 2.
Figure 5
Resource requirements to align 20,000 p450 sequences, using default Clustal Omega. Top panels show core utilization (out of 6 cores) when using six threads. Bottom panels show memory requirements. Left‐hand panels show requirements during the initial alignment phase (pairwise distances and clustering), plotted against user time. Right‐hand panels show requirements during the entire execution, plotted against wall clock time. Results for the old Clustal Omega version 1.0.2 are shown in blue, and results for the current version 1.2.3 in red.
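
The figure captions report sum‐of‐pairs (SP) and total column (TC) scores without defining them. The sketch below illustrates the commonly used definitions under simplifying assumptions: SP is taken as the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment, and TC as the fraction of reference columns (with at least two residues) whose residues all end up in one column of the test alignment. It ignores core‐block annotations and any other benchmark‐specific rules, so it is not the scoring program used in the paper. The function names and the toy alignments are hypothetical.

```python
from itertools import combinations


def column_signatures(alignment):
    """For each column, return a tuple holding, per sequence, the index of the
    residue found there (counted along the ungapped sequence) or None for a gap."""
    counters = [0] * len(alignment)
    signatures = []
    for col in range(len(alignment[0])):
        sig = []
        for seq_idx, row in enumerate(alignment):
            if row[col] == "-":
                sig.append(None)
            else:
                sig.append(counters[seq_idx])
                counters[seq_idx] += 1
        signatures.append(tuple(sig))
    return signatures


def column_pairs(sig):
    """All pairs of (sequence, residue index) that share one column."""
    present = [(s, i) for s, i in enumerate(sig) if i is not None]
    return list(combinations(present, 2))


def sp_tc_scores(test, reference):
    """Return (SP, TC) of a test alignment measured against a reference.
    Both alignments are lists of equal-length gapped strings, with the same
    sequences in the same order."""
    ref_sigs = column_signatures(reference)
    test_pairs = set()
    for sig in column_signatures(test):
        test_pairs.update(column_pairs(sig))

    total_pairs = matched_pairs = 0
    scored_columns = matched_columns = 0
    for sig in ref_sigs:
        pairs = column_pairs(sig)
        if not pairs:  # a column with fewer than two residues is not scored
            continue
        hits = sum(p in test_pairs for p in pairs)
        total_pairs += len(pairs)
        matched_pairs += hits
        scored_columns += 1
        matched_columns += hits == len(pairs)
    return matched_pairs / total_pairs, matched_columns / scored_columns


# Toy example: the test alignment misplaces one residue of the second sequence.
reference = ["ACDE", "AC-E", "A-DE"]
test = ["ACDE", "A-CE", "A-DE"]
print(sp_tc_scores(test, reference))  # prints (0.875, 0.75)
```

In the toy example, seven of the eight reference residue pairs and three of the four reference columns are reproduced. TC is the stricter measure because a single misplaced residue invalidates its whole column.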
