Mol Biol Evol. 2011 Oct;28(10):2731-9.
doi: 10.1093/molbev/msr121. Epub 2011 May 4.

MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

Free PMC article

Koichiro Tamura et al. Mol Biol Evol. 2011.

Abstract

Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), a user-friendly software package for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site by site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven, making it easier to use for both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.

Figures

FIG. 1.
Evaluating the fit of substitution models in MEGA5. (A) The “Models” menu in the “Action Bar” provides access to the facility. (B) An “Analysis Preferences” dialog box provides the user with an array of choices, including the choice of tree to use and the method to treat missing data and alignment gaps. In addition to the “Complete Deletion” and “Pairwise Deletion” options, MEGA5 now includes a “Partial Deletion” option that enables users to exclude positions if they have less than a desired percentage (x%) of site coverage, that is, no more than (100−x)% of sequences at a site are allowed to have an alignment gap, missing datum, or ambiguous base/amino acid. For protein-coding nucleotide sequences, users can choose to analyze nucleotide or translated amino acid substitutions, with a choice of codon positions in the former. (C) The list of evaluated substitution models along with their relative fits, number of parameters (branch lengths + model parameters), and estimates of evolutionary parameters for Drosophila Adh sequence data, which are available in the Examples directory of the MEGA5 installation. The note below the table provides a brief description of the results (e.g., ranking of models by BIC), the data subset selected, and the analysis option chosen. This figure is available in color online and in black and white in print.
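The “Partial Deletion” rule described in this caption can be illustrated with a minimal sketch. This is not MEGA5 code: the function name `partial_deletion`, the `UNAMBIGUOUS` character set, and the toy alignment are all hypothetical, and the sketch assumes nucleotide data in which any character other than A, C, G, or T counts as a gap, missing datum, or ambiguous base.

```python
# Hypothetical illustration of a site-coverage filter; not taken from MEGA5.
# Assumption: nucleotide data, and anything other than A/C/G/T is "missing".
UNAMBIGUOUS = set("ACGT")


def partial_deletion(alignment, cutoff_percent):
    """Keep only the columns whose site coverage is at least cutoff_percent."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    kept = []
    for col in range(length):
        # Count the sequences that carry an unambiguous base at this column.
        covered = sum(1 for seq in alignment if seq[col].upper() in UNAMBIGUOUS)
        if 100.0 * covered / n_seqs >= cutoff_percent:
            kept.append(col)

    # Rebuild every sequence from the retained columns only.
    return ["".join(seq[c] for c in kept) for seq in alignment]


# Made-up toy alignment with gaps (-), an ambiguous base (N), and a missing datum (?).
alignment = [
    "ACG-TA",
    "ACGNTA",
    "A-GTTA",
    "ACG-T?",
]

print(partial_deletion(alignment, 75))   # columns with at least 75% coverage
print(partial_deletion(alignment, 100))  # behaves like complete deletion
```

With a 100% cutoff the filter reduces to complete deletion, which is one way to see partial deletion as a relaxation of that option.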
FIG. 2.
Comparison of the best-fit model identified by using automatically generated and true trees for 1,792 computer simulated 66-sequence data sets. (A) The percentage of data sets for which the use of an automatically generated tree produces the same best-fit model as does the use of the true tree. Results are shown from data sets simulated with four different values of the gamma parameter (α) for rate variation among sites. (B) The estimates of α when using the automatically generated trees (filled bars) and the true tree (open bars). The average α and ±1 standard deviation are depicted on each bar; 10 discrete Gamma categories were used. (C) The relationship of true and estimated transition–transversion ratio, R, when using automatically generated trees for data simulated with α = 0.25. The value of R becomes 0.5 when the transition–transversion rate ratio, κ, is 1.0 in Kimura's two-parameter model. The slope of the linear regression was 1.005, with the intercept passing through the origin (r² = 0.98). Using the true tree, slope and r² values were 1.007 and 0.98, respectively. The absolute average difference between the two sets of estimates was 0.2% (maximum difference = 5.2%). Similar results were obtained for data simulated with α = 0.5, 1.0, and 2.0.
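The statement that R is 0.5 when κ is 1.0 under Kimura's two-parameter model can be checked with a short sketch. The function below is a hypothetical helper, not part of MEGA5. It assumes a model with a single rate ratio κ and stationary base frequencies, for which the expected ratio of transitions to transversions is R = κ(πAπG + πCπT) / [(πA + πG)(πC + πT)]. With equal frequencies, as in Kimura's model, this reduces to R = κ/2.

```python
def transition_transversion_ratio(kappa, freqs=None):
    """Expected transition/transversion ratio R for a single-kappa model.

    Assumption: one transition/transversion rate ratio (kappa) and stationary
    base frequencies; equal frequencies give Kimura's two-parameter model.
    """
    if freqs is None:
        freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

    purines = freqs["A"] + freqs["G"]
    pyrimidines = freqs["C"] + freqs["T"]
    # Transitions occur within purines (A<->G) and within pyrimidines (C<->T).
    transitions = kappa * (freqs["A"] * freqs["G"] + freqs["C"] * freqs["T"])
    # Transversions exchange a purine with a pyrimidine.
    transversions = purines * pyrimidines
    return transitions / transversions


print(transition_transversion_ratio(1.0))  # 0.5
print(transition_transversion_ratio(4.0))  # 2.0
```

The distinction matters when reading panel C: R is a ratio of expected substitution counts, whereas κ is a ratio of rates, so the two coincide only up to the frequency-dependent factor above.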
FIG. 3.
Comparison of the computational speed of ML heuristic searches. (A) Average time taken to complete MEGA5 (NNI and CNI), RaxML7 (G and MIX), and PhyML3 (NNI and SPR) heuristic searches for 1,792 simulated data sets containing 66 sequences each. Bars are shown with ±1 standard deviation. Three data sets were excluded from PhyML3 calculations, as the NNI search failed. (B, C) Scatter plots showing the time taken to search for the ML tree for alignments that contain 20–200 and 200–765 sequences of 2,000 base pairs. The power trend fits are indicated for PhyML3 and MEGA5 (r² > 0.98 in all cases). For direct comparisons, all analyses were conducted by using 4 discrete categories for the Gamma distribution and a GTR model of nucleotide substitution (see Materials and Methods for simulation procedures, analysis descriptions, and computer hardware used). G, GTRGAMMA with four discrete Gamma categories; MIX, mixed method of using CAT and GAMMA models.
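A power trend of the kind mentioned for panels B and C has the form t = a·n^b, where n is the number of sequences and t is the search time. One common way to fit it, sketched below, is ordinary least squares on log-transformed values. Whether the original fits were obtained this way is an assumption, the function name `fit_power_trend` is hypothetical, and the numbers are synthetic values generated from an exact power law, not timings from the paper.

```python
import math


def fit_power_trend(sizes, times):
    """Fit times = a * sizes**b by least squares in log-log space.

    Returns (a, b, r_squared), where r_squared refers to the log-log regression.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count

    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx                         # exponent (slope in log-log space)
    a = math.exp(mean_y - b * mean_x)     # scale factor (from the intercept)
    r_squared = (sxy * sxy) / (sxx * syy)
    return a, b, r_squared


# Synthetic data from t = 0.01 * n**2; NOT measurements from the paper.
sizes = [20, 50, 100, 200]
times = [0.01 * n ** 2 for n in sizes]

a, b, r_squared = fit_power_trend(sizes, times)
print(f"a = {a:.4f}, b = {b:.4f}, r^2 = {r_squared:.4f}")
```

The fitted exponent b is the quantity of interest when comparing programs, since it describes how quickly running time grows as sequences are added.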
FIG. 4.
Accuracies of heuristic ML trees produced by MEGA5, RaxML7, and PhyML3 programs. Shown are the proportions of interior branches (tree partitions) inferred correctly, along with ±1 standard deviation, for simulated data sets containing (A) 66 sequences and (B) 765 sequences. G, GTRGAMMA with four discrete Gamma categories; MIX, mixed method of using CAT and GAMMA models.
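The accuracy measure in this caption, the proportion of interior branches (tree partitions) inferred correctly, can be sketched as a comparison of leaf bipartitions between an inferred tree and the true tree. The code below is an illustrative sketch rather than the evaluation pipeline used in the paper. Trees are represented as nested tuples of leaf names, which is an assumption made for brevity, and all function names and example trees are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def interior_splits(tree):
    """Collect the nontrivial bipartitions induced by interior branches."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed reference leaf for a canonical split side
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Always store the side of the split that does not contain the anchor.
        side = all_leaves - below if anchor in below else below
        # Skip the root and branches leading to single leaves.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def proportion_correct(inferred, true):
    """Fraction of the true tree's interior branches found in the inferred tree."""
    true_splits = interior_splits(true)
    if not true_splits:
        raise ValueError("the true tree has no interior branches")
    return len(true_splits & interior_splits(inferred)) / len(true_splits)


# Made-up six-taxon example: the inferred tree misplaces taxa B and C.
true_tree = ((("A", "B"), "C"), ("D", ("E", "F")))
inferred_tree = ((("A", "C"), "B"), ("D", ("E", "F")))

print(proportion_correct(inferred_tree, true_tree))  # 2 of 3 interior branches
```

Storing each split as the side that excludes a fixed reference leaf makes the comparison independent of where the nested-tuple representation happens to be rooted.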
FIG. 5.
Position-specific inferred ancestral states in a primate opsin phylogeny and the posterior probabilities of alternative amino acids at that position. See MEGA5 Examples directory for the data file and Nei and Kumar (2000, p. 212–213) for a description of the data. This figure is available in color online and in black and white in print.
FIG. 6.
The MEGA5 “Action Bar” and associated action menus. This figure is available in color online and in black and white in print.
