Maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus A

J Mol Evol. 2000 Nov;51(5):423-32. doi: 10.1007/s002390010105.


Algorithmic details for obtaining maximum likelihood estimates of parameters on a large phylogeny are discussed. On a large tree, an efficient approach is to optimize branch lengths one at a time while simultaneously updating the parameters of the substitution model. Codon substitution models that allow the nonsynonymous/synonymous rate ratio (ω = dN/dS) to vary among sites are used to analyze a data set of 349 hemagglutinin (HA) gene sequences from human influenza virus type A. Methods for obtaining approximate estimates of branch lengths under codon models are explored, and the resulting estimates are used to test for positive selection and to identify sites under selection. Compared with the exact method, in which all parameters are estimated by maximum likelihood, the approximate methods produced reliable results. The analysis identified a number of sites in the viral gene under diversifying Darwinian selection and demonstrated that including many sequences in the data is important for detecting positive selection at individual sites.
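
The one-branch-at-a-time strategy mentioned above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several things in it are assumptions made only for illustration: the JC69 nucleotide model stands in for a codon model (JC69 has no free parameters, so the simultaneous update of substitution-model parameters is not shown), the tree is a three-tip star tree, the one-dimensional optimizer is a golden-section search, and the site-pattern counts and all function names are hypothetical.

```python
import math

def jc69_prob(same, t):
    """JC69 transition probability over a branch of length t (substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def log_likelihood(branch_lengths, site_patterns):
    """Log likelihood of a star tree, summing over the unknown root state at each site."""
    total = 0.0
    for pattern, count in site_patterns.items():
        site_lik = 0.0
        for root in "ACGT":
            term = 0.25  # equal base frequencies under JC69
            for tip_state, t in zip(pattern, branch_lengths):
                term *= jc69_prob(root == tip_state, t)
            site_lik += term
        total += count * math.log(site_lik)
    return total

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal one-dimensional function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:   # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:         # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def optimize_branches(site_patterns, n_tips, start=0.1, max_sweeps=100, tol=1e-9):
    """Cycle through the branches, optimizing each length with the others held fixed."""
    lengths = [start] * n_tips
    current = log_likelihood(lengths, site_patterns)
    for _ in range(max_sweeps):
        for i in range(n_tips):
            def objective(t, i=i):
                trial = lengths[:i] + [t] + lengths[i + 1:]
                return log_likelihood(trial, site_patterns)
            lengths[i] = golden_section_max(objective, 1e-8, 5.0)
        updated = log_likelihood(lengths, site_patterns)
        converged = updated - current < tol
        current = updated
        if converged:
            break
    return lengths, current

# Hypothetical site-pattern counts for three aligned sequences (tip 1, tip 2, tip 3).
toy_patterns = {"AAA": 80, "AAC": 10, "ACA": 5, "CAA": 5}
toy_lengths, toy_loglik = optimize_branches(toy_patterns, n_tips=3)
```

Each one-dimensional update only needs the likelihood as a function of a single branch length, which is what makes this kind of cycling attractive when a tree has hundreds of branches. A practical implementation would reuse partial likelihoods between updates rather than recomputing the full likelihood each time, as this sketch does.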

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Evolution, Molecular*
  • Humans
  • Influenza A virus / genetics*
  • Likelihood Functions
  • Models, Genetic
  • Phylogeny*