Bioinformatics. 2018 Jul 1;34(13):2297-2299.
doi: 10.1093/bioinformatics/bty101.

SubRecon: Ancestral Reconstruction of Amino Acid Substitutions Along a Branch in a Phylogeny

Free PMC article

Christopher Monit et al. Bioinformatics.


Summary: Existing ancestral sequence reconstruction techniques are ill-suited to investigating substitutions on a single branch of interest. We present SubRecon, an implementation of a hybrid technique integrating joint and marginal reconstruction for protein sequence data. SubRecon calculates the joint probability of states at adjacent internal nodes in a phylogeny, i.e. how the state has changed along a branch. This does not condition on states at other internal nodes and includes site rate variation. Simulation experiments show the technique to be accurate and powerful. SubRecon has a user-friendly command line interface and produces concise output that is intuitive yet suitable for subsequent parsing in an automated pipeline.
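The joint probability of states at the two ends of a branch can be illustrated with a minimal sketch. This is not SubRecon's implementation: it assumes a hypothetical five-node toy tree (((t1,t2)C,t3)B,t4)A, swaps in the equal-rates Poisson amino acid model in place of an empirical matrix such as WAG, and uses two equally weighted rate categories (0.5 and 1.5) as a stand-in for a discretised gamma distribution. All function names, branch lengths and observed residues are invented for illustration. The state at node C is summed out by the pruning step, so the result conditions only on the tip data, not on any other internal node.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
K = len(AMINO_ACIDS)

def p_sub(i, j, t):
    """Poisson-model probability of state i becoming state j over branch length t."""
    decay = math.exp(-K / (K - 1) * t)
    return 1 / K + (K - 1) / K * decay if i == j else (1 - decay) / K

def leaf(residue):
    """Conditional likelihood vector for an observed residue at a tip."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]

def prune(children):
    """Felsenstein pruning step: combine (child_vector, branch_length) pairs."""
    return [
        math.prod(
            sum(p_sub(s, c, t) * vec[c] for c in range(K)) for vec, t in children
        )
        for s in range(K)
    ]

def branch_joint(obs, t, rates=(0.5, 1.5), weights=(0.5, 0.5)):
    """P(A=a, B=b | D) at one site for the toy tree (((t1,t2)C,t3)B,t4)A."""
    joint = [[0.0] * K for _ in range(K)]
    for r, w in zip(rates, weights):  # average over site rate categories
        below_c = prune([(leaf(obs["t1"]), r * t["t1"]), (leaf(obs["t2"]), r * t["t2"])])
        below_b = prune([(below_c, r * t["C"]), (leaf(obs["t3"]), r * t["t3"])])
        rest_a = prune([(leaf(obs["t4"]), r * t["t4"])])
        for a in range(K):
            for b in range(K):
                # equal frequencies (1/K) x data outside B x change on A-B x data below B
                joint[a][b] += w / K * rest_a[a] * p_sub(a, b, r * t["B"]) * below_b[b]
    total = sum(map(sum, joint))  # normalise after summing over rate categories
    return {
        (AMINO_ACIDS[a], AMINO_ACIDS[b]): joint[a][b] / total
        for a in range(K)
        for b in range(K)
    }

# Hypothetical observed residues and branch lengths ("B" is the A-B branch).
obs = {"t1": "L", "t2": "L", "t3": "L", "t4": "S"}
t = {"t1": 0.05, "t2": 0.05, "C": 0.05, "t3": 0.1, "B": 0.05, "t4": 0.3}
posterior = branch_joint(obs, t)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

Each key of `posterior` is an (ancestor state, descendant state) pair for the branch of interest, so pairs with differing states correspond to candidate substitutions along that branch.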

Availability and implementation: SubRecon is platform independent, requiring Java v1.8 or above. Source code, installation instructions and an example dataset are freely available under the Apache 2.0 license at


Fig. 1.
Accuracy of SubRecon over a range of dataset sizes and branch lengths. We first estimated a phylogeny, π and α for 19 primate lysozyme protein sequences (Messier and Stewart, 1997), using RAxML and WAG substitution model with 4 gamma-distributed rate categories. We then simulated evolution of 1000 sites using WAG and these parameters using Evolver (Yang, 2007), with input branch lengths multiplied by 1, 5 or 10. (A) The number of sites where SubRecon’s max[P(A=a,B=b|D,θ,α)] estimated both a and b correctly (upper bars) or incorrectly (lower bars) using a range of minimum probability thresholds, for the branch ancestral to the Colobines (N = 5). (B–D) Further simulations used arbitrary bifurcating topologies containing 64, 256 or 1024 taxa with equal branch lengths, chosen such that their sum per taxon was equal to that of primate lysozyme (0.392/19 ≈ 0.02) and then multiplied by 1, 5 or 10. The chosen branch of interest was that ancestral to 25% of taxa.
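The tally behind panel (A) can be sketched as follows: for each minimum probability threshold, count the sites whose most probable (a, b) pair clears the threshold, split by whether that pair matches the simulated truth. This is a minimal sketch only. The function name `tally_by_threshold` and the five example sites are hypothetical, not values from the paper.

```python
def tally_by_threshold(sites, thresholds):
    """Count correct/incorrect calls among sites whose max probability >= threshold.

    sites: iterable of (max_probability, is_correct) pairs, one per site.
    Returns {threshold: (n_correct, n_incorrect)}.
    """
    result = {}
    for cutoff in thresholds:
        kept = [correct for prob, correct in sites if prob >= cutoff]
        n_correct = sum(kept)
        result[cutoff] = (n_correct, len(kept) - n_correct)
    return result

# Hypothetical per-site results: (max joint probability, both states correct?)
example_sites = [(0.99, True), (0.95, True), (0.80, False), (0.60, True), (0.55, False)]
counts = tally_by_threshold(example_sites, thresholds=[0.5, 0.9])
print(counts)
```

Raising the threshold keeps fewer sites but should leave a smaller share of incorrect calls among them, which is the trade-off the upper and lower bars display.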



References

    1. Drummond A., Strimmer K. (2001) PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics, 17, 662–663.
    2. Felsenstein J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17, 368–376.
    3. Koshi J.M., Goldstein R.A. (1996) Probabilistic reconstruction of ancestral protein sequences. J. Mol. Evol., 42, 313–320.
    4. Messier W., Stewart C.-B. (1997) Episodic adaptive evolution of primate lysozymes. Nature, 385, 151–154.
    5. Pupko T. et al. (2000) A fast algorithm for joint reconstruction of ancestral amino acid sequences. Mol. Biol. Evol., 17, 890–896.

Publication types