FluReF, an automated flu virus reassortment finder based on phylogenetic trees

BMC Genomics. 2011;12 Suppl 2(Suppl 2):S3. doi: 10.1186/1471-2164-12-S2-S3. Epub 2011 Jul 27.

Abstract

Background: Reassortments are events in the evolution of the genome of influenza (flu), whereby segments of the genome are exchanged between different strains. As reassortments have been implicated in major human pandemics of the last century, their identification has become a health priority. While such identification can be done "by hand" on a small dataset, researchers and health authorities are building up enormous databases of genomic sequences for every flu strain, so that it is imperative to develop automated identification methods. However, current methods are limited to pairwise segment comparisons.

Results: We present FluReF, a fully automated flu virus reassortment finder. FluReF is inspired by the visual approach to reassortment identification and uses the reconstructed phylogenetic trees of the individual segments and of the full genome. We also present a simple flu evolution simulator, based on the current source-sink hypothesis for flu cycles. On synthetic datasets produced by our simulator, FluReF, tuned for a 0% false positive rate, yielded false negative rates of less than 10%. FluReF corroborated two new reassortments identified by visual analysis of 75 human H3N2 New York flu strains from 2005-2008 and gave partial verification of reassortments found using another bioinformatics method.
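
The false positive and false negative rates quoted above can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from FluReF: the function name `error_rates`, the clade labels, and the choice of denominators (false positives over all non-reassortant candidates, false negatives over all true reassortants) are assumptions, since the abstract does not state how the rates were defined.

```python
from typing import Iterable, Tuple


def error_rates(
    predicted: Iterable[str],
    actual: Iterable[str],
    all_candidates: Iterable[str],
) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate).

    predicted      -- candidates flagged as reassortments by the tool
    actual         -- candidates that truly are reassortments (known in a simulation)
    all_candidates -- every candidate that was examined
    The denominators used here are an assumption, not the paper's definition.
    """
    predicted, actual, universe = set(predicted), set(actual), set(all_candidates)
    tp = len(predicted & actual)             # correctly flagged
    fp = len(predicted - actual)             # flagged but not real
    fn = len(actual - predicted)             # real but missed
    tn = len(universe - predicted - actual)  # correctly left unflagged
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


# Hypothetical simulated run: five candidate clades, two true reassortments,
# one of which is detected and none of which is spurious.
fpr, fnr = error_rates(
    predicted={"c1"},
    actual={"c1", "c2"},
    all_candidates={"c1", "c2", "c3", "c4", "c5"},
)
```

In this made-up run the tool reports nothing spurious but misses one of the two true events, which is the kind of trade-off implied by tuning for a 0% false positive rate.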

Methods: FluReF finds reassortments by a bottom-up search of the full-genome and segment-based phylogenetic trees for candidate clades, that is, groups of one or more sampled viruses that are separated from the other variants from the same season. Candidate clades in each tree are tested to ensure adequate confidence, using the lengths of key edges as well as other tree parameters; a clade is reported as a reassortment only if it shows validated incongruencies among the segment trees.
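
The idea of incongruence among segment trees can be illustrated with a minimal sketch. It is not FluReF's algorithm: it ignores edge lengths, seasons, and confidence testing, and simply asks whether a candidate group of strains forms a clade in every segment tree. Trees are represented as nested tuples of strain names, and the names `clades`, `incongruent_segments`, and the toy strains A to D are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple, Union

# A tree is either a leaf (strain name) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set of every subtree, visiting children before parents."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def incongruent_segments(
    candidate: Iterable[str], segment_trees: Dict[str, Tree]
) -> List[str]:
    """Return the segments whose tree does NOT contain the candidate as a clade."""
    group = frozenset(candidate)
    return sorted(
        name for name, tree in segment_trees.items() if group not in clades(tree)
    )


# Toy example: strains A and B group together in the HA tree but not in the NA tree.
segment_trees = {
    "HA": (("A", "B"), ("C", "D")),
    "NA": (("A", "C"), ("B", "D")),
}
conflicts = incongruent_segments({"A", "B"}, segment_trees)
```

Here `conflicts` names the segment in which the candidate group is broken up, which is the kind of disagreement between segment trees that would then need validating before a reassortment is called.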

Conclusions: FluReF demonstrates robustness of prediction for geographically and temporally expanded datasets, and is not limited to finding reassortments with previously collected sequences. The complete source code is available from http://lcbb.epfl.ch/software.html.

MeSH terms

  • Algorithms*
  • Evolution, Molecular
  • Genome, Viral*
  • Influenza A Virus, H3N2 Subtype / classification*
  • Influenza A Virus, H3N2 Subtype / genetics
  • Models, Statistical
  • Phylogeny*
  • Point Mutation
  • Reassortant Viruses / classification*
  • Reassortant Viruses / genetics
  • Sequence Alignment
  • Software*