Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3

Mol Biol Evol. 2013 Aug;30(8):1987-97. doi: 10.1093/molbev/mst100. Epub 2013 May 24.


Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. As a result, methods that attempt to estimate rates of gene duplication and loss will often be misled by such errors, and rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even when assembly and annotation quality are low. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.
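
The reasoning in the abstract, that miscounted genes look like gains and losses and that an explicit error model can absorb them, can be illustrated with a minimal Python sketch. It is not CAFE 3 code and does not reproduce the paper's model. The symmetric plus-or-minus-one error distribution, the parameter eps, and the function names error_model, prob_apparent_change, and marginal_likelihood are all assumptions made for illustration, and the sketch ignores the boundary at a true count of zero.

```python
from typing import Dict


def error_model(eps: float) -> Dict[int, float]:
    """Hypothetical error distribution: maps (observed - true) to its probability.

    Assumption: a count is off by exactly one gene with total probability eps,
    split evenly between over- and under-counting.
    """
    return {-1: eps / 2, 0: 1.0 - eps, 1: eps / 2}


def prob_apparent_change(eps: float) -> float:
    """Probability that two genomes with IDENTICAL true family sizes
    show different observed sizes, purely because of annotation error."""
    err = error_model(eps)
    # The observed counts agree only when both genomes draw the same offset.
    p_same = sum(p * p for p in err.values())
    return 1.0 - p_same


def marginal_likelihood(observed: int, prior_true: Dict[int, float], eps: float) -> float:
    """Likelihood of an observed count after summing over possible true counts.

    prior_true maps a true family size to its probability under some
    evolutionary model (supplied by the caller; hypothetical here).
    """
    err = error_model(eps)
    # true = observed - offset, so weight each candidate true count by its error probability.
    return sum(prior_true.get(observed - off, 0.0) * p for off, p in err.items())


if __name__ == "__main__":
    # With no real gain or loss, a 10% error rate already makes a noticeable
    # fraction of families look as if they changed in size.
    print(prob_apparent_change(0.10))
    print(marginal_likelihood(5, {4: 0.2, 5: 0.6, 6: 0.2}, 0.10))
```

Under these assumptions, any nonzero eps produces apparent size changes in families that did not change, which is the direction of bias the abstract describes. Summing over possible true counts lets the likelihood attribute part of the observed variation to error rather than to gain and loss.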

Keywords: adaptive evolution; duplication; gene family.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Evolution, Molecular
  • Genome*
  • Genomics / methods
  • Molecular Sequence Annotation / methods*
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods*
  • Software*