PosiGene: automated and easy-to-use pipeline for genome-wide detection of positively selected genes

Nucleic Acids Res. 2017 Jun 20;45(11):e100. doi: 10.1093/nar/gkx179.

Abstract

Many comparative genomics studies aim to find the genetic basis of species-specific phenotypic traits. A prevailing strategy is to search genome-wide for genes that evolved under positive selection based on the non-synonymous to synonymous substitution ratio. However, incongruent results, largely due to high false positive rates, indicate the need for standardization of quality criteria and software tools. The main challenges are the ortholog and isoform assignment, the high sensitivity of the statistical models to alignment errors and the imperative to parallelize large parts of the software. We developed the software tool PosiGene that (i) detects positively selected genes (PSGs) at genome scale, (ii) allows analysis of specific evolutionary branches, (iii) can be used in arbitrary species contexts and (iv) offers visualization of the results for further manual validation and biological interpretation. We exemplify PosiGene's performance using simulated and real data. In the simulated data approach, we determined a false positive rate <1%. With real data, we found that 68.4% of the PSGs detected by PosiGene were shared by at least one previous study that used the same set of species. PosiGene is a user-friendly, reliable tool for reproducible genome-wide identification of PSGs and is freely available at https://github.com/gengit/PosiGene.
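
As a rough illustration of the non-synonymous to synonymous substitution ratio mentioned above (commonly written dN/dS), the following minimal sketch shows how the ratio is conventionally interpreted. It is not PosiGene's code or method: the function names `omega` and `classify` and the example rates are hypothetical, and real analyses rely on statistical models rather than a simple point estimate.

```python
def omega(dn: float, ds: float) -> float:
    """Return the dN/dS ratio for given substitution rates (hypothetical helper)."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def classify(w: float, tol: float = 1e-9) -> str:
    """Conventional reading of dN/dS: >1 positive, <1 purifying, =1 neutral."""
    if w > 1 + tol:
        return "positive selection"
    if w < 1 - tol:
        return "purifying selection"
    return "neutral evolution"


# Made-up example rates, for illustration only.
print(classify(omega(0.12, 0.04)))
print(classify(omega(0.01, 0.05)))
```

A ratio above 1 alone is weak evidence; the abstract's emphasis on false positives and alignment errors is the reason a raw ratio like this is only a starting point.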

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Base Sequence
  • Conserved Sequence
  • Evolution, Molecular
  • Genome
  • Humans
  • Phylogeny
  • Selection, Genetic*
  • Sequence Analysis, DNA*
  • Software*