Implementation of a Stirling number estimator enables direct calculation of population genetics tests for large sequence datasets

Bioinformatics. 2019 Aug 1;35(15):2668-2670. doi: 10.1093/bioinformatics/bty1012.

Abstract

Motivation: Stirling numbers enter into the calculation of several population genetics statistics, including Fu's Fs. However, as alignments become large (≥50 sequences), the required Stirling numbers rapidly exceed the standard floating-point range. An alternative recursive method for calculating Fu's Fs suffers from floating-point underflow.
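
One simple way to see the range problem, and to sidestep it, is to keep the Stirling numbers as logarithms. The Python sketch below is not the estimator described in this paper (whose implementation is in R). It is an exact log-space recurrence for the unsigned Stirling numbers of the first kind c(n, k), using c(m+1, k) = m·c(m, k) + c(m, k−1). The function names are illustrative assumptions.

```python
import math

NEG_INF = float("-inf")  # log(0), used for entries that are exactly zero


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without leaving log space."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = (a, b) if a >= b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def log_stirling1_row(n):
    """Return [log c(n, k) for k = 0..n] for unsigned Stirling numbers
    of the first kind, built row by row from c(0, 0) = 1."""
    row = [0.0]  # log c(0, 0) = log 1
    for m in range(n):
        log_m = math.log(m) if m > 0 else NEG_INF
        new = [NEG_INF] * (m + 2)  # c(m + 1, 0) = 0 stays at log(0)
        for k in range(1, m + 2):
            # m * c(m, k); this term is absent when k = m + 1
            carried = row[k] + log_m if k <= m else NEG_INF
            new[k] = log_add(carried, row[k - 1])
        row = new
    return row
```

For instance, `log_stirling1_row(200)[1]` is log(199!), roughly 858, which is an ordinary double, whereas 199! itself is on the order of 10^372 and cannot be stored as one. The recurrence costs O(n²) time, so it illustrates the numerical issue rather than offering a fast method.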

Results: I implemented an estimator for Stirling numbers that applies uniformly across their full parameter range. I used it to create a hybrid Fu's Fs calculator that accounts for floating-point underflow. The new algorithm is hundreds of times faster than the recursive method and enables accurate calculation of statistics such as Fu's Fs for very large alignments.
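
To make concrete where underflow enters, the following Python sketch evaluates Fu's Fs entirely in log space. It assumes the standard definition Fs = ln(S′ / (1 − S′)), where S′ is the probability under the Ewens sampling formula of observing at least k_obs haplotypes in n sequences given θ. It uses an exact O(n²) Stirling recurrence, not the estimator or the hybrid algorithm from this paper, and `fu_fs_logspace` and its helpers are hypothetical names.

```python
import math

NEG_INF = float("-inf")  # log(0)


def log_sum(values):
    """Return log(sum(exp(v))) for a list of log values (log-sum-exp)."""
    finite = [v for v in values if v != NEG_INF]
    if not finite:
        return NEG_INF
    hi = max(finite)
    return hi + math.log(sum(math.exp(v - hi) for v in finite))


def log_stirling1_row(n):
    """Return [log c(n, k) for k = 0..n], unsigned Stirling numbers of
    the first kind, via c(m+1, k) = m * c(m, k) + c(m, k-1)."""
    row = [0.0]
    for m in range(n):
        log_m = math.log(m) if m > 0 else NEG_INF
        new = [NEG_INF] * (m + 2)
        for k in range(1, m + 2):
            carried = row[k] + log_m if k <= m else NEG_INF
            new[k] = log_sum([carried, row[k - 1]])
        row = new
    return row


def fu_fs_logspace(n, k_obs, theta):
    """Fs = ln(S' / (1 - S')) with S' = P(K >= k_obs | theta), where
    P(K = k) = c(n, k) * theta**k / (theta * (theta+1) * ... * (theta+n-1))."""
    if not (1 <= k_obs <= n) or theta <= 0:
        raise ValueError("need 1 <= k_obs <= n and theta > 0")
    log_c = log_stirling1_row(n)
    # log of the rising factorial theta * (theta + 1) * ... * (theta + n - 1)
    log_norm = math.lgamma(theta + n) - math.lgamma(theta)
    # log_p[i] holds log P(K = i + 1)
    log_p = [log_c[k] + k * math.log(theta) - log_norm for k in range(1, n + 1)]
    log_upper = log_sum(log_p[k_obs - 1:])  # log S'
    log_lower = log_sum(log_p[:k_obs - 1])  # log (1 - S')
    if log_lower == NEG_INF:  # k_obs = 1 means S' = 1
        return math.inf
    return log_upper - log_lower
```

Because S′ is carried as a logarithm, a tail probability far smaller than the smallest positive double still yields a finite Fs. In practice θ is usually estimated from the mean number of pairwise differences in the alignment; that estimation step is outside this sketch.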

Availability and implementation: An R implementation is available at http://github.com/swainechen/hfufs.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genetics, Population*
  • Sequence Alignment
  • Software*