Implementation of a Stirling number estimator enables direct calculation of population genetics tests for large sequence datasets

Swaine L Chen

doi:10.1093/bioinformatics/bty1012


Bioinformatics. 2019 Aug 1;35(15):2668-2670. doi: 10.1093/bioinformatics/bty1012.

Author

Swaine L Chen¹,²

Affiliations

¹ Division of Infectious Diseases, Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
² Infectious Diseases Group, Genome Institute of Singapore, Singapore, Singapore.

PMID: 30541067

Abstract

Motivation: Stirling numbers enter into the calculation of several population genetics statistics, including Fu's Fs. However, as alignments become large (≥50 sequences), the Stirling numbers required rapidly exceed the standard floating-point range. An alternative, recursive method for calculating Fu's Fs suffers from floating-point underflow.

Results: I implemented an estimator for Stirling numbers that applies uniformly across their full parameter range. I used it to build a hybrid Fu's Fs calculator that accounts for floating-point underflow. The new algorithm is hundreds of times faster than the recursive method and enables accurate calculation of statistics such as Fu's Fs for very large alignments.
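The following Python sketch illustrates where Stirling numbers enter Fu's Fs and one generic way to keep the calculation inside floating-point range. It is not the Stirling-number estimator or the hybrid calculator described in this work, and it is not the hfufs R code. It instead uses the exact log-space recurrence for unsigned Stirling numbers of the first kind, which costs O(n²) time, and it sums both tails of the distribution separately so that 1 - S' is never obtained by subtraction. It assumes Fu's (1997) definitions: S' = Pr(K ≥ k_obs | θ = θ_π), with Pr(K = k) proportional to |s(n, k)| θ^k, and Fs = ln[S'/(1 - S')]. The function names (`log_stirling1_row`, `fu_fs`, and the helpers) are hypothetical.

```python
import math

NEG_INF = float("-inf")  # represents ln(0)


def logaddexp(a, b):
    """Return ln(e^a + e^b) without overflow."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = (a, b) if a > b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def log_sum_exp(values):
    """Return ln(sum(e^v)) over values; an empty sum gives -inf."""
    finite = [v for v in values if v != NEG_INF]
    if not finite:
        return NEG_INF
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))


def log_stirling1_row(n):
    """Return [ln c(n,0), ..., ln c(n,n)] for the unsigned Stirling numbers of
    the first kind, using c(m,k) = c(m-1,k-1) + (m-1)*c(m-1,k) in log space.
    Exact recurrence, O(n^2) time; NOT the estimator described in the paper."""
    row = [0.0]  # m = 0: c(0,0) = 1
    for m in range(1, n + 1):
        log_mult = math.log(m - 1) if m > 1 else NEG_INF  # ln(m-1), ln(0) = -inf
        new = [NEG_INF] * (m + 1)  # c(m,0) = 0 for m >= 1
        for k in range(1, m + 1):
            left = row[k - 1]
            right = row[k] + log_mult if k < m else NEG_INF
            new[k] = logaddexp(left, right)
        row = new
    return row


def fu_fs(n, k_obs, theta_pi):
    """Fu's Fs = ln(S' / (1 - S')), with S' = Pr(K >= k_obs) when theta = theta_pi.
    n: number of sequences; k_obs: observed number of haplotypes;
    theta_pi: mean number of pairwise differences."""
    if not 1 <= k_obs <= n:
        raise ValueError("need 1 <= k_obs <= n")
    if theta_pi <= 0:
        raise ValueError("theta_pi must be positive")
    log_c = log_stirling1_row(n)
    log_theta = math.log(theta_pi)
    # ln Pr(K = k) up to the normaliser ln[theta(theta+1)...(theta+n-1)],
    # which cancels in the ratio S' / (1 - S').
    log_terms = [log_c[k] + k * log_theta for k in range(n + 1)]
    log_upper = log_sum_exp(log_terms[k_obs:])   # ln S'
    log_lower = log_sum_exp(log_terms[1:k_obs])  # ln(1 - S'), summed directly
    return log_upper - log_lower  # +inf when k_obs == 1 (S' = 1)


if __name__ == "__main__":
    print(fu_fs(3, 2, 1.0))  # ln 2, about 0.693
```

For example, with n = 3 sequences, k_obs = 2 haplotypes and θ_π = 1, the weights |s(3, k)| θ^k are 2, 3 and 1, so Pr(K ≥ 2) = 4/6 and Fs = ln(4/2) = ln 2. Because every quantity stays in log space, inputs with hundreds of sequences, where the Stirling numbers themselves would overflow a double, still yield a finite Fs whenever k_obs > 1.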

Availability and implementation: An R implementation is available at http://github.com/swainechen/hfufs.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Genetics, Population*
Sequence Alignment
Software*