MPrESS: An R-Package for Accurately Predicting Power for Comparisons of 16S rRNA Microbiome Taxa Distributions including Simulation by Dirichlet Mixture Modeling

Microorganisms. 2023 Apr 29;11(5):1166. doi: 10.3390/microorganisms11051166.


Deep sequencing has revealed that the 16S rRNA gene composition of the human microbiome can vary between populations. However, when existing data are insufficient to address the desired study questions because of limited sample sizes, Dirichlet mixture modeling (DMM) can be used to simulate 16S rRNA gene profiles from experimental microbiome data. We examined the extent to which simulated 16S rRNA gene microbiome data accurately reflect the diversity identified in experimental data, and how accurately power can be calculated from them. Even when experimental and simulated datasets differed by less than 10%, simulation by DMM consistently overestimated power, except when only highly discriminating taxa were used. Admixtures of DMM-simulated and experimental data performed poorly compared with pure simulation and did not show the same correlation with the p-values and power values obtained from experimental data. While multiple replications of random sampling remain the favored method of determining power, simulated samples based on DMM can be used when the estimated sample size required to achieve a given power exceeds the number of available samples. We introduce an R package, MPrESS, to assist in power calculation and sample size estimation for 16S rRNA gene microbiome datasets to detect differences between populations. MPrESS can be downloaded from GitHub.
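The two power-estimation strategies compared above (repeated random subsampling of experimental samples versus simulation from a Dirichlet-based model) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MPrESS R interface: every function name is hypothetical, the DMM is reduced to a single-component Dirichlet-multinomial with hand-picked concentration parameters rather than a mixture fitted to data, and a permutation test on the Bray-Curtis distance between group mean profiles stands in for whatever test statistic MPrESS actually uses. In both cases, power is estimated as the fraction of replicate draws whose p-value falls below the significance level.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_sample(alpha, depth, rng):
    """Single-component Dirichlet-multinomial draw (simplification of a DMM):
    p ~ Dirichlet(alpha), then `depth` reads are assigned to taxa ~ Multinomial(depth, p)."""
    p = sample_dirichlet(alpha, rng)
    counts = [0] * len(alpha)
    for taxon in rng.choices(range(len(alpha)), weights=p, k=depth):
        counts[taxon] += 1
    return counts


def centroid_distance(group_a, group_b):
    """Bray-Curtis distance between the mean relative-abundance profiles of two groups."""
    def centroid(group):
        props = []
        for sample in group:
            total = sum(sample)
            props.append([c / total for c in sample])
        return [sum(col) / len(props) for col in zip(*props)]

    ca, cb = centroid(group_a), centroid(group_b)
    # Both centroids sum to 1, so Bray-Curtis reduces to half the L1 distance.
    return sum(abs(x - y) for x, y in zip(ca, cb)) / 2.0


def permutation_pvalue(group_a, group_b, n_perm, rng):
    """Label-permutation p-value for the centroid distance (add-one corrected)."""
    observed = centroid_distance(group_a, group_b)
    pooled = group_a + group_b
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if centroid_distance(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def estimate_power(draw_groups, n_reps, n_perm, sig_level, rng):
    """Fraction of replicate two-group draws whose permutation p-value is below sig_level."""
    hits = 0
    for _ in range(n_reps):
        group_a, group_b = draw_groups(rng)
        if permutation_pvalue(group_a, group_b, n_perm, rng) < sig_level:
            hits += 1
    return hits / n_reps


def subsample_power(pool_a, pool_b, n, n_reps=100, n_perm=99, sig_level=0.05, seed=0):
    """Power by repeated random subsampling of n observed samples per group."""
    if n > min(len(pool_a), len(pool_b)):
        raise ValueError("n exceeds the available samples; subsampling is not possible")

    def draw(rng):
        return rng.sample(pool_a, n), rng.sample(pool_b, n)

    return estimate_power(draw, n_reps, n_perm, sig_level, random.Random(seed))


def simulated_power(alpha_a, alpha_b, n, depth, n_reps=100, n_perm=99, sig_level=0.05, seed=0):
    """Power from n simulated samples per group; n is not limited by any pool size."""
    def draw(rng):
        group_a = [simulate_sample(alpha_a, depth, rng) for _ in range(n)]
        group_b = [simulate_sample(alpha_b, depth, rng) for _ in range(n)]
        return group_a, group_b

    return estimate_power(draw, n_reps, n_perm, sig_level, random.Random(seed))
```

For example, `subsample_power(pool_a, pool_b, n=8)` resamples from observed count vectors and refuses any `n` larger than the smaller pool, whereas `simulated_power(alpha_a, alpha_b, n=40, depth=5000)` can be evaluated beyond the available sample number, which is the situation in which the abstract suggests falling back on simulation. Given the finding that DMM-based estimates tended to overstate power, comparing the two functions at sample sizes where both are possible is a sensible check.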

Keywords: 16S rRNA gene sequencing; Dirichlet mixture modeling; forensics; human microbiome; power calculations; sample size estimates.