High-throughput sequencing of amplicon libraries is the most widespread and one of the most effective ways to study the taxonomic structure of microbial communities, even despite growing accessibility of whole metagenome sequencing. Due to the targeted amplification, the method provides unparalleled resolution of communities, but at the same time perturbs initial community structure thereby reducing data robustness and compromising downstream analyses. Experimental research of the perturbations is largely limited to comparative studies on different PCR protocols without considering other sources of experimental variation related to characteristics of the initial microbial composition itself. Here we analyse these sources and demonstrate how dramatically they effect the relative abundances of taxa during the PCR cycles. We developed the mathematical model of the PCR amplification assuming the heterogeneity of amplification efficiencies and considering the compositional nature of data. We designed the experiment-five consecutive amplicon cycles (22-26) with 12 replicates for one real human stool microbial sample-and estimated the dynamics of the microbial community in line with the model. We found the high heterogeneity in amplicon efficiencies of taxa that leads to the non-linear and substantial (up to fivefold) changes in relative abundances during PCR. The analysis of possible sources of heterogeneity revealed the significant association between amplicon efficiencies and the energy of secondary structures of the DNA templates. The result of our work highlights non-trivial changes in the dynamics of real-life microbial communities due to their compositional nature. Obtained effects are specific not only for amplicon libraries, but also for any studies of metagenome dynamics.
Keywords: Amplification bias; Bayesian inference; Compositional data analysis; High-throughput sequencing; Microbiome; PCR.
©2022 Korvigo et al.