Background: Culture-independent phylogenetic analysis of 16S ribosomal RNA (rRNA) gene sequences has emerged as an incisive method of profiling bacteria present in a specimen. Currently, multiple techniques are available to enumerate the abundance of bacterial taxa in specimens, including the Sanger sequencing, the 'next generation' pyrosequencing, microarrays, quantitative PCR, and the rapidly emerging, third generation sequencing, and fourth generation sequencing methods. An efficient statistical tool is in urgent need for the followings tasks: (1) to compare the agreement between these measurement platforms, (2) to select the most reliable platform(s), and (3) to combine different platforms of complementary strengths, for a unified analysis.
Results: We present the latent variable structural equation modeling (SEM) as a novel statistical application for the comparative analysis of measurement platforms. The latent variable SEM model treats the true (unknown) relative frequency of a given bacterial taxon in a specimen as the latent (unobserved) variable and estimates the reliabilities of, and similarities between, different measurement platforms, and subsequently weighs those measurements optimally for a unified analysis of the microbiome composition. The latent variable SEM contains the repeated measures ANOVA (both the univariate and the multivariate models) as special cases and, as a more general and realistic modeling approach, yields superior goodness-of-fit and more reliable analysis results, as demonstrated by a microbiome study of the human inflammatory bowel diseases.
Conclusions: Given the rapid evolution of modern biotechnologies, the measurement platform comparison, selection and combination tasks are here to stay and to grow--and the latent variable SEM method is readily applicable to any other biological settings, aside from the microbiome study presented here.