Microbial communities are unparalleled in their complexity, which makes them difficult to describe and compare. Characterizing this complexity will contribute to understanding the ecological processes that drive microbe-host interactions, bioremediation, and biogeochemistry. Moreover, an estimate of species richness will indicate how complete a community profile is. Such estimates are difficult, however, because community structure rarely fits a well-defined distribution. We present a model based on word usage in books to illustrate the power of statistical tools in describing microbial communities and suggesting biological hypotheses. The model also generates data for testing these tools when the literature offers insufficient data. For example, by simulating the word distribution in a book, we can predict the number of words that must be read to estimate the size of the vocabulary used to write it. Combined with other models that have been used to make inaccessible problems tractable, our book model offers a unique approach to the complex problem of describing microbial diversity.
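
The book analogy can be illustrated with a small simulation. The sketch below is not the model presented in the paper; it rests on assumptions introduced here for illustration only: word frequencies follow a Zipf-like rank-abundance curve, word order in the book is random (so reading the first n words is equivalent to random sampling), and vocabulary size is estimated with the bias-corrected Chao1 richness estimator. The function names `make_book`, `chao1`, and `richness_curve` and all parameter values are hypothetical.

```python
import random
from collections import Counter


def make_book(vocab_size, n_words, rng, exponent=1.0):
    """Simulate a book as a list of word IDs.

    Assumption: word frequencies are Zipf-like, i.e. the word of
    rank r is drawn with weight 1 / r**exponent.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    return rng.choices(range(vocab_size), weights=weights, k=n_words)


def chao1(counts):
    """Bias-corrected Chao1 estimate of total richness.

    counts maps each observed word to its number of occurrences.
    f1 and f2 are the numbers of words seen exactly once and twice.
    """
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def richness_curve(book, checkpoints):
    """Observed and estimated vocabulary after reading the first n words."""
    results = []
    for n in checkpoints:
        counts = Counter(book[:n])
        results.append((n, len(counts), chao1(counts)))
    return results


# Hypothetical parameters: a 100,000-word book drawn from 5,000 candidate words.
rng = random.Random(42)
book = make_book(vocab_size=5000, n_words=100_000, rng=rng)
true_vocab = len(set(book))  # vocabulary actually used in the simulated book

print("words_read observed chao1 true_vocabulary")
for n, observed, estimated in richness_curve(book, [1_000, 10_000, 50_000, 100_000]):
    print(n, observed, round(estimated), true_vocab)
```

Comparing the observed and estimated columns against the true vocabulary at each checkpoint shows how much of the book must be read before the estimate stabilizes, which is the kind of question the book model is meant to address. In the microbial setting, words correspond to sampled individuals (for example, sequences) and vocabulary corresponds to species richness.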