An improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula

Biometrics. 2014 Sep;70(3):671-82. doi: 10.1111/biom.12200. Epub 2014 Jun 19.


It is difficult to accurately estimate species richness if there are many almost undetectable species in a hyper-diverse community. Practically, an accurate lower bound for species richness is preferable to an inaccurate point estimator. The traditional nonparametric lower bound developed by Chao (1984, Scandinavian Journal of Statistics 11, 265-270) for individual-based abundance data uses only the information on the rarest species (the numbers of singletons and doubletons) to estimate the number of undetected species in samples. Applying a modified Good-Turing frequency formula, we derive an approximate formula for the first-order bias of this traditional lower bound. The approximate bias is estimated by using additional information (namely, the numbers of tripletons and quadrupletons). This approximate bias can be corrected, and an improved lower bound is thus obtained. The proposed lower bound is nonparametric in the sense that it is universally valid for any species abundance distribution. A similar type of improved lower bound can be derived for incidence data. We test our proposed lower bounds on simulated data sets generated from various species abundance models. Simulation results show that the proposed lower bounds always reduce bias over the traditional lower bounds and improve accuracy (as measured by mean squared error) when the heterogeneity of species abundances is relatively high. We also apply the proposed new lower bounds to real data for illustration and for comparisons with previously developed estimators.
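To make the roles of singletons, doubletons, tripletons, and quadrupletons concrete, the following is a minimal sketch of the two lower bounds for abundance data. The abstract does not state the formulas, so the expressions here are assumptions taken from the commonly cited large-sample forms: the traditional bound as S_obs + f1^2 / (2 f2), and the improved bound as that value plus (f3 / (4 f4)) * max(f1 - f2 f3 / (2 f4), 0), where fk is the number of species observed exactly k times. The function names, the fallback used when there are no doubletons, and the fallback used when there are no quadrupletons are illustrative choices, not the authors' implementation.

```python
from collections import Counter


def frequency_counts(abundances):
    """Map k -> f_k, the number of species observed exactly k times."""
    return Counter(a for a in abundances if a > 0)


def chao1_lower_bound(abundances):
    """Traditional lower bound (assumed large-sample form)."""
    f = frequency_counts(abundances)
    s_obs = sum(f.values())          # number of observed species
    f1, f2 = f.get(1, 0), f.get(2, 0)
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    # Assumption: commonly used bias-corrected form when no doubletons exist.
    return s_obs + f1 * (f1 - 1) / 2.0


def improved_lower_bound(abundances):
    """Improved lower bound (assumed large-sample form using f3 and f4)."""
    f = frequency_counts(abundances)
    f1, f2 = f.get(1, 0), f.get(2, 0)
    f3, f4 = f.get(3, 0), f.get(4, 0)
    base = chao1_lower_bound(abundances)
    if f4 == 0:
        # Assumption of this sketch: no correction without quadrupletons.
        return base
    correction = (f3 / (4.0 * f4)) * max(f1 - f2 * f3 / (2.0 * f4), 0.0)
    return base + correction


if __name__ == "__main__":
    # Hypothetical sample: 10 singletons, 5 doubletons, 4 tripletons,
    # 2 quadrupletons, and two abundant species.
    sample = [1] * 10 + [2] * 5 + [3] * 4 + [4] * 2 + [10, 20]
    print(chao1_lower_bound(sample))
    print(improved_lower_bound(sample))
```

Because the correction term is never negative, the improved bound in this sketch is always at least as large as the traditional one, which fits the abstract's description of correcting the first-order bias of the traditional lower bound.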

Keywords: Abundance data; Good–Turing frequency formula; Incidence data; Species richness.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Biometry / methods
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Demography / methods*
  • Epidemiologic Methods
  • Humans
  • Models, Statistical*
  • Population Dynamics*
  • Sample Size*
  • Statistics, Nonparametric*