Stat Sci, 28 (2), 209-222

Defining Predictive Probability Functions for Species Sampling Models

Jaeyong Lee et al. Stat Sci.

Abstract

We review the class of species sampling models (SSM). In particular, we investigate the relation between the exchangeable partition probability function (EPPF) and the predictive probability function (PPF). It is straightforward to define a PPF from an EPPF, but the converse is not necessarily true. In this paper we introduce the notion of putative PPFs and derive novel conditions for a putative PPF to define an EPPF. We show that all possible PPFs in a certain class have to define (unnormalized) probabilities for cluster membership that are linear in cluster size. We give a new necessary and sufficient condition for arbitrary putative PPFs to define an EPPF. Finally, we illustrate posterior inference for a large class of SSMs with a PPF that is not linear in cluster size and discuss a numerical method to derive its PPF.
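
To make the EPPF-to-PPF direction concrete, here is a minimal Python sketch that uses the Dirichlet process EPPF (the Ewens sampling formula) as an assumed example. The PPF is the ratio $p(s_{n+1} = j \mid s) = p(n_1, \ldots, n_j + 1, \ldots, n_k) / p(n_1, \ldots, n_k)$, which for the DP gives $n_j/(n+\alpha)$ for an existing cluster and $\alpha/(n+\alpha)$ for a new one, i.e. linear in cluster size. For the converse direction, the sketch screens a putative PPF $q$ with a symmetry check that any PPF derived from an EPPF must pass, $q_j(n)\, q_i(n + e_j) = q_i(n)\, q_j(n + e_i)$, because both sides equal $p(n + e_i + e_j)/p(n)$. This is a standard necessary condition, not necessarily the exact criterion derived in the paper; the function names, the test configurations and the `quadratic_ppf` counterexample (weights $n_j^2$) are hypothetical.

```python
from math import isclose, lgamma, exp, log

def dp_eppf(counts, alpha):
    """Ewens (Dirichlet process) EPPF p(n_1, ..., n_k) for cluster sizes `counts`."""
    n, k = sum(counts), len(counts)
    # alpha^k * Gamma(alpha) / Gamma(alpha + n) * prod_j (n_j - 1)!, on the log scale
    logp = k * log(alpha) + lgamma(alpha) - lgamma(alpha + n)
    logp += sum(lgamma(nj) for nj in counts)  # lgamma(n_j) = log((n_j - 1)!)
    return exp(logp)

def add_one(counts, j):
    """Cluster sizes after s_{n+1} joins cluster j; j == len(counts) opens a new cluster."""
    out = list(counts)
    if j == len(out):
        out.append(1)
    else:
        out[j] += 1
    return out

def ppf_from_eppf(eppf, counts):
    """PPF as an EPPF ratio, p(n + e_j) / p(n), for j = 0..k (index k = new cluster)."""
    base = eppf(counts)
    return [eppf(add_one(counts, j)) / base for j in range(len(counts) + 1)]

def satisfies_symmetry(ppf, configs, tol=1e-9):
    """Check q_j(n) q_i(n + e_j) == q_i(n) q_j(n + e_i) on every configuration in `configs`."""
    for counts in configs:
        k = len(counts)
        q = ppf(counts)
        for j in range(k + 1):
            for i in range(k + 1):
                after_j, after_i = add_one(counts, j), add_one(counts, i)
                # a "new cluster" move always targets the last index of the current list
                i2 = i if i < k else len(after_j)
                j2 = j if j < k else len(after_i)
                lhs = q[j] * ppf(after_j)[i2]
                rhs = q[i] * ppf(after_i)[j2]
                if not isclose(lhs, rhs, rel_tol=tol):
                    return False
    return True

def quadratic_ppf(counts, theta=1.0):
    """Hypothetical putative PPF with weights n_j**2 (not linear in cluster size)."""
    w = [nj ** 2 for nj in counts] + [theta]
    total = sum(w)
    return [x / total for x in w]

alpha = 2.0
configs = [[1], [1, 1], [2, 1], [3, 1, 2]]

def dp_ppf(counts):
    return ppf_from_eppf(lambda m: dp_eppf(m, alpha), counts)

print([round(p, 4) for p in dp_ppf([3, 1])])       # → [0.5, 0.1667, 0.3333]
print(satisfies_symmetry(dp_ppf, configs))         # → True
print(satisfies_symmetry(quadratic_ppf, configs))  # → False
```

The quadratic rule fails already at $n = (1)$: joining the existing cluster and then opening a new one has probability $1/2 \cdot 1/5$, while the reverse order has $1/2 \cdot 1/3$. This is consistent with the abstract's point that nonlinear rules of this simple form cannot come from an EPPF.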

Keywords: Species sampling prior; exchangeable partition probability functions; predictive probability functions.

Figures

Fig. 1
The lines in each panel show 10 draws $P \sim p(P)$ of the weights for the DP (left) and for the SSM defined in (16) (right). The $P_h$ are defined for integers $h$ only; we connect them with lines for presentation only. Also, for better presentation we plot the sorted weights. The thick line shows the prior mean. For comparison, a dashed thick line plots the prior mean of the unsorted weights. Under the DP the sorted and unsorted prior means are almost indistinguishable.
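
The DP panel of Fig. 1 can be mimicked with truncated stick-breaking, $P_h = V_h \prod_{l<h}(1 - V_l)$ with $V_h \sim \mathrm{Beta}(1, \alpha)$. The sketch below averages sorted and unsorted weights over many prior draws; the unsorted mean should match $E(P_h) = \frac{1}{1+\alpha}\left(\frac{\alpha}{1+\alpha}\right)^{h-1}$. The truncation level `H`, the value of $\alpha$, the number of draws and the seed are illustrative assumptions, and the SSM in (16) is not reproduced because its definition is not given on this page.

```python
import random

def dp_stick_breaking(alpha, H, rng):
    """Truncated stick-breaking draw of DP weights P_1..P_H, with V_h ~ Beta(1, alpha)."""
    weights, remaining = [], 1.0
    for _ in range(H):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)  # P_h = V_h * prod_{l<h} (1 - V_l)
        remaining *= 1.0 - v
    return weights

def mean_weights(alpha, H=20, draws=2000, sort=True, seed=1):
    """Monte Carlo prior mean of the weights, optionally sorted in decreasing order."""
    rng = random.Random(seed)
    totals = [0.0] * H
    for _ in range(draws):
        w = dp_stick_breaking(alpha, H, rng)
        if sort:
            w = sorted(w, reverse=True)
        for h, p in enumerate(w):
            totals[h] += p
    return [t / draws for t in totals]

alpha = 1.0
sorted_mean = mean_weights(alpha, sort=True)
unsorted_mean = mean_weights(alpha, sort=False)
exact = [(1 / (1 + alpha)) * (alpha / (1 + alpha)) ** h for h in range(20)]  # E(P_{h+1})
for h in range(5):
    print(h + 1, round(sorted_mean[h], 3), round(unsorted_mean[h], 3), round(exact[h], 3))
```

Because both calls use the same seed, they see identical draws, so the sorted and unsorted curves differ only through sorting.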
Fig. 2
Panel (a) shows the PPF (19) for a random probability measure $G \sim \mathrm{SSM}(p, \nu)$, with $P_h$ as in (16). The thick line plots $p(s_{n+1} = j \mid s)$ against $n_j$, averaged over multiple simulations. In each simulation we used the same simulation truth to generate $s$ and stopped the simulation at $n = 100$. The 10 thin lines show $p_j(n)$ for 10 simulations with different $n$. In contrast, under the DP Polya urn the curve is a straight line and there is no variation across simulations [panel (b)].
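
For the DP reference curve in panel (b), a minimal sketch of the Polya urn: labels are drawn sequentially, joining cluster $j$ with probability $n_j/(i+\alpha)$ and opening a new cluster with probability $\alpha/(i+\alpha)$, stopping at $n = 100$ as in the caption. Given the final cluster sizes, $p(s_{n+1} = j \mid s) = n_j/(n+\alpha)$ depends on $s$ only through $n_j$ and $n$, so every simulated partition yields the same straight line with slope $1/(n+\alpha)$. The value $\alpha = 1$ and the seeds are assumptions for illustration.

```python
import random

def polya_urn(n, alpha, rng):
    """Sample labels s_1..s_n sequentially from the DP Polya urn (Chinese restaurant process)."""
    counts, labels = [], []
    for i in range(n):
        # join cluster j w.p. n_j / (i + alpha); open a new cluster w.p. alpha / (i + alpha)
        u = rng.random() * (i + alpha)
        acc = 0.0
        for j, nj in enumerate(counts):
            acc += nj
            if u < acc:
                counts[j] += 1
                labels.append(j)
                break
        else:  # no existing cluster chosen (always the case for i = 0)
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels, counts

def dp_ppf(counts, alpha):
    """p(s_{n+1} = j | s) for each existing cluster, with the new-cluster probability last."""
    n = sum(counts)
    return [nj / (n + alpha) for nj in counts] + [alpha / (n + alpha)]

alpha = 1.0
for seed in range(3):
    labels, counts = polya_urn(100, alpha, random.Random(seed))
    ppf = dp_ppf(counts, alpha)
    # every slope p_j / n_j equals 1 / (n + alpha), whatever the simulated partition
    slopes = {round(p / nj, 12) for p, nj in zip(ppf, counts)}
    print(seed, len(counts), sorted(counts, reverse=True)[:3], slopes)
```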
Fig. 3
Posterior estimate of the sampling model, $\bar{F} = E(F \mid \text{data}) = p(y_{n+1} \mid \text{data})$, under the $\mathrm{SSM}(p, \nu)$ prior and a comparable DP prior. The triangles along the x-axis show the data.
Fig. 4
Co-clustering probabilities $p(s_i = s_j \mid \text{data})$ under the two prior models.
Fig. 5
Posterior probabilities of pairwise co-clustering, $p_{ij} = p(s_i = s_j \mid y)$. In both panels the grey scale runs from black for $p_{ij} = 0$ to white for $p_{ij} = \max_{r,s} p_{rs}$. The maxima are indicated in the top right corner of each plot.
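
Given $M$ posterior draws $s^{(1)}, \ldots, s^{(M)}$ of the cluster labels (for example from an MCMC sampler, not shown here), the co-clustering probabilities in Figs. 4 and 5 can be estimated as $\hat p_{ij} = M^{-1} \sum_m \mathbf{1}\{s_i^{(m)} = s_j^{(m)}\}$. The sketch below also maps them to the grey scale of Fig. 5; it assumes the maximum $\max_{r,s} p_{rs}$ is taken over pairs $r \neq s$ (since $p_{ii} = 1$ trivially) and needs at least two observations. The toy draws are hypothetical, not the paper's data.

```python
def coclustering(samples):
    """Estimate p_ij = p(s_i = s_j | y) as the fraction of draws with s_i == s_j."""
    M, n = len(samples), len(samples[0])
    hits = [[0] * n for _ in range(n)]
    for s in samples:
        for i in range(n):
            for j in range(n):
                if s[i] == s[j]:
                    hits[i][j] += 1
    return [[h / M for h in row] for row in hits]

def grey_levels(p):
    """Return (grey, pmax): 0 = black for p_ij = 0, 1 = white for p_ij >= pmax (pairs r != s)."""
    n = len(p)
    pmax = max(p[r][s] for r in range(n) for s in range(n) if r != s)
    # diagonal entries (p_ii = 1) are clipped to white
    grey = [[min(p[i][j] / pmax, 1.0) if pmax > 0 else 0.0 for j in range(n)] for i in range(n)]
    return grey, pmax

# hypothetical posterior draws of s for n = 4 observations
samples = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
p = coclustering(samples)
grey, pmax = grey_levels(p)
print(round(pmax, 3))               # → 0.667
print([round(x, 2) for x in p[0]])  # → [1.0, 0.67, 0.33, 0.0]
```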
Fig. 6
Posterior distribution on the number of clusters.
Fig. 7
Posterior distribution on the size of the largest cluster.
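
Figs. 6 and 7 are summaries of the same posterior draws of $s$. A minimal sketch, again with hypothetical toy draws: for each draw, record the number of distinct labels and the size of the largest cluster, then tabulate their relative frequencies across draws.

```python
from collections import Counter

def cluster_summaries(samples):
    """Posterior pmfs of the number of clusters K and of the largest cluster size."""
    M = len(samples)
    n_clusters = Counter(len(set(s)) for s in samples)
    largest = Counter(max(Counter(s).values()) for s in samples)
    p_k = {k: c / M for k, c in sorted(n_clusters.items())}
    p_largest = {b: c / M for b, c in sorted(largest.items())}
    return p_k, p_largest

# hypothetical posterior draws of s for n = 4 observations
samples = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1], [0, 1, 2, 2]]
p_k, p_largest = cluster_summaries(samples)
print(p_k)        # → {2: 0.75, 3: 0.25}
print(p_largest)  # → {2: 0.5, 3: 0.5}
```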
