Methods are presented for assessing and comparing the results of k ≥ 2 independent samples of measured agreement or concordance, where in each sample a given member of a pair of observations is classified according to the presence or absence of a binary trait. Examples include the assessment of interobserver agreement across different groups of patients in a clinical study, investigations of sibling concordance across different genetic groups, and meta-analyses of observer agreement across different studies. The methodology described is based on application of goodness-of-fit theory to testing hypotheses concerning kappa statistics. Partitioning methods allow a variety of hypotheses to be tested, including an assessment of the degree of agreement within each sample, a testing procedure based on the pooled data, and a test of heterogeneity that may be used to assess the validity of pooling across samples. Three examples are given.
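
As a rough illustration of the kind of heterogeneity test described above, the following Python sketch computes a goodness-of-fit statistic for k samples of paired binary ratings. It is a sketch under stated assumptions, not necessarily the procedure used in the paper: it assumes the common-correlation model for paired binary data, an intraclass kappa estimated separately in each sample, a pooled kappa formed as a weighted average, and k - 1 degrees of freedom for the resulting chi-square statistic. The function names and the counts in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple

Counts = Sequence[int]  # (both positive, discordant, both negative) pair counts


def intraclass_kappa(counts: Counts) -> Tuple[float, float]:
    """Return (prevalence, kappa) for one sample of paired binary ratings.

    Assumes the common-correlation model; prevalence must lie strictly
    between 0 and 1, otherwise kappa is undefined (division by zero).
    """
    n_both, n_disc, n_neither = counts
    n = n_both + n_disc + n_neither
    pi = (2 * n_both + n_disc) / (2 * n)
    kappa = 1 - n_disc / (2 * n * pi * (1 - pi))
    return pi, kappa


def cell_probabilities(pi: float, kappa: float) -> Tuple[float, float, float]:
    """Model probabilities of (both positive, discordant, both negative)."""
    shared = kappa * pi * (1 - pi)
    return pi ** 2 + shared, 2 * pi * (1 - pi) * (1 - kappa), (1 - pi) ** 2 + shared


def homogeneity_statistic(samples: List[Counts]) -> Tuple[float, float, int]:
    """Return (pooled kappa, chi-square statistic, degrees of freedom).

    Assumption: the pooled kappa weights each sample's kappa by
    n * pi * (1 - pi), and expected counts use each sample's own prevalence
    together with the pooled kappa. Expected counts must be nonzero.
    """
    estimates = [intraclass_kappa(s) for s in samples]
    weights = [sum(s) * pi * (1 - pi) for s, (pi, _) in zip(samples, estimates)]
    pooled = sum(w * k for w, (_, k) in zip(weights, estimates)) / sum(weights)

    chi2 = 0.0
    for s, (pi, _) in zip(samples, estimates):
        n = sum(s)
        for observed, p in zip(s, cell_probabilities(pi, pooled)):
            expected = n * p
            chi2 += (observed - expected) ** 2 / expected
    return pooled, chi2, len(samples) - 1
```

With made-up counts such as `homogeneity_statistic([(40, 20, 40), (25, 50, 25)])`, the returned statistic would be compared with a chi-square distribution on the returned degrees of freedom; a large value argues against pooling the samples into a single kappa.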