Segmentation plays a critical role in exposing connections between biological structure and function. The process of label fusion collects and combines multiple observations into a single estimate. Statistically driven techniques provide mechanisms to optimally combine segmentations; however, optimality hinges upon accurate modeling of rater behavior. Traditional approaches, e.g., Majority Vote and Simultaneous Truth and Performance Level Estimation (STAPLE), have been shown to yield excellent performance in some cases, but do not account for the spatial dependence of rater performance (i.e., regional task difficulty). Recently, the COnsensus Level, Labeler Accuracy and Truth Estimation (COLLATE) label fusion technique augmented the seminal STAPLE approach to simultaneously estimate regions of relative consensus versus confusion along with rater performance. Herein, we extend the COLLATE framework to account for multiple consensus levels. Toward this end, we posit a generalized model of rater behavior of which Majority Vote, STAPLE, STAPLE Ignoring Consensus Voxels, and COLLATE are special cases. The new algorithm, Multi-COLLATE, is evaluated with simulations and shown to yield improved performance in cases with complex region difficulties. Multi-COLLATE achieves these results by capturing different consensus levels. The potential impacts and applications of this generative model to label fusion problems are discussed.
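For orientation, the simplest special case named above, Majority Vote, can be sketched in a few lines. This is an illustrative sketch only, not the Multi-COLLATE algorithm or any implementation from this work. The function names `majority_vote` and `consensus_mask`, the flat-list representation of voxels, and the tie-breaking rule are all assumptions made for the example. The unanimity mask is a crude, non-probabilistic stand-in for the idea of separating consensus voxels from confusion voxels; COLLATE estimates that distinction statistically rather than by exact agreement.

```python
from collections import Counter


def majority_vote(segmentations):
    """Fuse several raters' label sequences by per-voxel majority vote.

    `segmentations` holds one equal-length label sequence per rater.
    Ties go to the smallest label (an arbitrary choice for this sketch).
    """
    if not segmentations:
        raise ValueError("need at least one segmentation")
    n_voxels = len(segmentations[0])
    if any(len(s) != n_voxels for s in segmentations):
        raise ValueError("all segmentations must have the same length")

    fused = []
    for votes in zip(*segmentations):  # one tuple of rater votes per voxel
        counts = Counter(votes)
        best = max(counts.values())
        fused.append(min(label for label, c in counts.items() if c == best))
    return fused


def consensus_mask(segmentations):
    """Return True where all raters agree and False where any disagree."""
    return [len(set(votes)) == 1 for votes in zip(*segmentations)]


if __name__ == "__main__":
    # Three hypothetical raters labeling four voxels.
    raters = [
        [0, 1, 1, 2],
        [0, 1, 2, 2],
        [0, 0, 1, 2],
    ]
    print(majority_vote(raters))
    print(consensus_mask(raters))
```

Majority Vote weights every rater equally at every voxel, which is why it cannot reflect rater-specific performance or regional task difficulty.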