A Tailored Multivariate Mixture Model for Detecting Proteins of Concordant Change Among Virulent Strains of Clostridium Perfringens

J Am Stat Assoc. 2018;113(522):546-559. doi: 10.1080/01621459.2017.1356314. Epub 2018 Jun 12.


Necrotic enteritis (NE) is a serious disease of poultry caused by the bacterium C. perfringens. To identify proteins of C. perfringens that confer virulence with respect to NE, the protein secretions of four NE disease-producing strains and one baseline non-disease-producing strain of C. perfringens were examined. The problem then becomes a clustering task: identifying two extreme groups of proteins that were produced at either concordantly higher or concordantly lower levels across all four disease-producing strains compared to the baseline, while most of the proteins do not exhibit significant change across the strains. However, the existence of some nuisance proteins of discordant change may severely distort any biologically meaningful cluster pattern. We develop a tailored multivariate clustering approach to robustly identify the proteins of concordant change. Using a three-component normal mixture model as the skeleton, our approach incorporates several constraints to account for biological expectations and data characteristics. More importantly, we adopt a sparse mean-shift parameterization in the reference distribution, coupled with a regularized estimation approach, to flexibly accommodate proteins of discordant change. We explore the connections and differences between our approach and other robust clustering methods, and resolve the issue of unbounded likelihood under an eigenvalue-ratio condition. Simulation studies demonstrate the superior performance of our method compared with a number of alternative approaches. Our protein analysis, along with further biological investigations, may shed light on the discovery of the complete set of virulence factors in NE.
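
As a rough sketch of the kind of model described above, and not necessarily the authors' exact formulation, let x_i in R^4 collect the changes of protein i in the four disease-producing strains relative to the baseline. All symbols below (pi_k, mu_k, Sigma_k, gamma_i, lambda, c, and the group-type penalty itself) are assumptions introduced for illustration; the abstract names none of them.

```latex
% Illustrative notation only; every symbol here is an assumption, not taken from the paper.
% x_i \in \mathbb{R}^4: change of protein i in each of the four virulent strains vs. the baseline.
% Component 0 is the reference (no change) group, shifted by a protein-specific \gamma_i
% that is zero for most proteins (sparse mean shift).
% Components 1 and 2 are the concordantly higher and concordantly lower groups.
\[
  x_i \sim \pi_0\, N(\gamma_i, \Sigma_0) + \pi_1\, N(\mu_1, \Sigma_1) + \pi_2\, N(\mu_2, \Sigma_2),
  \qquad \pi_0 + \pi_1 + \pi_2 = 1,
\]
\[
  \mu_1 > 0, \qquad \mu_2 < 0 \quad \text{(componentwise)}.
\]
% Penalized log-likelihood: the penalty drives most \gamma_i to exactly zero,
% and the eigenvalue-ratio bound (assumed form) rules out degenerate covariances.
\[
  \max_{\pi,\, \mu,\, \Sigma,\, \gamma}\;
  \sum_{i=1}^{n} \log f(x_i;\, \pi, \mu, \Sigma, \gamma_i)
  \;-\; \lambda \sum_{i=1}^{n} \lVert \gamma_i \rVert_2
  \quad \text{subject to} \quad
  \frac{\max_{k,j} d_j(\Sigma_k)}{\min_{k,j} d_j(\Sigma_k)} \le c,
\]
% where f is the three-component mixture density above and d_j(\Sigma_k) is the j-th eigenvalue of \Sigma_k.
```

Under this reading, a nonzero gamma_i flags protein i as one of discordant change that is absorbed by the reference component instead of distorting the two concordant clusters, and the bound c on the eigenvalue ratio is what keeps the likelihood from becoming unbounded.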

Keywords: Clustering; Multivariate mixture model; Penalized estimation; Proteomics; Robust estimation.