Constrained mixture estimation for analysis and robust classification of clinical time series

Ivan G Costa; Alexander Schönhuth; Christoph Hafemeister; Alexander Schliep

doi:10.1093/bioinformatics/btp222

Bioinformatics. 2009 Jun 15;25(12):i6-14. doi: 10.1093/bioinformatics/btp222.

Authors

Ivan G Costa¹, Alexander Schönhuth, Christoph Hafemeister, Alexander Schliep

Affiliation

¹ Center of Informatics, Federal University of Pernambuco, Recife, Brazil. igcf@cin.ufpe.br

Abstract

Motivation: Personalized medicine based on molecular aspects of diseases, such as gene expression profiling, has become increasingly popular. However, analyzing clinical gene expression data poses multiple challenges: most of the well-known theoretical issues apply, such as high-dimensional feature spaces versus few examples, noise and missing data. Special care is needed when designing classification procedures that support personalized diagnosis and choice of treatment. Here, we focus on classifying the response to interferon-beta (IFNbeta) treatment in Multiple Sclerosis (MS) patients, which has attracted substantial attention in the recent past. Half of the patients remain unaffected by IFNbeta treatment, which is still the standard. For these patients, treatment should be ceased in a timely manner to mitigate the side effects.

Results: We propose constrained estimation of mixtures of hidden Markov models as a methodology to classify patient response to IFNbeta treatment. The advantages of our approach are that it takes the temporal nature of the data into account and that it is robust with respect to noise, missing data and mislabeled samples. Moreover, mixture estimation makes it possible to explore the presence of response sub-groups of patients on the transcriptional level. We clearly outperformed all prior approaches in terms of prediction accuracy, raising it above 90% for the first time. Additionally, we were able to identify potentially mislabeled samples and to sub-divide the good responders into two sub-groups that exhibited different transcriptional response programs. This is supported by recent findings on MS pathology and therefore may raise interesting clinical follow-up questions.

Availability: The method is implemented in the GQL framework and is available at http://www.ghmm.org/gql. Datasets are available at http://www.cin.ufpe.br/~igcf/MSConst.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Classification / methods
Computational Biology / methods*
Gene Expression Profiling / methods*
Humans
Interferon-beta / chemistry
Interferon-beta / pharmacology
Markov Chains
Multiple Sclerosis / genetics
Multiple Sclerosis / metabolism

Substances

Interferon-beta