Multi-platform gene-expression mining and marker gene analysis

Int J Data Min Bioinform. 2011;5(5):485-503. doi: 10.1504/ijdmb.2011.043030.


Gene-expression data are now widely available and are used for a wide range of clinical and diagnostic purposes. A key challenge is to select a small number of significant marker genes for biological study. While it is feasible to find important genes from a single gene-expression data set, it is often more meaningful to compare results across different but related data sets, especially when multiple gene-expression data sets arise from different studies of a common organism or phenotype. In this paper, we present a novel framework that exploits the commonalities across data sets by learning from them jointly through multi-task feature learning. By identifying a common subspace of genes, the framework can help biologists find important marker genes that span different evolutionary periods in the life cycle of cancer development. The genes found in this way are more stable and more significant than those selected from a single data set. Our experimental results demonstrate that more accurate models can be built from multiple data sets using fewer labelled examples. To the best of our knowledge, we are among the first to introduce multi-task learning to the bioinformatics community as a way to address the problem of insufficient data.
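The abstract does not state the exact optimisation problem, so the following is only a minimal sketch of one common way to pose multi-task feature learning: a squared-loss model per data set, coupled by an L2,1 (group-lasso) penalty that keeps or discards each gene across all data sets at once. The function name `l21_multitask`, the penalty weight `lam`, the squared loss, and the synthetic data are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def l21_multitask(Xs, ys, lam=0.3, n_iter=500):
    """Proximal gradient descent for
    sum_t 1/(2 n_t) ||X_t w_t - y_t||^2 + lam * sum_j ||W[j, :]||_2,
    where row j of W holds the weights of gene j in every task."""
    n_genes, n_tasks = Xs[0].shape[1], len(Xs)
    W = np.zeros((n_genes, n_tasks))
    # Step size from the largest per-task Lipschitz constant of the smooth part.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in Xs)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        # Gradient of the squared loss, one column per task.
        G = np.column_stack([
            X.T @ (X @ W[:, t] - y) / X.shape[0]
            for t, (X, y) in enumerate(zip(Xs, ys))
        ])
        V = W - step * G
        # Row-wise soft-thresholding: shrinks a gene's weights in all tasks together.
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        W = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12)) * V
    return W

# Synthetic stand-in for two related expression data sets (hypothetical data):
# 20 genes, of which only genes 0, 1 and 2 carry signal in both sets.
rng = np.random.default_rng(0)
n_genes = 20
true_weights = [np.zeros(n_genes), np.zeros(n_genes)]
true_weights[0][:3] = [2.0, -1.5, 1.0]
true_weights[1][:3] = [1.5, -2.0, 1.2]

Xs = [rng.standard_normal((80, n_genes)), rng.standard_normal((100, n_genes))]
ys = [X @ w + 0.1 * rng.standard_normal(X.shape[0])
      for X, w in zip(Xs, true_weights)]

W = l21_multitask(Xs, ys)
row_norms = np.linalg.norm(W, axis=1)
selected = np.flatnonzero(row_norms > 1e-8)  # genes kept across both data sets
print("selected genes:", selected.tolist())
```

Because the penalty acts on whole rows of `W`, a gene is either retained for every data set or dropped from all of them, which is one way to obtain a shared gene subset. A classification loss such as logistic loss could replace the squared loss without changing the row-wise shrinkage step.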

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining / methods*
  • Databases, Genetic
  • Gene Expression Profiling / methods
  • Gene Expression*
  • Genetic Markers*
  • Neoplasms / diagnosis
  • Neoplasms / genetics


Substances

  • Genetic Markers