Functional network construction in Arabidopsis using rule-based machine learning on large-scale data sets

Plant Cell. 2011 Sep;23(9):3101-16. doi: 10.1105/tpc.111.088153. Epub 2011 Sep 6.

Abstract

The meta-analysis of large-scale postgenomics data sets within public databases promises to provide important novel biological knowledge. Statistical approaches, including correlation analyses in gene coexpression studies, have emerged as tools to elucidate gene function using these data sets. Here, we present a powerful and novel alternative methodology to computationally identify functional relationships between genes from microarray data sets using rule-based machine learning. This approach, termed "coprediction," is based on the collective ability of groups of genes co-occurring within rules to accurately predict the developmental outcome of a biological system. We demonstrate the utility of coprediction as an analytical tool using publicly available microarray data generated exclusively from Arabidopsis thaliana seeds to compute a functional gene interaction network, termed Seed Co-Prediction Network (SCoPNet). SCoPNet predicts functional associations between genes acting in the same developmental and signal transduction pathways irrespective of the similarity in their respective gene expression patterns. Using SCoPNet, we identified four novel regulators of seed germination (ALTERED SEED GERMINATION5, 6, 7, and 8) and predicted interactions at the level of transcript abundance between these novel and previously described factors influencing Arabidopsis seed germination. An online Web tool to query SCoPNet has been developed as a community resource to dissect seed biology and is available at http://www.vseed.nottingham.ac.uk/.
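The abstract does not spell out how learned rules are turned into network edges. The following is a minimal sketch of the general co-occurrence idea behind coprediction, under loudly stated assumptions: each rule is reduced to the set of genes it tests, and two genes are linked when they co-occur in at least `min_count` rules. The function name `build_coprediction_network`, the rule representation, the threshold, and the placeholder gene names are all hypothetical illustrations, not the published SCoPNet procedure. Note that expression similarity plays no role in drawing an edge here, which mirrors the abstract's point that coprediction links genes irrespective of how similar their expression patterns are.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def build_coprediction_network(
    rules: Iterable[frozenset[str]],
    min_count: int = 2,
) -> dict[tuple[str, str], int]:
    """Count how often each gene pair co-occurs within the same rule.

    Hypothetical simplification: a rule is represented only by the set of
    genes whose expression levels appear in its conditions. Pairs that
    co-occur in at least ``min_count`` rules are kept as weighted edges.
    """
    counts: Counter[tuple[str, str]] = Counter()
    for genes in rules:
        # Sort so (a, b) and (b, a) map to the same undirected edge.
        for a, b in combinations(sorted(genes), 2):
            counts[(a, b)] += 1
    # Keep only pairs that co-occur often enough to count as an edge.
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Usage with placeholder gene names (not real Arabidopsis loci).
rules = [
    frozenset({"GENE_A", "GENE_B", "GENE_C"}),
    frozenset({"GENE_A", "GENE_B"}),
    frozenset({"GENE_B", "GENE_C"}),
]
print(build_coprediction_network(rules))
# {('GENE_A', 'GENE_B'): 2, ('GENE_B', 'GENE_C'): 2}
```

A simple count threshold is used only to keep the sketch short; a real analysis would more likely compare co-occurrence frequencies against a null expectation before accepting an edge.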

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Arabidopsis / genetics*
  • Artificial Intelligence*
  • Computational Biology*
  • Gene Expression Regulation, Plant
  • Gene Regulatory Networks
  • Germination / genetics*
  • Internet
  • Likelihood Functions
  • Oligonucleotide Array Sequence Analysis
  • Seeds / genetics
  • Seeds / growth & development
  • Transcriptome*