Plant Cell, 23 (9), 3101-16

Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on Large-Scale Data Sets

George W Bassel et al. Plant Cell.

Abstract

The meta-analysis of large-scale postgenomics data sets within public databases promises to provide important novel biological knowledge. Statistical approaches, including correlation analyses in gene coexpression studies, have emerged as tools to elucidate gene function using these data sets. Here, we present a novel alternative methodology to computationally identify functional relationships between genes from microarray data sets using rule-based machine learning. This approach, termed "coprediction," is based on the collective ability of groups of genes co-occurring within rules to accurately predict the developmental outcome of a biological system. We demonstrate the utility of coprediction as a powerful analytical tool using publicly available microarray data generated exclusively from Arabidopsis thaliana seeds to compute a functional gene interaction network, termed Seed Co-Prediction Network (SCoPNet). SCoPNet predicts functional associations between genes acting in the same developmental and signal transduction pathways irrespective of the similarity in their respective gene expression patterns. Using SCoPNet, we identified four novel regulators of seed germination (ALTERED SEED GERMINATION5, 6, 7, and 8) and predicted interactions at the level of transcript abundance between these novel and previously described factors influencing Arabidopsis seed germination. An online Web tool to query SCoPNet has been developed as a community resource to dissect seed biology and is available at http://www.vseed.nottingham.ac.uk/.
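The abstract defines coprediction through groups of genes that co-occur within rules. A minimal sketch of the network-building step, assuming each rule has already been reduced to the set of gene identifiers it tests and that an edge is weighted simply by the number of rules containing both genes. The study's actual weighting and filtering may differ, and the gene names are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each rule reduced to the set of gene identifiers it tests.
# Gene IDs below are placeholders, not genes reported in the study.
Rule = frozenset[str]

def coprediction_edges(rules: list[Rule]) -> Counter:
    """Count how often each unordered gene pair co-occurs within a rule."""
    edges: Counter = Counter()
    for genes in rules:
        # sorted() gives each pair a canonical (a, b) order, so (A, B) == (B, A)
        for a, b in combinations(sorted(genes), 2):
            edges[(a, b)] += 1
    return edges

rules = [
    frozenset({"GENE_A", "GENE_B", "GENE_C"}),
    frozenset({"GENE_A", "GENE_B"}),
    frozenset({"GENE_C"}),  # a single-gene rule contributes no edges
]
edges = coprediction_edges(rules)
print(edges[("GENE_A", "GENE_B")])  # 2
```

Counting co-occurrence in a `Counter` keeps the sketch to a single pass over the rules; a real pipeline would likely add a minimum-count threshold before treating a pair as an edge.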

Figures

Figure 1.
Generation of a Rule-Based ML Coprediction Network Based on Arabidopsis Seed Microarray Data. (A) An example rule and two example rule sets predicting the germination and nongermination developmental outcomes in Arabidopsis seeds. The example rule is the first rule within the example germination rule set. Each rule contains an Arabidopsis gene identifier followed by the > operator and a number representing a gene expression level. (B) Pipeline used to generate the coprediction functional gene network from rules produced through rule-based ML. The associated software can be downloaded at www.vseed.nottingham.ac.uk.
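The rule format in (A) can be sketched as follows. This is a hypothetical illustration, not the software distributed with the study: it assumes a rule is a conjunction of "gene > threshold" conditions and that a rule set is applied as an ordered decision list in which the first matching rule determines the predicted outcome. Gene names and thresholds are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Condition:
    gene: str         # gene identifier (placeholder names below)
    threshold: float  # condition holds when expression > threshold

@dataclass(frozen=True)
class Rule:
    conditions: tuple[Condition, ...]
    outcome: str      # e.g. "germination" or "nongermination"

    def matches(self, sample: dict[str, float]) -> bool:
        # The rule fires only if every "gene > threshold" condition holds.
        return all(sample[c.gene] > c.threshold for c in self.conditions)

def predict(rule_set: list[Rule], sample: dict[str, float], default: str) -> str:
    # Assumed decision-list semantics: the first matching rule wins,
    # and the default outcome is returned if no rule matches.
    for rule in rule_set:
        if rule.matches(sample):
            return rule.outcome
    return default

rule_set = [
    Rule((Condition("GENE_A", 5.0), Condition("GENE_B", 2.5)), "germination"),
    Rule((Condition("GENE_C", 7.0),), "nongermination"),
]
sample = {"GENE_A": 6.1, "GENE_B": 1.0, "GENE_C": 8.2}
print(predict(rule_set, sample, default="germination"))  # nongermination
```

Here the first rule fails because GENE_B (1.0) does not exceed 2.5, so the second rule decides the outcome.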
Figure 2.
Properties and Topologies of SCoPNet and Comparison with SeedNet. (A) Organic network topology of SCoPNet. Node color is based on gene lists of significantly differentially regulated transcripts in nongerminating (SAM NG, red nodes) and germinating (SAM G, blue nodes) seeds. Gray nodes represent genes not statistically associated with either germination or nongermination. Node sizes in (A), (B), (C), and (E) correspond to node degree. (B) Distribution of nodes and edges appearing with increased frequency in nongermination-predicting rule sets within SCoPNet. Nodes with increasing nongermination node strength are colored with darker shades of red, and edges representing an increasing frequency of co-occurrence between gene pairs in nongermination rule sets are colored with darker shades of blue. (C) Distribution of nodes and edges appearing with increased frequency in germination-predicting rule sets within SCoPNet. Nodes with increasing germination node strength are colored with darker shades of red, and edges representing an increasing frequency of co-occurrence between gene pairs in germination rule sets are colored with darker shades of blue. (D) Plot of nongermination and germination node scores along a linear ordering of genes from the highest to the lowest node score for each set of predictions. The 100 highest-scoring genes for each developmental state are plotted. (E) Distribution of nodes with the greatest degree within SCoPNet. The darker the shade of red, the higher the degree of the node. (F) Intersection between SCoPNet and the coexpression network SeedNet. Only clusters with at least two edges shared between the networks are shown. Red nodes are genes associated with the nongerminating state (SAM NG), blue nodes are associated with the germinating state (SAM G), and gray nodes are not associated with either state. (G) Distribution of the top 100 nongermination and germination node-scoring genes in the gene coexpression network SeedNet. Nongermination predicted nodes are colored red and germination predicted nodes blue.
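The node measures used in these panels can be illustrated on a small weighted edge list. This sketch assumes the standard weighted-network definitions (degree as the number of neighbours, strength as the sum of incident edge weights) computed separately from germination or nongermination rule sets; the exact node score plotted in (D) may be defined differently. Edge weights and gene names are hypothetical.

```python
from collections import defaultdict

def degree_and_strength(edges: dict[tuple[str, str], int]):
    """Return (degree, strength) dicts for an undirected weighted edge list.

    Assumes each unordered gene pair appears once and there are no self-loops.
    degree   = number of distinct neighbours of a node
    strength = sum of the weights of the node's incident edges
    """
    degree: dict[str, int] = defaultdict(int)
    strength: dict[str, int] = defaultdict(int)
    for (a, b), w in edges.items():
        for node in (a, b):
            degree[node] += 1
            strength[node] += w
    return dict(degree), dict(strength)

def top_n(scores: dict[str, int], n: int) -> list[str]:
    # Highest score first; ties broken alphabetically for a stable ordering.
    return sorted(scores, key=lambda g: (-scores[g], g))[:n]

# Hypothetical co-occurrence counts from nongermination rule sets
nongermination_edges = {
    ("GENE_A", "GENE_B"): 3,
    ("GENE_A", "GENE_C"): 1,
    ("GENE_B", "GENE_D"): 2,
}
degree, strength = degree_and_strength(nongermination_edges)
print(top_n(strength, 2))  # ['GENE_B', 'GENE_A']
```

Replacing 2 with 100 in `top_n` mirrors the "top 100" cut used in (D) and (G).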
Figure 3.
Significantly Represented GO Biological Process Categories within the Nongermination and Germination Domains of SCoPNet. (A) Significant GO categories within the nongermination domain of SCoPNet. (B) Significant GO categories within the germination domain of SCoPNet. Larger nodes indicate more genes within a given GO category. Node color indicates P value significance on the yellow-to-orange scale at the bottom left of (A) and (B). A threshold of P < 0.05 was used to identify significant GO categories.
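The caption does not name the enrichment test behind these P values. A common choice for GO category enrichment is the one-sided hypergeometric test, sketched here as an assumption: it gives the probability of observing at least k annotated genes in a domain of n genes, given K annotated genes among N background genes. No multiple-testing correction is applied, and the counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) with X ~ Hypergeometric(N, K, n).

    k: genes in the domain annotated to the GO category
    n: genes in the domain
    K: background genes annotated to the GO category
    N: background genes
    """
    upper = min(n, K)  # X cannot exceed the draw size or the annotated count
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 3 of 4 domain genes annotated, vs. 5 of 20 background genes
p = hypergeom_enrichment_p(k=3, n=4, K=5, N=20)
print(p < 0.05)  # True
```

Here p = (C(5,3)C(15,1) + C(5,4)C(15,0)) / C(20,4) = 155/4845, about 0.032, which would pass the P < 0.05 threshold.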
Figure 4.
Phenotypic Characterization of Newly Identified Regulators of Seed Germination. (A) asg5-1 and asg5-2 mutant seeds on increasing concentrations of the germination-inhibiting hormone ABA relative to their wild-type equivalent, Columbia-0. (B) asg5-1 and asg5-2 mutant seeds on increasing concentrations of the GA synthesis-inhibiting compound PAC. (C) Same as (A) with asg6-1 mutant seeds. (D) Same as (B) with asg6-1 mutant seeds. (E) Same as (A) with asg7-1 mutant seeds. (F) Same as (B) with asg7-1 mutant seeds. (G) Same as (A) with asg8-1 mutant seeds. (H) Same as (B) with asg8-1 mutant seeds. All seeds were stratified at 4°C for 2 d, and graphs indicate the final germination percentage following 7 d of incubation at 22°C. [See online article for color version of this figure.]
Figure 5.
Associations between Known and Newly Identified Regulators in the Rule-Based ML Network. (A) Associations between newly uncovered and previously identified regulators of seed developmental fate within the nongermination domain of SCoPNet. Nodes colored yellow are newly identified regulators of seed germination, red nodes are classified by the SAM NG gene list (transcriptionally upregulated in nongerminating seeds), and gray nodes are genes whose transcripts are not significantly regulated by germination. Node size corresponds to degree, and increasing edge thickness corresponds to increasing confidence for the predicted association based on pointwise mutual information. (B) Transcript abundance of ASG5, ASG6, and ASG7 in the abi3-4 mutant and the corresponding Landsberg erecta control seeds at 24 h after imbibition (Carrera et al., 2008). (C) Transcript abundance of ASG6 and ASG7 in GA-deficient ga1-3 mutant seeds in the absence and presence of exogenously applied GA (Ogawa et al., 2003). (D) eFP output indicating the transcript abundance of ASG6 in the embryo and endosperm of germinated and PAC-inhibited seeds (Penfield et al., 2006; Bassel et al., 2008). (E) Associations between previously identified and newly characterized regulators of seed developmental fate within the germination domain of SCoPNet. ASG8 is a newly identified regulator and is colored yellow, SAM G (germination upregulated) genes are colored blue, and gray nodes indicate genes whose transcripts are not significantly regulated by germination. Node size corresponds to degree, and increasing edge thickness corresponds to increasing confidence for the predicted association based on pointwise mutual information. (F) eFP output indicating the transcript abundance of ASG8 in the embryo and endosperm of PAC-inhibited and germinated seeds.
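Edge confidence in (A) and (E) is based on pointwise mutual information, PMI(x, y) = log2[p(x, y) / (p(x) p(y))], which is positive when two genes co-occur more often than expected if they were independent and negative when they co-occur less often. The sketch below estimates each probability as the fraction of rules containing the gene or gene pair; that estimator, the base-2 logarithm, and the gene names are assumptions for illustration.

```python
from math import log2

def pmi(rules: list[set[str]], x: str, y: str) -> float:
    """Pointwise mutual information of genes x and y co-occurring in rules.

    Probabilities are estimated as fractions of rules containing the gene(s);
    this estimator is an assumption, not necessarily the one used for SCoPNet.
    """
    n = len(rules)
    p_x = sum(x in r for r in rules) / n
    p_y = sum(y in r for r in rules) / n
    p_xy = sum(x in r and y in r for r in rules) / n
    if p_xy == 0:
        return float("-inf")  # the genes never co-occur
    return log2(p_xy / (p_x * p_y))

# Hypothetical rules, each reduced to the set of genes it tests
rules = [{"GENE_A", "GENE_B"}, {"GENE_A", "GENE_B"}, {"GENE_C"}, {"GENE_A", "GENE_C"}]
print(round(pmi(rules, "GENE_A", "GENE_B"), 3))  # 0.415
```

GENE_A and GENE_B appear together in half the rules, more than the 0.75 x 0.5 = 0.375 expected under independence, so their PMI is log2(4/3), a positive value.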
Figure 6.
Expression Patterns of Genes Connected in SCoPNet over a Time Course of Seed Germination. In each case relative transcript abundance during a time course of seed germination is indicated (Nakabayashi et al., 2005). (A) ABA3 and ABI4. (B) RGL3 and EIN3. (C) ASG5 and ASG7. (D) SAD1 and SOMNUS. (E) β-HYDROXYLASE1 and PYL4. (F) ABI3 and PYL9. (G) MYB33 and MYB101. (H) ABI3 and ABI4.
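The abstract contrasts coprediction with correlation-based coexpression analysis, and this figure shows time courses of genes connected in SCoPNet. For comparison, a coexpression approach would typically score a pair by the Pearson correlation of their expression profiles, sketched below; the profiles are hypothetical, and the correlation measure used to build SeedNet is not stated here. Because SCoPNet links genes irrespective of expression similarity, a connected pair may still have a low or negative r.

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant (zero-variance) profile would make sx or sy zero here.
    return cov / (sx * sy)

# Hypothetical relative transcript abundances over a germination time course
rising = [1.0, 2.0, 3.0, 4.0]
falling = [4.0, 3.0, 2.0, 1.0]
print(pearson_r(rising, falling))
```

The two hypothetical profiles are perfectly anticorrelated, so a correlation threshold would never link them, whereas a rule-based approach could still place them in the same rules.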
Figure 7.
Screenshot of the Online Network Query Tool Generated in This Study to Query SCoPNet. The seed germination regulatory gene RGL2 was queried using the gene name in the query box and is highlighted within the network view window. SCoPNet is available at http://www.vseed.nottingham.ac.uk/.
