Bioinformatics. 2014 Aug 15;30(16):2343-50.
doi: 10.1093/bioinformatics/btu298. Epub 2014 Apr 25.

Modeling Disease Progression Using Dynamics of Pathway Connectivity

Free PMC article

Xiaoke Ma et al. Bioinformatics.

Abstract

Motivation: Disease progression is driven by dynamic changes in both the activity and connectivity of molecular pathways. Understanding these dynamic events is critical for disease prognosis and effective treatment. Compared with activity dynamics, connectivity dynamics is poorly explored.

Results: We describe the M-module algorithm to identify gene modules with common members but varied connectivity across multiple gene co-expression networks (aka M-modules). We introduce a novel metric to capture the connectivity dynamics of an entire M-module. We find that M-modules with dynamic connectivity have distinct topological and biochemical properties compared with static M-modules and hub genes. We demonstrate that incorporation of module connectivity dynamics significantly improves disease stage prediction. We identify different sets of M-modules that are important for specific disease stage transitions and offer new insights into the molecular events underlying disease progression. Besides modeling disease progression, the algorithm and metric introduced here are broadly applicable to modeling dynamics of molecular pathways.

Availability and implementation: M-module is implemented in R. The source code is freely available at http://www.healthcare.uiowa.edu/labs/tan/M-module.zip.

Figures

Fig. 1.
Overview of the M-module framework. The algorithm consists of three key components: construction of multiple co-expression networks, seed selection and M-module search. First-order partial Pearson correlation coefficient is used as edge weight to construct the gene co-expression network. For each network, we integrate topological and gene mutation information to rank genes via network propagation. The overall ranking of a gene across multiple networks is computed by considering rankings in all networks. The top genes are used as seeds and a graph-entropy–based function is used to guide the M-module search.
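
As an illustration of the edge weight named in the caption, the sketch below computes a first-order partial Pearson correlation between two genes, conditioning on a single third gene. It is written in Python for brevity and is not taken from the authors' R code; the function names and the toy expression vectors are assumptions, and how the paper chooses or aggregates over conditioning genes is not stated on this page.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_order_partial(x, y, z):
    """Correlation of x and y after removing the linear effect of one variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data (hypothetical): genes x and y are both driven by gene z.
z = [1, 1, -1, -1]
x = [3, 1, -1, -3]   # 2*z plus a component unrelated to z
y = [3, 1, -3, -1]   # 2*z plus a different unrelated component

raw = pearson(x, y)                    # strong apparent co-expression
partial = first_order_partial(x, y, z) # vanishes once z is accounted for
```

In this toy case the raw correlation between x and y is high only because both follow z, so the partial correlation drops to zero. This is the kind of indirect association that a partial-correlation edge weight is meant to discount.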
Fig. 2.
Performance assessment of M-module using simulated and real networks. (A) Performance as a function of the amount of noise in three simulated networks. AUC was used as the performance measure. Shown here are average AUC values of 50 runs of each method at each noise level. (B) Time complexity of different methods. Inputs are four gene co-expression networks constructed using breast cancer data. For M-module, two strategies were used to select seeds: top 5% genes as seeds and top 20% as seeds (in this case, >90% genes were covered by the discovered modules). (C) Specificity of the methods. Gene modules found by each method are evaluated by a set of gold-standard pathway annotations. Specificity is defined as the fraction of predicted modules that significantly overlap with reference pathways. (D) Sensitivity of the methods. Sensitivity is defined as the fraction of reference pathways that significantly overlap with predicted modules. Pathway overlap P-values were computed using the hypergeometric distribution. P-values for the difference in specificity and sensitivity were computed using Fisher’s exact test. All P-values were corrected for multiple testing using the Benjamini–Hochberg method. *P < 0.05.
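
The overlap test and multiple-testing correction mentioned in the caption can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function names, the gene-universe size and the example P-values are all hypothetical.

```python
from math import comb

def hypergeom_overlap_pvalue(universe, module, pathway, overlap):
    """P(X >= overlap) when `module` genes are drawn without replacement from
    `universe` genes, of which `pathway` belong to the reference pathway."""
    upper = min(module, pathway)
    total = comb(universe, module)
    tail = sum(comb(pathway, k) * comb(universe - pathway, module - k)
               for k in range(overlap, upper + 1))
    return tail / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: a 3-gene module fully contained in a 3-gene pathway,
# drawn from a universe of 10 genes.
p_overlap = hypergeom_overlap_pvalue(universe=10, module=3, pathway=3, overlap=3)

# Hypothetical raw P-values for four module/pathway pairs.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Under this reading, a module would count toward specificity if at least one of its adjusted overlap P-values falls below the chosen threshold. The threshold and the exact counting rule are not given on this page.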
Fig. 3.
Evidence and properties of module connectivity dynamics across multiple networks. 4-modules were identified using co-expression networks representing four stages of breast cancer. (A) An example dynamic 4-module representing the Erbb2/Her2 signaling pathway. Middle subnetwork, composite 4-module whose edges are the average co-expression correlations across four networks. Surrounding subnetworks, subnetworks induced by edges that show significant changes in values between two adjacent co-expression networks. (B) Cumulative distributions of connectivity dynamic scores of discovered 4-modules. MCDS, module connectivity dynamic score. (C) Module connectivity dynamics is not correlated with expression level dynamics of module members. Top, correlation between gene expression dynamics and gene connectivity dynamics of module members. Bottom, overlap between 4-module genes and differentially expressed genes. (D) Betweenness centrality of genes in 4-modules and hub genes. (E) Sum of edge weights of genes in 4-modules and hub genes. (F) Occurrence frequency of signaling (left) and non-signaling (right) protein domains encoded by 4-module genes.
Fig. 4.
Module connectivity dynamics improves disease stage classification. Results are based on 50 independent 5-fold cross validations. (A) Classification accuracy of breast cancer stages using different feature sets, including randomly selected genes (RG, N = 50 features, 50 genes), differentially expressed genes (DG, N = 50 features, 50 genes), TC modules (N = 1573 features, 1601 genes), SC (91 features, 7737 genes), CC (100 features, 7737 genes), Jointclustering (JC, 110 features, 7690 genes), significant 4-modules (SM, 50 features, 635 genes) and weighted 4-modules (wSM, 50 features, 635 genes). Accuracy is defined as the number of patient samples correctly classified. Y-axis, mean accuracy. Error bar, standard deviation. (B) Receiver operating characteristic curves for SVM classifiers trained with different feature sets. AUC values are in parentheses.
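
The evaluation protocol in the caption (repeated 5-fold cross validation, reported as mean accuracy with standard deviation) can be sketched as below. This is a rough Python illustration, not the authors' pipeline: the nearest-centroid classifier is a toy stand-in for the SVM, accuracy is expressed here as a fraction of samples, and all names and data are hypothetical.

```python
import random
from statistics import mean, stdev

def kfold_indices(n, k, rng):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid(train_X, train_y, test_X):
    """Toy stand-in classifier: assign each sample to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: dist(x, centroids[c])) for x in test_X]

def repeated_cv_accuracy(X, y, fit_predict, k=5, repeats=50, seed=0):
    """Mean and standard deviation of accuracy over `repeats` k-fold runs."""
    rng = random.Random(seed)
    run_scores = []
    for _ in range(repeats):
        correct = 0
        for test_idx in kfold_indices(len(y), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            preds = fit_predict([X[i] for i in train_idx],
                                [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
            correct += sum(p == y[i] for p, i in zip(preds, test_idx))
        run_scores.append(correct / len(y))
    return mean(run_scores), stdev(run_scores)

# Hypothetical, well-separated two-class data with one feature per sample.
X = [[0.1 * i] for i in range(10)] + [[10 + 0.1 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
mean_acc, sd_acc = repeated_cv_accuracy(X, y, nearest_centroid, k=5, repeats=3)
```

Each repeat reshuffles the samples before splitting, so the standard deviation reflects variation across random partitions, which is what the error bars in panel A appear to summarise.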
Fig. 5.
Characteristics of discovered 4-modules. (A) Meta-network view of 4-modules across breast cancer stages. Edge thickness is proportional to the Pearson correlation of the first principal components between the expression profiles of two modules across all patient samples. Node size is proportional to the average connectivity dynamic score of a 4-module over three adjacent stage transitions. Node color, enriched GO biological process terms. (B) Feature importance for cancer stage classification. Each row represents a feature (4-module) and each column represents a breast cancer stage. Feature importance values are clustered using hierarchical clustering. Feature ID and enriched GO biological process term are shown to the right of the dendrogram.
