J Mol Biol, 430 (18 Pt A), 2924-2938

Patient Similarity Networks for Precision Medicine


Patient Similarity Networks for Precision Medicine

Shraddha Pai et al. J Mol Biol.


Clinical research and practice in the 21st century are poised to be transformed by analysis of computable electronic medical records and population-level genome-scale patient profiles. Genomic data capture genetic and environmental state, providing information on heterogeneity in disease and treatment outcome, but genomic-based clinical risk scores are limited. Achieving the goal of routine precision medicine that takes advantage of these rich genomics data will require computational methods that support heterogeneous data, have excellent predictive performance, and ideally, provide biologically interpretable results. Traditional machine-learning approaches excel at performance, but often have limited interpretability. Patient similarity networks are an emerging paradigm for precision medicine, in which patients are clustered or classified based on their similarities in various features, including genomic profiles. This strategy is analogous to standard medical diagnosis, has excellent performance, is interpretable, and can preserve patient privacy. We review new methods based on patient similarity networks, including Similarity Network Fusion for patient clustering and netDx for patient classification. While these methods are already useful, much work is required to improve their scalability for contemporary genetic cohorts, optimize parameters, and incorporate a wide range of genomics and clinical data. The coming 5 years will provide an opportunity to assess the utility of network-based algorithms for precision medicine.

Keywords: genomics; machine learning; networks; patient classifier; precision medicine.
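To illustrate what a single patient similarity network is, here is a minimal sketch assuming each patient is described by a numeric feature vector from one data type (for example, gene expression). It computes pairwise Pearson correlation between patients and keeps each patient's k most similar neighbours as weighted edges. The helper names (pearson, build_psn), the choice of Pearson correlation and the k-nearest-neighbour sparsification are illustrative assumptions, not the similarity measures used by SNF or netDx. Similarity Network Fusion then combines several such networks, one per data type; that fusion step is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)


def build_psn(profiles, k=2):
    """Build a sparse patient similarity network (hypothetical helper).

    profiles: dict mapping patient ID -> list of feature values.
    Returns a dict mapping frozenset({p, q}) -> similarity weight. An edge
    is kept if either endpoint has the other among its k most similar patients.
    """
    sim = {}
    for p, q in combinations(profiles, 2):
        sim[frozenset((p, q))] = pearson(profiles[p], profiles[q])
    edges = {}
    for p in profiles:
        # Rank all other patients by similarity to p and keep the top k.
        neighbours = sorted(
            (q for q in profiles if q != p),
            key=lambda q: sim[frozenset((p, q))],
            reverse=True,
        )[:k]
        for q in neighbours:
            edges[frozenset((p, q))] = sim[frozenset((p, q))]
    return edges


if __name__ == "__main__":
    # Toy expression profiles for four hypothetical patients.
    profiles = {
        "P1": [5.0, 1.0, 3.0],
        "P2": [5.2, 0.9, 3.1],
        "P3": [1.0, 4.0, 2.0],
        "P4": [1.1, 4.2, 2.2],
    }
    for edge, w in sorted(build_psn(profiles, k=1).items(), key=lambda e: -e[1]):
        print(sorted(edge), round(w, 3))
```

With k=1, the toy data yield two edges (P1-P2 and P3-P4), one for each group of patients with matching profiles. Keeping only the strongest neighbours is one common way to stop weak, noisy similarities from dominating the network.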


Figure 1. Contemporary risk calculators and their development process
A. Examples of risk models in current clinical use (rows) and the patient data required for each (columns). See Box 1 for details. B. Process for risk model development. An initial model is developed by testing the performance of a variety of models on subsets of the training data (internal validation). Following successful internal validation, model generalizability is assessed by external validation on similar populations, and then on similar populations with specific differences (e.g. geographic origin). This step identifies whether a general model can serve multiple populations or whether subpopulation-specific models are needed. A well-validated model is recommended in professional clinical practice guidelines, but a clinician may choose to adopt a sufficiently validated model earlier in this process. The process is iterative: models that have been in clinical use for decades continue to be refined.
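Internal validation on subsets of the training data is often done by k-fold splitting. The sketch below, a minimal illustration rather than the procedure behind any specific risk calculator in the figure, partitions patient indices into k folds so that each patient is held out exactly once while a model is trained on the remaining patients. The function name k_fold_splits and the fixed random seed are assumptions chosen for reproducibility.

```python
import random


def k_fold_splits(n_patients, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold internal validation.

    Patients are shuffled once with a fixed seed for reproducibility,
    then dealt round-robin into k folds of near-equal size.
    """
    idx = list(range(n_patients))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set: every patient that is not in the held-out fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    for train, test in k_fold_splits(10, k=5):
        print(len(train), test)
```

Averaging a performance measure over the held-out folds gives the internal-validation estimate. External validation differs in that the held-out patients come from a separate population rather than from a fold of the same training set.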
Figure 2. Genomics in clinical risk models
A. Vision of genomic analyses as part of a process for clinical decision-making. The outer ring tracks patient interactions with the healthcare system in a future genomic era of medicine. Clinical and genomic assessment generates patient data, which physicians use to diagnose patients, prescribe therapy and counsel about prevention based on disease risk. Patients repeat this cycle at follow-up visits. The field of computational biology will catalyze precision medicine by developing tools that support patient classification, diagnosis and prognosis, and guide therapy and prevention. B. Current and projected ‘omic cohorts for precision medicine. The x-axis shows the year of publication or update; values at 2020 are projected by the authors based on public information. The y-axis shows the sample size (powers of 10) on which each project was run or is projected to run. IBD: inflammatory bowel disease. Unlabelled points are: 1. …; 2. Blood lipids GWAS; 3. Glioma; 4. Type 2 diabetes microbiome; 5. Breast cancer; 6. Cholangiocarcinoma. MVP: Million Veteran Program.
Figure 3. Patient similarity networks for hypothetical example of predicting lung cancer risk
Nodes are patients and edge weights reflect patient similarity within a given data type. This example shows similarity from clinical (red), gene expression (green) and metabolomics (blue) data. Here, cases and controls form separate densely connected parts of the network based on clinical data (red; e.g. smoking frequency), and a similar separation appears in the metabolomics data (blue). The predictor would therefore select clinical and metabolomic data as predictive of case status.
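The intuition in this figure, that a useful data type is one in which cases form a densely connected clique, can be illustrated with a simple score: the fraction of case patients' edge weight that connects them to other cases. This is a hypothetical, simplified stand-in for exposition only and is not netDx's actual network-scoring procedure; the names case_cohesion and select_datatypes and the 0.7 threshold are assumptions.

```python
def case_cohesion(edges, cases):
    """Fraction of case patients' edge weight that connects to other cases.

    edges: dict mapping (patient_a, patient_b) tuples -> non-negative weight.
    cases: set of patient IDs labelled as cases.
    Returns a value in [0, 1]; higher means cases form a tighter clique.
    """
    within, total = 0.0, 0.0
    for (a, b), w in edges.items():
        n_case = (a in cases) + (b in cases)
        if n_case == 0:
            continue  # control-control edge: ignored by this score
        # Count each edge once per case endpoint it touches.
        total += w * n_case
        if n_case == 2:
            within += w * 2
    return within / total if total else 0.0


def select_datatypes(networks, cases, threshold=0.7):
    """Return names of data-type networks whose case cohesion exceeds threshold."""
    return sorted(name for name, edges in networks.items()
                  if case_cohesion(edges, cases) > threshold)


if __name__ == "__main__":
    cases = {"C1", "C2", "C3"}
    networks = {
        # Cases tightly connected to each other, one weak link to a control.
        "clinical": {("C1", "C2"): 1.0, ("C2", "C3"): 1.0, ("C1", "C3"): 1.0,
                     ("N1", "N2"): 1.0, ("N2", "N3"): 1.0, ("C1", "N1"): 0.1},
        # Cases mostly linked to controls: no case clique.
        "expression": {("C1", "N1"): 1.0, ("C2", "N2"): 1.0,
                       ("C3", "N3"): 1.0, ("C1", "C2"): 0.2},
        "metabolomics": {("C1", "C2"): 0.9, ("C2", "C3"): 0.8,
                         ("N1", "N2"): 0.9, ("C3", "N3"): 0.1},
    }
    print(select_datatypes(networks, cases))
```

On this toy input the clinical and metabolomics networks pass the threshold and the expression network does not, mirroring the selection described in the caption.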
Figure 4. Predicting ependymoma subtype with netDx
A. ROC curves showing performance over 10 train/test splits (grey) and the average (blue). B. Pathway-level scores for Group A tumours. Nodes show pathway-level features that scored 10/10 in at least 7 of 10 splits; edges connect pathways that share genes. AutoAnnotate was used to cluster pathways. C. Integrated patient similarity network following feature selection. Nodes are patients with either of the two tumour types. Edges show patient similarity for pathways scoring 10/10 in all splits for either class. For visualization, the top 90% of edges were included, and the network was laid out in Cytoscape using the edge-weighted spring-embedded layout.
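Panel B keeps only pathway features that reached the maximum score (10/10) in at least 7 of the 10 train/test splits, which favours features that are selected consistently rather than by chance in a single split. A minimal sketch of that tally, assuming the per-split feature scores are already available as dictionaries (the data layout and the name consistent_features are hypothetical, not netDx's API), could look like this:

```python
from collections import Counter


def consistent_features(split_scores, max_score=10, min_splits=7):
    """Return features that reach max_score in at least min_splits splits.

    split_scores: list (one entry per train/test split) of dicts
    mapping feature name -> integer score obtained in that split.
    """
    hits = Counter()
    for scores in split_scores:
        for feature, score in scores.items():
            if score >= max_score:
                hits[feature] += 1
    return sorted(f for f, n in hits.items() if n >= min_splits)


if __name__ == "__main__":
    # Hypothetical scores for three pathway features over 10 splits.
    split_scores = [
        {"pathA": 10 if i < 8 else 9,   # max score in 8 of 10 splits
         "pathB": 10 if i < 6 else 5,   # max score in 6 of 10 splits
         "pathC": 10}                   # max score in all 10 splits
        for i in range(10)
    ]
    print(consistent_features(split_scores))
```

Raising min_splits to 10 corresponds to the stricter "10/10 in all splits" criterion used to choose the networks shown in panel C.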
Figure 5. Vision for a network-based classification tool for precision medicine
A. Possible near-future user interface for a network-based patient classifier software tool such as netDx. Such a system could be integrated with a research hospital Electronic Medical Record system and in-house genomics database. A clinical researcher could use it to build a predictor by selecting data of interest and predictor options. B. User interface for visualizing predictor results, organized into multiple tabs. Here, the active tab shows a hypothetical integrated patient similarity network. The user has interactively highlighted a single patient (red node) for detailed study, shown in the right panel.
