Unsupervised multiple kernel learning for heterogeneous data integration
- PMID: 29077792
- DOI: 10.1093/bioinformatics/btx682
Abstract
Motivation: Recent advances in high-throughput sequencing have expanded the range of available omics datasets, and the integrated analysis of multiple datasets obtained on the same samples has yielded important insights in a wide range of applications. However, integrating these sources of information remains a challenge for systems biology because the datasets produced are often of heterogeneous types, which calls for generic methods that take their different specificities into account.
Results: We propose a multiple kernel framework that integrates multiple datasets of various types into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets collected during the TARA Oceans expedition were explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA, as well as to provide a new picture of the sample structures when a larger number of datasets is included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA with regard to the original data. Second, the multi-omics breast cancer datasets provided by The Cancer Genome Atlas were analysed using kernel Self-Organizing Maps with both single-omics and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method for improving the representation of the studied biological system.
Availability and implementation: The proposed methods are available in the R package mixKernel, released on CRAN. It is fully compatible with the mixOmics package, and a tutorial describing the approach can be found on the mixOmics website: http://mixomics.org/mixkernel/.
Contact: jerome.mariette@inra.fr or nathalie.villa-vialaneix@inra.fr.
Supplementary information: Supplementary data are available at Bioinformatics online.
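Illustrative sketch: the consensus meta-kernel idea mentioned in the Results can be illustrated with a small, dependency-free Python example. This is not the mixKernel implementation (which is an R package) and makes no claim to reproduce the authors' exact optimisation. It assumes a STATIS-like scheme: one linear kernel per dataset, kernels scaled to unit Frobenius norm, pairwise cosine similarities between kernels, and weights taken from the leading eigenvector of that similarity matrix. All function names and the toy data are hypothetical.

```python
import math

def linear_kernel(X):
    """Gram matrix K[i][j] = <x_i, x_j> over the rows (samples) of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def frobenius_inner(A, B):
    """Frobenius inner product of two matrices of the same shape."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def scale_to_unit_norm(K):
    """Divide K by its Frobenius norm so that kernels are comparable."""
    norm = math.sqrt(frobenius_inner(K, K))
    return [[k / norm for k in row] for row in K]

def leading_eigenvector(C, n_iter=500):
    """Power iteration; C has non-negative entries, so the result is too."""
    v = [1.0] * len(C)
    for _ in range(n_iter):
        w = [sum(c * x for c, x in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def consensus_meta_kernel(kernels):
    """Return (weights, meta_kernel) for a list of n x n kernel matrices.

    Assumption: weights are proportional to the leading eigenvector of the
    kernel-to-kernel cosine similarity matrix and are rescaled to sum to 1.
    """
    scaled = [scale_to_unit_norm(K) for K in kernels]
    C = [[frobenius_inner(A, B) for B in scaled] for A in scaled]
    v = leading_eigenvector(C)
    total = sum(v)
    weights = [x / total for x in v]
    n = len(scaled[0])
    meta = [[sum(w * K[i][j] for w, K in zip(weights, scaled))
             for j in range(n)] for i in range(n)]
    return weights, meta

# Toy data: three "omics" tables measured on the same 4 samples.
# X1 and X2 group samples {0, 1} vs {2, 3}; X3 groups {0, 2} vs {1, 3}.
X1 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
X2 = [[2.0, 0.1], [1.8, 0.3], [0.1, 2.0], [0.2, 1.9]]
X3 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.0, 0.9]]

weights, meta = consensus_meta_kernel([linear_kernel(X) for X in (X1, X2, X3)])
print("kernel weights:", [round(w, 3) for w in weights])
```

In this toy setting the two datasets that agree on the sample structure receive larger weights than the dissenting one, which is the intuition behind a consensus meta-kernel. The resulting matrix could then be passed to any kernel method, such as kernel PCA or a kernel Self-Organizing Map.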
Cited by
- MEMMAL: A tool for expanding large-scale mechanistic models with machine learned associations and big datasets. Front Syst Biol. 2023;3:1099413. doi: 10.3389/fsysb.2023.1099413. Epub 2023 Mar 9. PMID: 38269333. Free PMC article.
- A toolbox of machine learning software to support microbiome analysis. Front Microbiol. 2023 Nov 22;14:1250806. doi: 10.3389/fmicb.2023.1250806. eCollection 2023. PMID: 38075858. Free PMC article. Review.
- Imaging and multi-omics datasets converge to define different neural progenitor origins for ATRT-SHH subgroups. Nat Commun. 2023 Oct 20;14(1):6669. doi: 10.1038/s41467-023-42371-7. PMID: 37863903. Free PMC article.
- Asterics: a simple tool for the ExploRation and Integration of omiCS data. BMC Bioinformatics. 2023 Oct 18;24(1):391. doi: 10.1186/s12859-023-05504-9. PMID: 37853347. Free PMC article.
- Improvement of variables interpretability in kernel PCA. BMC Bioinformatics. 2023 Jul 12;24(1):282. doi: 10.1186/s12859-023-05404-y. PMID: 37438763. Free PMC article.
