Multi-omics data integration considerations and study design for biological systems and disease

Stefan Graw; Kevin Chappell; Charity L Washam; Allen Gies; Jordan Bird; Michael S Robeson 2nd; Stephanie D Byrum

doi:10.1039/d0mo00041h

Mol Omics. 2021 Apr 19;17(2):170-185. doi: 10.1039/d0mo00041h.

Authors

Stefan Graw¹, Kevin Chappell¹, Charity L Washam², Allen Gies¹, Jordan Bird¹, Michael S Robeson 2nd³, Stephanie D Byrum²

Affiliations

¹ Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, 4301 West Markham Street (slot 516), Little Rock, AR 72205-7199, USA. sbyrum@uams.edu.
² Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, 4301 West Markham Street (slot 516), Little Rock, AR 72205-7199, USA; Arkansas Children's Research Institute, 13 Children's Way, Little Rock, AR 72202, USA. sbyrum@uams.edu.
³ Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA. mrobeson@uams.edu.

Abstract

With the advancement of next-generation sequencing and mass spectrometry, there is a growing need to merge biological features in order to study a system as a whole. Features such as the transcriptome, methylome, proteome, histone post-translational modifications and the microbiome all influence the host response to various diseases and cancers. Each of these platforms has technological limitations due to sample preparation steps, the amount of material needed for sequencing, and sequencing depth requirements. Each feature provides a snapshot of only one level of regulation in a system. The obvious next step is to integrate this information and learn how genes, proteins, and/or epigenetic factors influence the phenotype of a disease in the context of the system. In recent years, there has been a push for the development of data integration methods. Each method integrates a specific subset of omics data using approaches such as conceptual integration, statistical integration, model-based integration, networks, and pathway data integration. In this review, we discuss study design considerations for each data feature, the limitations in gene and protein abundance and their rate of expression, the current data integration methods, and microbiome influences on gene and protein expression. The considerations discussed in this review should be taken into account when developing new algorithms for integrating multi-omics data.
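
As a rough illustration of what "statistical integration" can mean in its simplest form, the sketch below correlates transcript and protein abundance gene by gene across matched samples. It is not a method taken from the review: the function names (`pearson`, `integrate_by_correlation`), the gene labels, and the toy abundance values are all hypothetical, and a real analysis would also need the normalization, missing-value handling, and multiple-testing control that the abstract's study design considerations imply.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant profile has no defined correlation.
        return float("nan")
    return sxy / sqrt(sxx * syy)


def integrate_by_correlation(rna, protein):
    """Correlate two omics layers gene by gene.

    Only genes measured in both layers are kept, which mirrors the
    practical limit that proteomics usually covers fewer genes than
    transcriptomics.
    """
    shared = sorted(set(rna) & set(protein))
    return {gene: pearson(rna[gene], protein[gene]) for gene in shared}


# Hypothetical toy data: gene -> abundance across the same four samples.
rna = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 1.5, 1.2, 1.1],  # no matching protein measurement
}
protein = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
}

result = integrate_by_correlation(rna, protein)
for gene, r in result.items():
    print(f"{gene}: r = {r:.2f}")
```

In this toy case GENE_A shows concordant transcript and protein levels, GENE_B shows discordant levels, and GENE_C drops out because it lacks a protein measurement. Discordance and missing coverage are the kinds of gaps that more elaborate model-based or network-based integration methods attempt to address.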

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Review

MeSH terms

Algorithms
Epigenomics
Genomics*
High-Throughput Nucleotide Sequencing
Humans
Proteome / genetics*
Proteomics*
Transcriptome / genetics*

Substances

Proteome

Grants and funding

P20 GM121293/GM/NIGMS NIH HHS/United States