Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented into clinics or public health domains. Clinical/epidemiological data tend to explain most of the variation of health-related traits, and its joint modeling with omics data is crucial to increase the algorithm's predictive ability. Only a small number of published studies performed a "real" integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration regard the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies to integrate OnO data. In this contribution we discuss different attempts of OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensionality fashion. The integrative strategies used in the identified papers adopted three modeling methods: Independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies towards OnO data integration.
Keywords: RNA expression; challenges; clinical data; data integration; epidemiological data; genomics; integrative analytics; joint modeling; non-omics data; omics data.