A latent unknown clustering integrating multi-omics data (LUCID) with phenotypic traits

Bioinformatics. 2020 Feb 1;36(3):842-850. doi: 10.1093/bioinformatics/btz667.


Motivation: Epidemiologic, clinical and translational studies are increasingly generating multiplatform omics data. Methods that can integrate across multiple high-dimensional data types while accounting for differential patterns are critical for uncovering novel associations and underlying relevant subgroups.

Results: We propose an integrative model to estimate latent unknown clusters (LUCID) that aims both to distinguish unique genomic, exposure and informative biomarker/omic effects and to jointly estimate subgroups relevant to the outcome of interest. Simulation studies indicate that we can obtain consistent estimates reflective of the true simulated values, accurately estimate subgroups and recapitulate subgroup-specific effects. We also demonstrate the use of the integrated model for future prediction of risk subgroups and phenotypes. We apply this approach in two real-data applications to highlight the integration of genomic, exposure and metabolomic data.

Availability and implementation: The LUCID method is implemented through the LUCIDus R package available on CRAN (https://CRAN.R-project.org/package=LUCIDus).

Supplementary information: Supplementary materials are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biomarkers
  • Cluster Analysis
  • Genomics*
  • Phenotype
  • Software*

