Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data

Nucleic Acids Res. 2021 Sep 27;49(17):e98. doi: 10.1093/nar/gkab552.

Abstract

As high-throughput genomics assays become more efficient and cost-effective, their use has become standard in large-scale biomedical projects. These studies are often exploratory: relationships between samples are not explicitly defined a priori but emerge from data-driven discovery and annotation of molecular subtypes, which in turn informs hypothesis generation and independent evaluation. Here, we present K2Taxonomer, a novel unsupervised recursive partitioning algorithm and associated R package that use ensemble learning to identify robust subgroups in a 'taxonomy-like' structure. K2Taxonomer was designed to accommodate different data paradigms and is suitable for the analysis of bulk and single-cell transcriptomics data, as well as other '-omics' data. For each of these data types, we demonstrate the ability of K2Taxonomer to discover known relationships in both simulated and human tissue data. We conclude with a practical application to breast cancer tumor-infiltrating lymphocyte (TIL) single-cell profiles, in which we identified co-expression of translational machinery genes as a dominant transcriptional program shared by T cell subtypes and associated with better prognosis in bulk expression data from breast cancer tissue.
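The abstract describes the method only at a high level: unsupervised recursive partitioning combined with ensemble learning to find robust subgroups arranged as a taxonomy. The Python sketch below illustrates that general idea under explicit assumptions. It is not the K2Taxonomer R package, its API, or the authors' exact procedure. It assumes two-way splits at every node, feature bagging (random feature subsets) as the ensemble, 2-means as the base clusterer, a co-assignment consensus matrix to combine bags, and hypothetical stopping rules (`min_size`, `min_stability`). Every function name and default value here is illustrative.

```python
# Hypothetical sketch of ensemble-based recursive bisection; NOT the
# K2Taxonomer R package. All names and defaults here are assumptions.
import random
from statistics import fmean


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=25):
    """k-means with k=2 and a deterministic farthest-point start; returns 0/1 labels."""
    c0 = list(points[0])
    c1 = list(max(points, key=lambda p: sqdist(p, c0)))
    centers = [c0, c1]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if sqdist(p, centers[0]) <= sqdist(p, centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous center if a cluster empties
                centers[k] = [fmean(col) for col in zip(*members)]
    return labels


def consensus_split(data, idx, rng, n_bags=25, feat_frac=0.5):
    """Bisect samples `idx` by feature bagging plus a co-assignment consensus.

    Returns (left, right, stability)."""
    d = len(data[0])
    m = max(1, round(feat_frac * d))
    n = len(idx)
    co = [[0.0] * n for _ in range(n)]  # fraction of bags pairing samples a, b
    for _ in range(n_bags):
        feats = rng.sample(range(d), m)  # random feature subset for this bag
        labels = two_means([[data[i][f] for f in feats] for i in idx])
        for a in range(n):
            for b in range(n):
                if labels[a] == labels[b]:
                    co[a][b] += 1.0 / n_bags
    final = two_means(co)  # cluster samples by their consensus profiles
    left = [idx[a] for a in range(n) if final[a] == 0]
    right = [idx[a] for a in range(n) if final[a] == 1]
    # Stability: mean consensus with own group minus mean with the other group.
    scores = []
    for a in range(n):
        own = [co[a][b] for b in range(n) if final[b] == final[a]]
        other = [co[a][b] for b in range(n) if final[b] != final[a]]
        scores.append(fmean(own) - (fmean(other) if other else 1.0))
    return left, right, fmean(scores)


def partition(data, idx=None, min_size=3, min_stability=0.5, rng=None):
    """Recursively bisect samples; leaves are index lists, nodes are 2-tuples."""
    rng = random.Random(0) if rng is None else rng
    idx = list(range(len(data))) if idx is None else idx
    if len(idx) < 2 * min_size:
        return idx
    left, right, stability = consensus_split(data, idx, rng)
    if min(len(left), len(right)) < min_size or stability < min_stability:
        return idx  # split too small or not robust across bags: stop here
    return (partition(data, left, min_size, min_stability, rng),
            partition(data, right, min_size, min_stability, rng))


def leaves(tree):
    """Flatten a tree into its leaf groups, left to right."""
    return [tree] if isinstance(tree, list) else leaves(tree[0]) + leaves(tree[1])


# Toy data: four groups (A-D) nested in two super-groups, 5 samples each.
noise = random.Random(42)
centers = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 3, 3, 3],
           [10, 10, 10, 0, 0, 0], [10, 10, 10, 3, 3, 3]]
data = [[x + noise.gauss(0, 0.1) for x in c] for c in centers for _ in range(5)]
tree = partition(data)
print(leaves(tree))
```

On this simulated data, the first split should separate the two super-groups (samples 0-9 versus 10-19), and each half should then split into its two five-sample groups, giving a two-level taxonomy. Scoring each split by its agreement across bags, rather than trusting a single clustering run, is what makes the subgroups "robust" in this sketch; swapping 2-means for hierarchical clustering, or using a different stability rule, would be equally plausible variants.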

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Breast Neoplasms / genetics
  • Breast Neoplasms / pathology
  • Cluster Analysis*
  • Computational Biology / methods*
  • Female
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic
  • Genomics / methods
  • Humans
  • Lymphocytes, Tumor-Infiltrating / classification
  • Lymphocytes, Tumor-Infiltrating / metabolism
  • Prognosis
  • Reproducibility of Results
  • Single-Cell Analysis / methods*
  • Survival Analysis
  • T-Lymphocyte Subsets / classification
  • T-Lymphocyte Subsets / metabolism