Biomedical Data Commons (BMDC) prioritizes B-lymphocyte non-coding genetic variants in Type 1 Diabetes

PLoS Comput Biol. 2021 Sep 20;17(9):e1009382. doi: 10.1371/journal.pcbi.1009382. eCollection 2021 Sep.

Abstract

The repurposing of biomedical data is inhibited by its fragmented and multi-formatted nature, which requires redundant investment of time and resources by data scientists. This is particularly true for Type 1 Diabetes (T1D), one of the most intensely studied common childhood diseases. The contribution of pancreatic β-islets and T-lymphocytes to T1D has been investigated intensely. However, genetic contributions from B-lymphocytes, which are known to play a role in a subset of T1D patients, remain relatively understudied. We have addressed this issue through the creation of Biomedical Data Commons (BMDC), a knowledge graph that integrates data from multiple sources into a single queryable format. This increases the speed of analysis by multiple orders of magnitude. We develop a pipeline using B-lymphocyte multi-dimensional epigenome and connectome data and deploy BMDC to assess genetic variants in the context of T1D. Pipeline-identified variants are primarily common, non-coding, poorly conserved, and of unknown clinical significance. While variants and their chromatin connectivity are cell-type specific, they are associated with well-studied disease genes in T-lymphocytes. Candidates include established variants in the HLA-DQB1, HLA-DRB1, and IL2RA loci that have previously been demonstrated to protect against T1D in humans and mice, providing validation for this method. Others are included in the well-established T1D GRS2 genetic risk scoring method. More intriguingly, other prioritized variants are completely novel and form the basis for future mechanistic and clinical validation studies. The BMDC community-based platform can be expanded and repurposed to increase the accessibility, reproducibility, and productivity of biomedical information for diverse applications, including the prioritization of cell type-specific disease alleles from complex phenotypes.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • B-Lymphocytes / immunology*
  • Child
  • Computational Biology
  • Databases, Genetic / statistics & numerical data
  • Diabetes Mellitus, Type 1 / genetics*
  • Diabetes Mellitus, Type 1 / immunology*
  • Gene Regulatory Networks
  • Genetic Predisposition to Disease
  • Genetic Variation
  • Genome-Wide Association Study / statistics & numerical data
  • HLA-DQ beta-Chains / genetics
  • HLA-DRB1 Chains / genetics
  • Humans
  • Ikaros Transcription Factor / genetics
  • Interleukin-2 Receptor alpha Subunit / genetics
  • Mice
  • Polymorphism, Single Nucleotide
  • RNA, Untranslated / genetics

Substances

  • HLA-DQ beta-Chains
  • HLA-DQB1 antigen
  • HLA-DRB1 Chains
  • IKZF1 protein, human
  • IL2RA protein, human
  • Interleukin-2 Receptor alpha Subunit
  • RNA, Untranslated
  • Ikaros Transcription Factor