Integrating biomedical research and electronic health records to create knowledge-based biologically meaningful machine-readable embeddings

Nat Commun. 2019 Jul 10;10(1):3045. doi: 10.1038/s41467-019-11069-0.


In order to advance precision medicine, detailed clinical features ought to be described in a way that leverages current knowledge. Although data collected from biomedical research is expanding at an almost exponential rate, our ability to transform that information into patient care has not kept pace. A major barrier preventing this transformation is that multi-dimensional data collection and analysis are usually carried out without much understanding of the underlying knowledge structure. Here, in an effort to bridge this gap, Electronic Health Records (EHRs) of individual patients are connected to a heterogeneous knowledge network called Scalable Precision Medicine Oriented Knowledge Engine (SPOKE). An unsupervised machine-learning algorithm then creates Propagated SPOKE Entry Vectors (PSEVs) that encode the importance of each SPOKE node for any code in the EHRs. We argue that these results, alongside the natural integration of PSEVs into any EHR machine-learning platform, provide a key step toward precision medicine.
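The abstract does not say how the per-code importance scores are computed. As a minimal illustration of the general idea (propagating signal from the knowledge-graph nodes that an EHR code's patients connect to), the sketch below assumes a random walk with restart, i.e. personalized PageRank, in which every restart lands on an entry node of a patient whose record contains the code. This is an assumed stand-in, not the authors' published implementation; the function `psev`, the toy graph, the node and code names, and the parameters `alpha`, `tol`, and `max_iter` are all hypothetical.

```python
from collections import Counter


def psev(graph, patient_codes, patient_entries, code,
         alpha=0.85, tol=1e-10, max_iter=1000):
    """Hypothetical sketch of a PSEV-like vector for one EHR code.

    Assumes a random walk with restart (personalized PageRank) over `graph`,
    where restarts land on the graph entry nodes of patients whose records
    contain `code`. Returns {node: score}; the scores sum to 1.
    """
    nodes = sorted(graph)
    # Count entry nodes across all patients who carry this EHR code.
    counts = Counter()
    for patient, codes in patient_codes.items():
        if code in codes:
            counts.update(n for n in patient_entries.get(patient, ()) if n in graph)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no entry nodes found for code {code!r}")
    restart = {n: counts[n] / total for n in nodes}

    rank = dict(restart)
    for _ in range(max_iter):
        # Restart step: with probability 1 - alpha, jump back to an entry node.
        new = {n: (1 - alpha) * restart[n] for n in nodes}
        dangling = 0.0
        for n in nodes:
            neighbors = graph[n]
            if neighbors:
                # Walk step: spread this node's mass evenly over its neighbors.
                share = alpha * rank[n] / len(neighbors)
                for m in neighbors:
                    new[m] += share
            else:
                dangling += alpha * rank[n]
        # Nodes without edges hand their mass back via the restart distribution.
        for n in nodes:
            new[n] += dangling * restart[n]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy knowledge graph (undirected, stored as adjacency lists).
graph = {
    "Disease:D1": ["Gene:G1", "Gene:G2"],
    "Gene:G1": ["Disease:D1", "Compound:C1"],
    "Gene:G2": ["Disease:D1"],
    "Compound:C1": ["Gene:G1"],
    "Disease:D2": ["Gene:G3"],
    "Gene:G3": ["Disease:D2"],
}
# Hypothetical patients: their EHR codes and the graph nodes they map to.
patient_codes = {"p1": {"ICD:X"}, "p2": {"ICD:X", "ICD:Y"}, "p3": {"ICD:Y"}}
patient_entries = {"p1": {"Disease:D1"}, "p2": {"Disease:D1"}, "p3": {"Disease:D2"}}

vec = psev(graph, patient_codes, patient_entries, "ICD:X")
for node, score in sorted(vec.items(), key=lambda kv: -kv[1]):
    print(f"{node:12s} {score:.3f}")
```

In this toy setup every restart for "ICD:X" lands on Disease:D1, so nodes in the disconnected Disease:D2/Gene:G3 component score exactly zero, and Gene:G1 outranks Gene:G2 because it also receives mass back from Compound:C1. Stacking one such vector per EHR code would give a code-by-node matrix that downstream EHR models could consume as features.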

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomedical Research / statistics & numerical data
  • Data Analysis*
  • Data Collection / methods*
  • Electronic Health Records / statistics & numerical data
  • Precision Medicine / methods
  • Unsupervised Machine Learning*