Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface

J Am Med Inform Assoc. 2014 Mar-Apr;21(2):292-8. doi: 10.1136/amiajnl-2013-001847. Epub 2013 Nov 22.


Objective: UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, are a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. To exploit the potential of these large and complex databases more fully, our interdisciplinary team developed generic methods that make the data accessible to different types of user.

Materials and methods: Using the Clinical Practice Research Datalink database, we developed an online user-focused system (TrialViz) that lets users interactively select suitable general practices based on two criteria: suitability of the patient base for the intended study (phenotyping) and measures of data quality.
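
As an illustration only, the two selection criteria can be sketched as a simple filter over per-practice summaries. This is not the TrialViz implementation or its search algorithm, which the abstract does not describe; the `Practice` fields, the 0 to 1 `quality_score`, the thresholds and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Practice:
    # All fields are hypothetical; the abstract does not specify a schema.
    practice_id: str
    matching_patients: int  # patients meeting the phenotype definition
    quality_score: float    # assumed data-quality measure in the range 0..1


def select_practices(practices, min_matching, min_quality):
    """Keep practices that meet both thresholds, largest patient base first."""
    eligible = [
        p for p in practices
        if p.matching_patients >= min_matching and p.quality_score >= min_quality
    ]
    return sorted(eligible, key=lambda p: p.matching_patients, reverse=True)


# Made-up example data, for illustration only.
practices = [
    Practice("A", 120, 0.92),
    Practice("B", 200, 0.55),
    Practice("C", 310, 0.88),
    Practice("D", 15, 0.97),
]

selected = select_practices(practices, min_matching=100, min_quality=0.8)
selected_ids = [p.practice_id for p in selected]
print(selected_ids)
```

In this sketch a practice must pass both criteria independently; an interactive tool could equally let the user adjust the two thresholds and watch the selection change.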

Results: An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real-time via an intuitive query interface and to explore this information using interactive visualization tools. A usability evaluation of this system produced positive results.

Discussion: We present the challenges and results in the development of TrialViz and our plans to extend it to wider applications in clinical research.

Conclusions: Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases.

Keywords: Data quality; Data visualisation; Electronic health records; Primary care.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Databases, Factual*
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods*
  • Online Systems
  • Patient Selection
  • Quality Control
  • Randomized Controlled Trials as Topic
  • United Kingdom
  • User-Computer Interface*