ukbtools: An R package to manage and query UK Biobank data

PLoS One. 2019 May 31;14(5):e0214311. doi: 10.1371/journal.pone.0214311. eCollection 2019.

Abstract

Introduction: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names.

Results: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants, to query disease diagnoses for one or more individuals, to estimate disease frequency relative to a reference variable, and to retrieve genetic metadata.
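
The merge-and-rename step described above can be illustrated with a minimal sketch. This is not the ukbtools API (ukbtools is an R package); it is a standard-library Python illustration of the general idea. The toy records, the field descriptions, the helper names (descriptive_name, rename_columns, full_join_on_eid), and the coded and descriptive column-name shapes ("f.31.0.0" and "sex_f31_0_0") are all assumptions chosen for the example.

```python
# Illustrative sketch only: toy data and hypothetical helpers, not the ukbtools API.

def descriptive_name(code, field_descriptions):
    """Turn a coded column such as 'f.31.0.0' into a name such as 'sex_f31_0_0'."""
    if code == "f.eid":
        return "eid"  # participant identifier, used as the join key
    _, field, instance, array = code.split(".")
    label = field_descriptions[field].lower().replace(" ", "_")
    return f"{label}_f{field}_{instance}_{array}"


def rename_columns(rows, field_descriptions):
    """Apply descriptive_name to every column of every record."""
    return [
        {descriptive_name(col, field_descriptions): value for col, value in row.items()}
        for row in rows
    ]


def full_join_on_eid(*tables):
    """Merge tables on 'eid', keeping every participant; absent values become None."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row["eid"], dict.fromkeys(columns)).update(row)
    return [merged[eid] for eid in sorted(merged)]


# Hypothetical field lookup and two small "data files" with coded column names.
field_descriptions = {"31": "Sex", "21001": "Body mass index"}
demographics = [{"f.eid": 1, "f.31.0.0": 0}, {"f.eid": 2, "f.31.0.0": 1}]
measurements = [{"f.eid": 2, "f.21001.0.0": 27.4}, {"f.eid": 3, "f.21001.0.0": 22.1}]

dataset = full_join_on_eid(
    rename_columns(demographics, field_descriptions),
    rename_columns(measurements, field_descriptions),
)

for record in dataset:
    print(record)
```

A full join is used in this sketch so that a participant who appears in only one file is kept, with the columns from the other file left empty.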

Conclusion: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file will enable UKB users to undertake their research rapidly.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Analysis
  • Database Management Systems*
  • Datasets as Topic*
  • Disease
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval*
  • Metadata
  • United Kingdom