Evaluation of relational and NoSQL database architectures to manage genomic annotations

J Biomed Inform. 2016 Dec;64:288-295. doi: 10.1016/j.jbi.2016.10.015. Epub 2016 Oct 31.

Abstract

While the adoption of next-generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. However, their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, the NoSQL architectures outperformed the traditional relational model in speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences.
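
The difference between the two storage models compared above can be illustrated with a minimal sketch. It is not the authors' benchmark or schema: it uses Python's built-in sqlite3 module as a stand-in for a relational engine, stores the document form as JSON text in a single column (loosely analogous to a JSON column or a document store), and the record fields, table names, and the identifier rs0000001 are all hypothetical. It shows only that a normalized layout needs several tables and lookups to reassemble one annotation, while a document layout returns it in a single read; it says nothing about relative performance.

```python
import json
import sqlite3

# Hypothetical annotation record; fields are illustrative, not dbSNP's real schema.
RECORD = {
    "rsid": "rs0000001",
    "chrom": "1",
    "pos": 12345,
    "alleles": ["A", "G"],
    "genes": ["GENE1", "GENE2"],
}


def build_relational(conn, record):
    """Store the record in normalized tables: one row per allele and per gene."""
    conn.executescript(
        """
        CREATE TABLE snp (rsid TEXT PRIMARY KEY, chrom TEXT, pos INTEGER);
        CREATE TABLE allele (rsid TEXT, allele TEXT);
        CREATE TABLE gene (rsid TEXT, gene TEXT);
        CREATE INDEX idx_allele_rsid ON allele (rsid);
        CREATE INDEX idx_gene_rsid ON gene (rsid);
        """
    )
    rsid = record["rsid"]
    conn.execute(
        "INSERT INTO snp VALUES (?, ?, ?)", (rsid, record["chrom"], record["pos"])
    )
    conn.executemany(
        "INSERT INTO allele VALUES (?, ?)", [(rsid, a) for a in record["alleles"]]
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)", [(rsid, g) for g in record["genes"]]
    )


def fetch_relational(conn, rsid):
    """Reassemble one annotation from three tables."""
    chrom, pos = conn.execute(
        "SELECT chrom, pos FROM snp WHERE rsid = ?", (rsid,)
    ).fetchone()
    alleles = [
        row[0]
        for row in conn.execute(
            "SELECT allele FROM allele WHERE rsid = ? ORDER BY rowid", (rsid,)
        )
    ]
    genes = [
        row[0]
        for row in conn.execute(
            "SELECT gene FROM gene WHERE rsid = ? ORDER BY rowid", (rsid,)
        )
    ]
    return {"rsid": rsid, "chrom": chrom, "pos": pos, "alleles": alleles, "genes": genes}


def build_document(conn, record):
    """Store the whole record as one JSON document keyed by its identifier."""
    conn.execute("CREATE TABLE snp_doc (rsid TEXT PRIMARY KEY, doc TEXT)")
    conn.execute(
        "INSERT INTO snp_doc VALUES (?, ?)", (record["rsid"], json.dumps(record))
    )


def fetch_document(conn, rsid):
    """Return the annotation with a single keyed lookup."""
    (doc,) = conn.execute(
        "SELECT doc FROM snp_doc WHERE rsid = ?", (rsid,)
    ).fetchone()
    return json.loads(doc)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_relational(conn, RECORD)
    build_document(conn, RECORD)
    # Both layouts hold the same information; only the access pattern differs.
    print(fetch_relational(conn, "rs0000001") == fetch_document(conn, "rs0000001"))
```

Timing the insert, index-creation, and fetch steps on real data and real engines would be needed to say anything about efficiency; this sketch only makes the structural contrast concrete.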

Keywords: Genomics; MongoDB; MySQL; NoSQL; PostgreSQL; Relational database.

MeSH terms

  • Database Management Systems*
  • Databases, Genetic
  • Genomics*
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Information Storage and Retrieval*