Principles of metadata organization at the ENCODE data coordination center

Database (Oxford). 2016 Mar 15;2016:baw001. doi: 10.1093/database/baw001. Print 2016.

Abstract

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/).

Database URL: www.encodeproject.org

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Animals
  • Caenorhabditis elegans
  • Computational Biology / methods*
  • Computational Biology / standards
  • DNA / genetics*
  • Data Collection
  • Databases, Genetic*
  • Drosophila melanogaster
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Mice
  • Nucleic Acids / genetics
  • Quality Control
  • Reproducibility of Results
  • Sequence Alignment

Substances

  • Nucleic Acids
  • DNA