A Bottom-up Approach to Data Annotation in Neurophysiology

Front Neuroinform. 2011 Aug 30;5:16. doi: 10.3389/fninf.2011.00016. eCollection 2011.


Metadata providing information about the stimulus, data acquisition, and experimental conditions are indispensable for the analysis and management of experimental data within a lab. However, metadata are only rarely available in a structured, comprehensive, and machine-readable form. This poses a severe problem for finding and retrieving data, both in the laboratory and in the various emerging public databases. Here, we propose a simple format, the "open metaData Markup Language" (odML), for collecting and exchanging metadata in an automated, computer-based fashion. In odML, arbitrary metadata are stored as extended key-value pairs in a hierarchical structure. Central to odML is a clear separation of format and content, i.e., neither keys nor values are defined by the format. This makes odML flexible enough to store all available metadata immediately, without the need to first submit new keys to an ontology or controlled terminology. Common standard keys can be defined in odML terminologies to guarantee interoperability. We have started to define such terminologies for neurophysiological data, but aim at a community-driven extension and refinement of the proposed definitions. Customized terminologies that map to these standard terminologies allow metadata to be named and organized as required or preferred without weakening the standard. Together with the respective libraries provided for common programming languages, the odML format can be integrated into the laboratory workflow, facilitating automated collection of metadata wherever they become available. The flexibility of odML also encourages a community-driven collection and definition of terms used for annotating data in the neurosciences.

Keywords: datamodel; datasharing; metadata; neuroscience; ontology.
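
The data model summarized in the abstract (extended key-value pairs in a hierarchy, with custom names mapped onto standard terms) can be illustrated with a minimal Python sketch. This is not the odML library or its API: the classes `Property` and `Section`, the helper `to_standard`, the field names, and all example keys and values below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Property:
    """An 'extended' key-value pair: a value plus optional descriptive fields."""
    name: str
    value: Any
    unit: Optional[str] = None
    definition: Optional[str] = None


@dataclass
class Section:
    """A node in the hierarchy; it groups properties and nested sections."""
    name: str
    properties: List[Property] = field(default_factory=list)
    subsections: List["Section"] = field(default_factory=list)

    def find(self, path: str) -> Optional[Property]:
        """Look up a property by a slash-separated path, e.g. 'Stimulus/Contrast'."""
        *section_names, prop_name = path.split("/")
        node = self
        for name in section_names:
            node = next((s for s in node.subsections if s.name == name), None)
            if node is None:
                return None
        return next((p for p in node.properties if p.name == prop_name), None)


def to_standard(section: Section, mapping: Dict[str, str]) -> Section:
    """Return a copy in which custom property names are replaced by standard ones.

    Names without an entry in `mapping` are kept unchanged, so nothing is lost.
    """
    return Section(
        name=section.name,
        properties=[
            Property(mapping.get(p.name, p.name), p.value, p.unit, p.definition)
            for p in section.properties
        ],
        subsections=[to_standard(s, mapping) for s in section.subsections],
    )


# Hypothetical example: a lab stores metadata under its own preferred key names.
recording = Section(
    name="Recording",
    properties=[Property("Experimenter", "J. Doe")],
    subsections=[
        Section(
            name="Stimulus",
            properties=[
                Property("Kontrast", 0.2, definition="Michelson contrast"),
                Property("Dauer", 500, unit="ms"),
            ],
        )
    ],
)

# Hypothetical custom-to-standard mapping, playing the role of a customized terminology.
lab_mapping = {"Kontrast": "Contrast", "Dauer": "Duration"}
standardized = to_standard(recording, lab_mapping)
```

In this sketch the format (sections and properties) carries no fixed vocabulary; the vocabulary lives entirely in the mapping, which is one way to read the separation of format and content described above.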