Dealing with the Data Deluge: Handling the Multitude Of Chemical Biology Data Sources

Curr Protoc Chem Biol. 2012 Sep:4:193-209. doi: 10.1002/9780470559277.ch110262. Epub 2012 Sep 1.

Abstract

Over the last 20 years, there has been an explosion in the amount and type of biological and chemical data that has been made publicly available in a variety of online databases. While this means that vast amounts of information can be found online, there is no guarantee that it can be found easily (or at all). A scientist searching for a specific piece of information is faced with a daunting task - many databases have overlapping content, use their own identifiers and, in some cases, have arcane and unintuitive user interfaces. In this overview, a variety of well known data sources for chemical and biological information are highlighted, focusing on those most useful for chemical biology research. The issue of using multiple data sources together and the associated problems such as identifier disambiguation are highlighted. A brief discussion is then provided on Tripod, a recently developed platform that supports the integration of arbitrary data sources, providing users a simple interface to search across a federated collection of resources.