A novel technique to annotate, query, and analyze chemical compounds has been developed and is illustrated by using the inhibitor data on HIV protease-inhibitor complexes. In this method, all chemical compounds are annotated in terms of standard chemical structural fragments. These standard fragments are defined by using criteria, such as chemical classification; structural, chemical, or functional groups; and commercial, scientific or common names or synonyms. These fragments are then organized into a data tree based on their chemical substructures. Search engines have been developed to use this data tree to enable query on inhibitors of HIV protease (http://xpdb.nist.gov/hivsdb/hivsdb.html). These search engines use a new novel technique, Chemical Block Layered Alignment of Substructure Technique (Chem-BLAST) to search on the fragments of an inhibitor to look for its chemical structural neighbors. This novel technique to annotate and query compounds lays the foundation for the use of the Semantic Web concept on chemical compounds to allow end users to group, sort, and search structural neighbors accurately and efficiently. During annotation, it enables the attachment of "meaning" (i.e., semantics) to data in a manner that far exceeds the current practice of associating "metadata" with data by creating a knowledge base (or ontology) associated with compounds. Intended users of the technique are the research community and pharmaceutical industry, for which it will provide a new tool to better identify novel chemical structural neighbors to aid drug discovery.
2006 Wiley-Liss, Inc.