Annotation and cross-indexing of array elements on multiple platforms

Environ Health Perspect. 2004 Mar;112(4):506-10. doi: 10.1289/ehp.6698.

Abstract

On the surface, transcript profiling using microarrays seems to offer a way of looking at the global response of the cell to perturbation, with a focus on changes in gene expression. The difficulty, however, is that the response of a particular gene is actually measured on the array by an element that is a short, defined nucleic acid sequence. Sequences that map back to the same genetic locus may be given different names and descriptions when they are deposited in public sequence databases; when such sequences are used in microarray construction, elements that monitor the same genetic locus may therefore carry different names and descriptions. The algorithm described here uses a hierarchical approach to assign a single best annotation to each element in a given microarray, so that elements from one microarray platform may be cross-indexed with those of another. The algorithm relies on the nucleic acid accession number of each array element and uses it to retrieve annotation from the most recent versions of LocusLink and UniGene. Both database resources are searched, with priority given to annotation derived from the curated LocusLink database. When neither database provides annotation, the default GenBank annotation is used. As a final outcome, a cross-chip identifier is generated that may be used to cross-index array elements. The program is available as a Practical Extraction and Report Language (Perl) script that can run under any Perl interpreter.
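
The lookup hierarchy described in the abstract can be illustrated with a minimal sketch. This is not the authors' Perl script: it is written in Python, the lookup tables are invented toy data standing in for local copies of LocusLink, UniGene, and GenBank, and the `LL:`/`UG:`/`GB:` prefixes used for the cross-chip identifier are a hypothetical format chosen only for illustration. The function names `annotate` and `cross_index` are likewise assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    cross_chip_id: str  # hypothetical identifier format, not the paper's
    description: str
    source: str


# Toy lookup tables keyed by accession number. Every value is invented.
LOCUSLINK = {"ACC001": ("1001", "gene alpha"), "ACC002": ("1001", "gene alpha")}
UNIGENE = {"ACC002": ("Hs.0002", "alpha cluster"), "ACC003": ("Hs.0003", "beta cluster")}
GENBANK = {"ACC001": "clone a1", "ACC002": "clone a2",
           "ACC003": "clone b1", "ACC004": "unnamed EST"}


def annotate(accession: str) -> Annotation:
    """Return one annotation per accession, preferring curated sources."""
    # 1. Curated LocusLink annotation takes priority.
    if accession in LOCUSLINK:
        locus_id, description = LOCUSLINK[accession]
        return Annotation(f"LL:{locus_id}", description, "LocusLink")
    # 2. Otherwise use the UniGene cluster, if there is one.
    if accession in UNIGENE:
        cluster_id, description = UNIGENE[accession]
        return Annotation(f"UG:{cluster_id}", description, "UniGene")
    # 3. Otherwise fall back to the default GenBank annotation.
    return Annotation(f"GB:{accession}", GENBANK.get(accession, ""), "GenBank")


def cross_index(platform_a: dict, platform_b: dict) -> dict:
    """Pair element names from two platforms that share a cross-chip ID.

    Each platform maps element name -> accession number.
    """
    by_id_a, by_id_b = defaultdict(list), defaultdict(list)
    for name, accession in platform_a.items():
        by_id_a[annotate(accession).cross_chip_id].append(name)
    for name, accession in platform_b.items():
        by_id_b[annotate(accession).cross_chip_id].append(name)
    shared = by_id_a.keys() & by_id_b.keys()
    return {cid: (sorted(by_id_a[cid]), sorted(by_id_b[cid])) for cid in shared}


if __name__ == "__main__":
    chip_a = {"a_spot_1": "ACC001", "a_spot_2": "ACC003"}
    chip_b = {"b_probe_9": "ACC002", "b_probe_7": "ACC004"}
    for cid, (names_a, names_b) in cross_index(chip_a, chip_b).items():
        print(cid, names_a, names_b)
```

In this toy data, `ACC001` and `ACC002` are different accessions that resolve to the same LocusLink entry, so the two elements that carry them are cross-indexed even though their GenBank descriptions differ. `ACC004` has no curated annotation and keeps an identifier derived from its own accession.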

MeSH terms

  • Algorithms*
  • Animals
  • Databases, Nucleic Acid*
  • Gene Expression Profiling / statistics & numerical data*
  • Humans
  • Information Storage and Retrieval*
  • Nucleic Acids
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*

Substances

  • Nucleic Acids