Motivation: DNA arrays are a useful tool for rapidly identifying the biological agents present in a sample, e.g. to identify disease-causing viruses, for quality control in the food industry, or to detect bacteria contaminating drinking water. Selecting specific oligonucleotides to attach to the array surface is an important problem in the experimental design process. Given a set S of genomic sequences (the target sequences), the task is to find at least one oligonucleotide, called a probe, for each sequence in S. This probe will be attached to the array surface and must be chosen so that it hybridizes to no sequence other than its intended target. Furthermore, all probes on the array must hybridize to their intended targets under the same reaction conditions, most importantly at the temperature T at which the experiment is conducted.
Results: We present an efficient algorithm for the probe design problem. Melting temperatures are calculated for all possible probe-target interactions using an extended nearest-neighbor model that allows for both non-Watson-Crick base pairing and unpaired bases within a duplex. To compute these temperatures efficiently, we introduce a combination of suffix trees and dynamic-programming-based alignment algorithms. Additional filtering steps during preprocessing further speed up the computation. The practicability of the algorithms is demonstrated by two case studies: the identification of HIV-1 subtypes, and of 28S rDNA sequences from ≥400 organisms.
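
As a rough illustration of the specificity requirement only, the following minimal sketch lists, for each target, the substrings of a fixed length k that occur in no other target. It is not the algorithm summarized above: it uses a plain dictionary instead of suffix trees, considers exact matches only, and ignores melting temperatures, mismatches, and reverse complements. The function name `unique_kmers`, the parameter `k`, and the toy sequences are assumptions made for this example.

```python
from collections import defaultdict


def unique_kmers(targets, k):
    """Return, per target name, the sorted k-mers found in no other target.

    Simplifying assumption: a k-mer counts as specific if it has no exact
    occurrence in any other target sequence (no thermodynamics involved).
    """
    owners = defaultdict(set)  # k-mer -> names of the targets containing it
    for name, seq in targets.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    candidates = {name: [] for name in targets}
    for kmer, names in owners.items():
        if len(names) == 1:  # k-mer occurs in exactly one target
            candidates[next(iter(names))].append(kmer)
    for kmers in candidates.values():
        kmers.sort()
    return candidates


# Toy input (hypothetical sequences, far shorter than real targets).
example = unique_kmers({"a": "ACGTAC", "b": "CGTACG"}, k=4)
```

In a realistic setting such exact-match candidates would only be a first filter; each surviving candidate would still have to be checked against all non-target sequences under a thermodynamic model at the experimental temperature T.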