Motivation: Reliably identifying the presence or absence of biological agents ("targets"), such as viruses or bacteria, is crucial for many applications, from health care to biodiversity. If the genomic sequences of the targets are known, hybridization reactions between oligonucleotide probes and targets, performed on suitable DNA microarrays, allow one to infer presence or absence from the observed hybridization pattern. Targets, for example all known strains of HIV, are often so closely related that finding unique probes becomes impossible. Using non-unique oligonucleotides together with more advanced decoding techniques from statistical group testing makes it possible to detect known targets with great success. Of great relevance, however, is the problem of identifying the presence of previously unknown targets or of targets that evolve rapidly.
Results: We present the first approach to decoding hybridization experiments with non-unique probes when the targets are related by a phylogenetic tree. Using a Bayesian framework and a Markov chain Monte Carlo approach, we identify over 94% of known targets and assign up to 70% of unknown targets to their correct clade in hybridization simulations on biological and simulated data.
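To make the general idea of Bayesian decoding with non-unique probes concrete, the following is a minimal sketch, not the paper's method. It omits the phylogenetic tree entirely and only decodes known targets. It assumes a hypothetical noise model in which a probe is expected to light up if at least one present target hybridizes to it, observed signals are flipped with fixed error rates `fp` and `fn`, and each target is independently present a priori with probability `prior_present`. A Metropolis sampler with single-target flip proposals then estimates posterior presence probabilities. All function and parameter names are illustrative.

```python
import math
import random


def log_likelihood(x, incidence, observed, fp=0.05, fn=0.05):
    """Log-likelihood of binary probe signals given presence vector x.

    Hypothetical model: probe i is expected "on" if any present target
    hybridizes to it; observations are flipped with rates fp / fn.
    """
    ll = 0.0
    for row, y in zip(incidence, observed):
        expected_on = any(a and b for a, b in zip(row, x))
        p_on = 1.0 - fn if expected_on else fp
        ll += math.log(p_on if y else 1.0 - p_on)
    return ll


def log_prior(x, prior_present=0.1):
    """Independent Bernoulli prior on the presence of each target."""
    k = sum(x)
    return k * math.log(prior_present) + (len(x) - k) * math.log(1.0 - prior_present)


def mcmc_presence(incidence, observed, n_iter=20000, burn_in=2000,
                  fp=0.05, fn=0.05, prior_present=0.1, seed=0):
    """Estimate marginal posterior presence probabilities via Metropolis sampling."""
    rng = random.Random(seed)
    n_targets = len(incidence[0])
    x = [0] * n_targets  # start with all targets absent

    def log_post(state):
        return (log_likelihood(state, incidence, observed, fp, fn)
                + log_prior(state, prior_present))

    current = log_post(x)
    counts = [0] * n_targets
    for it in range(n_iter):
        j = rng.randrange(n_targets)
        x[j] ^= 1  # symmetric proposal: flip presence of one target
        proposed = log_post(x)
        delta = proposed - current
        if delta >= 0 or rng.random() < math.exp(delta):
            current = proposed  # accept the flip
        else:
            x[j] ^= 1  # reject: undo the flip
        if it >= burn_in:
            for t in range(n_targets):
                counts[t] += x[t]
    kept = n_iter - burn_in
    return [c / kept for c in counts]


# Usage: 4 probes x 3 targets; rows are probes, columns are targets.
incidence = [[1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
observed = [1, 0, 1, 0]  # pattern consistent with only target 0 present
print(mcmc_presence(incidence, observed))
```

No single probe here is unique to one target pattern on its own, yet the combined pattern points strongly to target 0. This is the group-testing intuition the abstract refers to. Extending the sketch to unknown targets would require a tree-structured prior, which is beyond this illustration.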
Availability: Software implementing the method described in this paper and datasets are available from http://algorithmics.molgen.mpg.de/probetrees.