Motivation: Pooling multiple samples increases the efficiency and lowers the cost of DNA sequencing. One approach to multiplexing is to use short DNA indices to uniquely identify each sample. After sequencing, reads must be assigned in silico to the sample of origin, a process referred to as demultiplexing. Demultiplexing software typically identifies the sample of origin using a fixed number of mismatches between the read index and a reference index set. This approach may fail or misassign reads when the sequencing quality of the indices is poor.
Results: We introduce deML, a maximum likelihood algorithm that demultiplexes Illumina sequences. deML computes the likelihood of an observed index sequence being derived from a specified sample. A quality score which reflects the probability of the assignment being correct is generated for each read. Using these quality scores, even very problematic datasets can be demultiplexed and an error threshold can be set.
Availability and implementation: deML is freely available for use under the GPL (http://bioinf.eva.mpg.de/deml/).
© The Author 2014. Published by Oxford University Press.