A finite state machine algorithm for finding restriction sites and other pattern matching applications

R Smith

doi:10.1093/bioinformatics/4.4.459

Comput Appl Biosci. 1988 Nov;4(4):459-65. doi: 10.1093/bioinformatics/4.4.459.

Author

R Smith¹

Affiliation

¹ Public Health Research Institute, New York, NY 10016.

PMID: 3208180

Abstract

Existing algorithms for finding restriction endonuclease recognition sites are brute-force methods that run in time O(NM), where N is the number of nucleotides in the sequence under analysis and M is the total number of nucleotides in all the different sites being searched for. This paper presents a deterministic finite state machine algorithm that runs in time O(N). Memory use can be as high as O(M^4), but a slight modification to the basic algorithm can impose a theoretical upper bound of O(M) at the cost of some added complexity in executing the state machine. The algorithm can operate with a single pass through the sequence under analysis, with no need to back up or (for non-circular sequences) to store more than a single input character at a time. This type of algorithm can be adapted to many pattern-matching tasks and is simple enough to implement in hardware; it could, for example, be built into a disk controller as part of a specialized database machine.
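
The following is a minimal sketch of the single-pass, one-lookup-per-character idea, not the paper's own construction. It swaps in the well-known Aho-Corasick automaton (a trie plus failure links), with the failure links folded into a complete transition table so the scanner is a true deterministic machine. It assumes every site is an unambiguous A/C/G/T string; ambiguity codes such as N or R in recognition sites are not handled. The function names (build_dfa, find_sites, find_sites_circular, find_sites_brute) and the example sequences are hypothetical. Because of these restrictions the sketch has at most M+1 states with four transitions each; the memory bounds quoted in the abstract refer to the paper's construction, which this does not reproduce.

```python
from collections import deque

ALPHABET = "ACGT"  # assumption: sites use only unambiguous bases


def build_dfa(sites):
    """Compile {name: site} into a complete DFA over A/C/G/T.

    Aho-Corasick construction: build a trie, then compute failure links
    breadth-first and fold them into a full transition table, so scanning
    needs exactly one table lookup per input character.
    """
    goto, out = [{}], [[]]                 # state 0 is the start state
    for name, site in sites.items():
        state = 0
        for ch in site:
            if ch not in goto[state]:
                goto.append({})
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(name)            # this site ends in this state

    delta = [{} for _ in goto]             # delta[state][ch] -> next state
    fail = [0] * len(goto)
    queue = deque()
    for ch in ALPHABET:
        nxt = goto[0].get(ch, 0)
        delta[0][ch] = nxt
        if nxt:
            queue.append(nxt)              # depth-1 states fail to the start
    while queue:                           # shallower states are finished first
        state = queue.popleft()
        out[state] = out[state] + out[fail[state]]  # inherit shorter suffix sites
        for ch in ALPHABET:
            nxt = goto[state].get(ch)
            if nxt is None:
                delta[state][ch] = delta[fail[state]][ch]
            else:
                fail[nxt] = delta[fail[state]][ch]
                delta[state][ch] = nxt
                queue.append(nxt)
    return delta, out


def find_sites(sequence, sites, dfa):
    """Single left-to-right pass, reading one character at a time.

    Returns 0-based (start, name) pairs in order of each match's end.
    """
    delta, out = dfa
    hits, state = [], 0
    for i, ch in enumerate(sequence):
        # Characters outside ACGT (e.g. N) cannot occur in any site,
        # so they send the machine back to the start state.
        state = delta[state].get(ch, 0)
        for name in out[state]:
            hits.append((i - len(sites[name]) + 1, name))
    return hits


def find_sites_circular(sequence, sites, dfa):
    """Circular molecule: re-read the first (longest site - 1) characters so
    that sites spanning the origin are found. Assumes the sequence is at
    least that long."""
    k = max(len(s) for s in sites.values()) - 1
    return [(start, name)
            for start, name in find_sites(sequence + sequence[:k], sites, dfa)
            if start < len(sequence)]     # drop repeats of earlier hits


def find_sites_brute(sequence, sites):
    """O(NM) baseline for comparison: try every site at every position."""
    return sorted((i, name) for name, site in sites.items()
                  for i in range(len(sequence) - len(site) + 1)
                  if sequence[i:i + len(site)] == site)


if __name__ == "__main__":
    sites = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
    dfa = build_dfa(sites)
    print(find_sites("TTGAATTCGGATCCAAGCTTGAATTC", sites, dfa))
    # [(2, 'EcoRI'), (8, 'BamHI'), (14, 'HindIII'), (20, 'EcoRI')]
    print(find_sites_circular("ATTCAAAAGA", sites, dfa))
    # [(8, 'EcoRI')]
```

The scan does one table lookup per character plus constant work per reported site, so its cost does not grow with the number of sites searched for; the brute-force baseline does work proportional to N times M. As the abstract notes, only circular sequences require holding more than the current character, and here that means re-reading a short prefix of length (longest site - 1).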

Publication types

Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms*
Base Sequence
DNA
Pattern Recognition, Automated
Restriction Mapping*
Software

Substances

DNA

Grants and funding

RR-02990/RR/NCRR NIH HHS/United States