Issues in searching molecular sequence databases

S F Altschul; M S Boguski; W Gish; J C Wootton

doi:10.1038/ng0294-119

Nat Genet. 1994 Feb;6(2):119-29. doi: 10.1038/ng0294-119.

Authors

S F Altschul¹, M S Boguski, W Gish, J C Wootton

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894.

PMID: 8162065

Abstract

Sequence similarity search programs are versatile tools for the molecular biologist, frequently able to identify possible DNA coding regions and to provide clues to gene and protein structure and function. While much attention has been paid to the precise algorithms these programs employ and to their relative speeds, there is a constellation of associated issues that are equally important to realize the full potential of these methods. Here, we consider a number of these issues, including the choice of scoring systems, the statistical significance of alignments, the masking of uninformative or potentially confounding sequence regions, the nature and extent of sequence redundancy in the databases, and network access to similarity search services.

Publication types

Review

MeSH terms

Algorithms
Amino Acid Sequence
Animals
Base Sequence
Databases, Factual*
Humans
Information Storage and Retrieval*
Molecular Sequence Data
Sequence Alignment*
Sequence Homology*
Software