Analysis of the relationships among Longest Common Subsequences, Shortest Common Supersequences and patterns and its application on pattern discovery in biological sequences

Int J Data Min Bioinform. 2011;5(6):611-25. doi: 10.1504/ijdmb.2011.045413.

Abstract

For a set of sequences, the patterns, Longest Common Subsequences (LCS), and Shortest Common Supersequences (SCS) each capture a different aspect of the sequences' profile. Revealing the relationship between patterns and LCS/SCS may give a deeper view of the patterns. In this paper, we show that patterns, LCS, and SCS are closely related to one another. Based on these relations, we propose the PALS algorithms, which discover patterns in a set of biological sequences from LCS and SCS results. Experiments show that the PALS algorithms are superior in efficiency and accuracy on a variety of sequences.
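
To make the LCS and SCS terminology concrete, the sketch below computes both for the simplest case of two sequences and checks the standard identity |SCS| = |a| + |b| - |LCS|, which holds for a pair of sequences. It is not the PALS algorithm, which this abstract does not describe; the function names `lcs`, `scs`, and `is_subsequence` and the example strings are illustrative assumptions.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def scs(a: str, b: str) -> str:
    """Return one shortest common supersequence by merging a and b around their LCS."""
    out = []
    i = j = 0
    for ch in lcs(a, b):
        # Emit the symbols of each sequence that precede the next shared symbol.
        while a[i] != ch:
            out.append(a[i])
            i += 1
        while b[j] != ch:
            out.append(b[j])
            j += 1
        out.append(ch)  # the shared symbol is written only once
        i += 1
        j += 1
    out.append(a[i:])
    out.append(b[j:])
    return "".join(out)


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting zero or more symbols."""
    it = iter(t)
    return all(ch in it for ch in s)


a, b = "AGGTAB", "GXTXAYB"  # toy strings, not biological data
common, merged = lcs(a, b), scs(a, b)
assert is_subsequence(common, a) and is_subsequence(common, b)
assert is_subsequence(a, merged) and is_subsequence(b, merged)
assert len(merged) == len(a) + len(b) - len(common)
```

For more than two sequences the length identity above no longer holds in general, and computing an exact LCS or SCS over an arbitrary number of sequences is NP-hard, so practical methods for sequence sets rely on heuristics or approximations.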

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Base Sequence
  • Conserved Sequence
  • Molecular Sequence Data
  • Sequence Alignment / methods*