Background: The majority of relations between proteins can be represented as a conventional sequential alignment. Nevertheless, unusual non-sequential alignments with different connectivity of the aligned fragments in compared proteins have been reported by many researchers. It is interesting to understand those non-sequential alignments; are they unique, sporadic cases or they occur frequently; do they belong to a few specific folds or spread among many different folds, as a common feature of protein structure. We present here a comprehensive large-scale study of non-sequential alignments between available protein structures in Protein Data Bank.
Results: The study has been conducted on a non-redundant set of 8,865 protein structures aligned with the aid of the TOPOFIT method. It has been estimated that between 17.4% and 35.2% of all alignments are non-sequential depending on variations in the parameters. Analysis of the data revealed that non-sequential relations between proteins do occur systematically and in large quantities. Various sizes and numbers of non-sequential fragments have been observed with all possible complexities of fragment rearrangements found for alignments consisting of up to 12 fragments. It has been found that non-sequential alignments are not limited to proteins of any particular fold and are present in more than two hundred of them. Moreover, many of them are found between proteins with different fold assignments. It has been shown that protein structure symmetry does not explain non-sequential alignments. Therefore, compelling evidences have been provided that non-sequential alignments between proteins are systematic and widespread across the protein universe.
Conclusion: The phenomenon of the widespread occurrence of non-sequential alignments between proteins might represent a missing rule of protein structure organization. More detailed study of this phenomenon will enhance our understanding of protein stability, folding, and evolution.