Protein sectors: statistical coupling analysis versus conservation

Tiberiu Teşileanu; Lucy J Colwell; Stanislas Leibler

doi:10.1371/journal.pcbi.1004091

PLoS Comput Biol. 2015 Feb 27;11(2):e1004091. doi: 10.1371/journal.pcbi.1004091. eCollection 2015 Feb.

Authors

Tiberiu Teşileanu¹, Lucy J Colwell², Stanislas Leibler³

Affiliations

¹ The Simons Center for Systems Biology and The School of Natural Sciences, Institute for Advanced Study, Einstein Drive, Princeton, New Jersey, United States of America; Initiative for the Theoretical Sciences, CUNY Graduate Center, 365 Fifth Avenue, New York, New York, United States of America.
² The Simons Center for Systems Biology and The School of Natural Sciences, Institute for Advanced Study, Einstein Drive, Princeton, New Jersey, United States of America; Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom.
³ The Simons Center for Systems Biology and The School of Natural Sciences, Institute for Advanced Study, Einstein Drive, Princeton, New Jersey, United States of America; Center for Studies in Physics and Biology and Laboratory of Living Matter, The Rockefeller University, 1230 York Avenue, New York, New York, United States of America.

Abstract

Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that has been used to identify groups of coevolving residues termed "sectors". The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that they involve almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominant factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation.
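
The abstract describes the method only in outline: a matrix combining correlations with conservation, followed by spectral analysis. The sketch below is one way such a computation might look, and is not the authors' implementation. Every specific choice is an assumption made for illustration: the binary (consensus versus non-consensus) encoding, the background frequency `Q_BACKGROUND`, the relative-entropy conservation score, the log-odds weights, the toy alignment, and all function names. Power iteration stands in for a full eigendecomposition so that the example needs only the standard library.

```python
import math
import random

Q_BACKGROUND = 0.05  # assumed background frequency of a consensus residue (hypothetical)
EPS = 1e-3           # clip frequencies away from 0 and 1 so the logarithms stay finite


def binarize(alignment):
    """Encode each residue as 1 if it matches its column's most common residue, else 0."""
    n_pos = len(alignment[0])
    consensus = []
    for i in range(n_pos):
        col = [seq[i] for seq in alignment]
        # sorted() makes tie-breaking between equally common residues deterministic
        consensus.append(max(sorted(set(col)), key=col.count))
    return [[int(seq[i] == consensus[i]) for i in range(n_pos)] for seq in alignment]


def frequencies(binary):
    """Clipped consensus frequency at each position."""
    n_seq = len(binary)
    return [min(max(sum(row[i] for row in binary) / n_seq, EPS), 1 - EPS)
            for i in range(len(binary[0]))]


def conservation(f, q=Q_BACKGROUND):
    """Relative entropy of Bernoulli(f) with respect to Bernoulli(q)."""
    return f * math.log(f / q) + (1 - f) * math.log((1 - f) / (1 - q))


def weight(f, q=Q_BACKGROUND):
    """Derivative of the relative entropy with respect to f (a log-odds ratio)."""
    return math.log(f * (1 - q) / ((1 - f) * q))


def weighted_matrix(binary):
    """Absolute covariance between positions, scaled by the conservation weights."""
    n_seq, n_pos = len(binary), len(binary[0])
    f = frequencies(binary)
    phi = [weight(x) for x in f]
    m = [[0.0] * n_pos for _ in range(n_pos)]
    for i in range(n_pos):
        for j in range(i, n_pos):
            f_ij = sum(row[i] * row[j] for row in binary) / n_seq
            m[i][j] = m[j][i] = abs(phi[i] * phi[j] * (f_ij - f[i] * f[j]))
    return m


def top_eigenvector(m, n_iter=500):
    """Approximate the leading eigenvector of a symmetric matrix by power iteration."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(n_iter):
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def toy_alignment(n_seq=200, seed=0):
    """Synthetic alignment: two strongly conserved positions, four variable ones."""
    rng = random.Random(seed)
    seqs = []
    for _ in range(n_seq):
        conserved = "".join("A" if rng.random() < 0.95 else rng.choice("CDE")
                            for _ in range(2))
        variable = "".join(rng.choice("ACDE") for _ in range(4))
        seqs.append(conserved + variable)
    return seqs


if __name__ == "__main__":
    binary = binarize(toy_alignment())
    cons = [conservation(f) for f in frequencies(binary)]
    vec = top_eigenvector(weighted_matrix(binary))
    for i, (c, v) in enumerate(zip(cons, vec)):
        print(f"position {i}: conservation={c:.3f}  top-eigenvector component={v:.3f}")
```

Printing the per-position conservation next to the components of the leading eigenvector gives a quick way to see how closely the two track each other on a given alignment. The toy data here are synthetic, so the output illustrates the mechanics only and says nothing about real proteins.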

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Sequence
Computational Biology / methods*
Conserved Sequence
PDZ Domains
Protein Interaction Domains and Motifs / physiology*
Proteins / chemistry*
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*
Tetrahydrofolate Dehydrogenase

Substances

Proteins
Tetrahydrofolate Dehydrogenase

Grants and funding

TT was supported by a Charles L. Brown Membership at the Institute for Advanced Study. LJC was supported by an Engineering and Physical Sciences Research Council Fellowship (EP/H028064/2). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.