BMC Biol., 15 (1), 66

Charged Residues Next to Transmembrane Regions Revisited: "Positive-inside Rule" Is Complemented by the "Negative Inside Depletion/Outside Enrichment Rule"

James Alexander Baker et al. BMC Biol.

Abstract

Background: Transmembrane helices (TMHs) frequently occur amongst protein architectures as a means for proteins to attach to or embed into biological membranes. Physical constraints such as the membrane's hydrophobicity and electrostatic potential apply uniform requirements to TMHs and their flanking regions; consequently, these constraints are mirrored in the sequence patterns of TMHs (in addition to TMHs being a span of generally hydrophobic residues), on top of variations enforced by the specific protein's biological functions.

Results: With statistics derived from a large body of protein sequences, we demonstrate that, in addition to the positive charge preference at the cytoplasmic inside (positive-inside rule), negatively charged residues preferentially occur or are even enriched at the non-cytoplasmic flank or, at least, they are suppressed at the cytoplasmic flank (negative-not-inside/negative-outside (NNI/NO) rule). As negative residues are generally rare within or near TMHs, the statistical significance is sensitive with regard to details of TMH alignment and residue frequency normalisation and also to dataset size; therefore, this trend was obscured in previous work. We observe variations amongst taxa as well as for organelles along the secretory pathway. The effect is most pronounced for TMHs from single-pass transmembrane (bitopic) proteins compared to those with multiple TMHs (polytopic proteins) and especially for the class of simple TMHs that evolved for the sole role as membrane anchors.

Conclusions: The charged-residue flank bias is only one of the TMH sequence features with a role in the anchorage mechanisms, others apparently being the leucine intra-helix propensity skew towards the cytoplasmic side, tryptophan flanking as well as the cysteine and tyrosine inside preference. These observations will stimulate new prediction methods for TMHs and protein topology from a sequence as well as new engineering designs for artificial membrane proteins.

Keywords: Amino acid distribution; Genome-wide statistical study; Membrane protein; Negative-not-inside/negative-outside rule; Protein topology prediction; Proteomics; Transmembrane helix; Transmembrane region prediction.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors agree with the publication of this article.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Negatively charged amino acids are amongst the rarest residues in TMHs and ±5 flanking residues. Bar charts of the abundance of each amino acid type in the TMHs together with their ±5 flanking residues from the (a) UniHuman single-pass proteins, (b) ExpAll single-pass proteins, (c) UniHuman multi-pass proteins, and (d) ExpAll multi-pass proteins. Amino acid types on the horizontal axis are listed in descending count. The bars are coloured by hydrophobic, neutral and hydrophilic categories according to the free energy of insertion biological scale [36]. Grey represents hydrophilic amino acids that were found to have a positive ΔG app, blue represents hydrophobic residues with a negative ΔG app, purple denotes negative residues and orange denotes positive residues. The abundances of key residues are labelled
Fig. 2
Relative percentage normalisation reveals a negative-outside bias in TMHs from single-pass protein datasets. All flank sizes were set at up to ±20 residues. We acknowledge that all values, besides the averaged values, are discrete, and connecting lines are illustrative only. On the horizontal axes (a–d) are the distances in residues from the centre of the TMH, with the negative numbers extending towards the cytoplasmic space. For e and f, the horizontal axis represents the residue count from the membrane boundary with negative counts into the cytoplasmic space. Leucine, the most abundant non-polar residue in TMHs, is in blue. Arginine and lysine are shown in dark and light orange respectively. Aspartic and glutamic acid are shown in dark and light purple respectively. a and b On the vertical axis is the absolute abundance of residues in TMHs from single-pass proteins from (a) UniHuman and (b) ExpAll. Note that no clear trend can be seen in the negative residue distribution compared to the positive-inside signal and the leucine abundance throughout the TMH. c and d On the vertical axis is the relative percentage at each position for TMHs from single-pass proteins from (c) UniHuman and (d) ExpAll. The dashed lines show the estimated background level for each residue type, in the matching colour: an average of the relative percentage values between positions 25 to 30 and –30 to –25. The thick bars show the averages on the inner (positions –20 to –10) and outer (positions 10 to 20) flanks, coloured according to the respective amino acid type. Note a visible suppression of acidic residues on the inside flank when compared to the outside flank in single-pass proteins when normalising according to the relative percentage. e and f The relative distribution of flanks defined by the databases with the distance from the TMH boundary on the horizontal axis. The inside and outside flanks are shown in separate subplots. The colouring is the same as in a and b
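The relative percentage normalisation and flank averages described in this caption can be illustrated with a minimal Python sketch. It assumes that "relative percentage" means the count of one residue type at a position divided by the total count of that residue type over all positions, times 100; the abstract and caption do not spell out the formula, so this is an assumption. The function names `relative_percentage` and `mean_over`, and the input layout of (sequence, centre index) pairs already oriented with the cytoplasmic side first, are hypothetical and not taken from the article.

```python
from collections import Counter


def relative_percentage(segments, residue):
    """Per-position relative percentage of one residue type.

    segments: iterable of (sequence, centre_index) pairs, where centre_index
    marks the central TMH residue and the sequence is assumed to be oriented
    so that negative positions point towards the cytoplasm.
    Assumption: relative percentage = 100 * count at position / total count
    of that residue type over all positions.
    """
    counts = Counter()
    for sequence, centre in segments:
        for index, amino_acid in enumerate(sequence):
            if amino_acid == residue:
                counts[index - centre] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {position: 100.0 * n / total for position, n in counts.items()}


def mean_over(profile, low, high):
    """Average of a profile over positions low..high inclusive.

    Positions missing from the profile count as zero.
    """
    positions = range(low, high + 1)
    return sum(profile.get(p, 0.0) for p in positions) / len(positions)
```

With a profile computed on real data, `mean_over(profile, -20, -10)` and `mean_over(profile, 10, 20)` would correspond to the thick inner and outer flank bars, and averaging `mean_over(profile, 25, 30)` with `mean_over(profile, -30, -25)` is one way to approximate the dashed background level.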
Fig. 3
Negative-outside bias is very subtle in TMHs from multi-pass proteins. The meaning for the horizontal axis is the same as in Fig. 2, with the negative sequence position numbers extending towards the cytoplasmic space. Leucine is in blue. Arginine and lysine are shown in dark and light orange respectively. Aspartic and glutamic acid are shown in dark and light purple respectively. All flank sizes were set at up to ±20 residues. a and b On the vertical axes are the absolute abundances of residues from TMHs of multi-pass proteins from (a) UniHuman and (b) ExpAll. c and d On the vertical axes are the relative percentages at each position for TMHs from multi-pass proteins from (c) UniHuman and (d) ExpAll. As in Fig. 2c and d, the dashed lines show the estimation of the background level of residues with respect to the colour, and the thick bars show the averages on the inner and outer flanks coloured to the respective amino acid type. e and f The relative distribution of flanks defined by the databases with the distance from the TMH boundary on the horizontal axis for both the inside and outside flanks. The colouring is the same as in a and b
Fig. 4
Relative percentage heatmaps from predictive and experimental datasets corroborate residue distribution differences between TMHs from single-pass and multi-pass proteins. The residue position aligned to the centre of the TMH is on the horizontal axis, and the residue type is on the vertical axis. Amino acid types are listed in order of decreasing hydrophobicity according to the Kyte and Doolittle scale [52]. The flank lengths in the TMH segments were restricted to up to ±10 residues. The scales for each heatmap are shown beneath the respective subfigure. The darkest blue represents 0% distribution, whilst the darkest red represents the maximum relative percentage distribution that is denoted by the keys in each subfigure, with white being 50% between “cold” and “hot”. The central TMH subplots extend from the central TMH residue, whereas the inner and outer flank subplots use the database-defined TMH boundary and extend from that position. a TMHs from the single-pass UniHuman dataset. b Single-pass protein TMHs from the ExpAll dataset. c TMHs from the proteins of the multi-pass UniHuman dataset. d TMHs from ExpAll multi-pass proteins. The general consistency in relative distributions of every residue type between single-pass and multi-pass of either dataset including flank/TMH boundary selection allows us to infer biological conclusions from these distributions that are independent of methodological biases used to gather the sequences. The only residue whose distribution differs drastically between the datasets is cysteine, and only in multi-pass TMHs. The most striking differences in distributions between residues from TMHs of single-pass and multi-pass proteins include a more defined Y and W clustering at the flanks, a suppression of E and D on the inside flank, a suppression of P on the inside flank and a topological bias for C favouring the inside flank
Fig. 5
There is a difference in the hydrophobic profiles of TMHs from single-pass and multi-pass proteins. a The hydrophobicity of single-pass TMHs compared to multi-pass segments from the UniHuman dataset. The Kyte and Doolittle scale of hydrophobicity [52] was used with a window length of 3 to compare TMHs from proteins with different numbers of TMHs. This scale is based on the water-vapour transfer of free energy and the interior-exterior distribution of individual amino acids. The same datasets also had different scales applied (Additional file 2: Figure S2). The vertical axis is the hydrophobicity score, whilst the horizontal axis is the position of the residue relative to the centre of the TMH, with negative values extending into the cytoplasm. In black are the average hydrophobicity values of TMHs belonging to single-pass proteins, whilst in other colours are the average hydrophobicity values of TMHs belonging to multi-pass proteins containing the same numbers of TMHs per protein. In purple are the TMHs from proteins with more than 15 TMHs per protein that do not share a typical multi-pass profile, perhaps due to their exceptional nature. b The Kruskal-Wallis test (H statistic) was used to compare single-pass windowed hydrophobicity values with the average windowed hydrophobicity value of every TMH from multi-pass proteins at the same position. The vertical axis is the logarithmic scale of the resultant P values. We can much more readily reject the hypothesis that hydrophobicity is the same between TMHs from single-pass and multi-pass proteins in the core of the helix and the flanks than in the interfacial regions, particularly at the inner leaflet due to leucine asymmetry (Table 4)
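The windowed hydrophobicity score mentioned in this caption can be sketched as a sliding mean over the Kyte and Doolittle hydropathy values. This is only an illustration: the function name `windowed_hydrophobicity` is hypothetical, and the handling of sequence ends (positions without a full window are skipped) is an assumption, since the caption states only the scale and the window length of 3.

```python
# Kyte and Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydrophobicity(sequence, window=3):
    """Mean Kyte and Doolittle score in a window centred on each residue.

    Assumption: residues too close to either end to have a full window
    are skipped, so the result has len(sequence) - window + 1 entries
    (or none if the sequence is shorter than the window).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    scores = []
    for centre in range(half, len(sequence) - half):
        chunk = sequence[centre - half:centre + half + 1]
        scores.append(sum(KYTE_DOOLITTLE[aa] for aa in chunk) / window)
    return scores
```

Averaging these scores position by position across many aligned TMHs would give a profile of the kind plotted in panel a; the per-position comparison in panel b would additionally need a Kruskal-Wallis implementation, which is not shown here.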
Fig. 6
Comparing charged amino acid distributions in TMHs of multi-pass and single-pass proteins across different species and organelles. The relative percentage distribution of charged residues and leucine was calculated at each position in the TMH with flank lengths of ±20 in different datasets. Aspartic acid and glutamic acid are shown in dark purple and light purple respectively. Leucine, the most abundant non-polar residue in TMHs, is in blue. Arginine and lysine are shown in orange. TMHs from single-pass proteins are on the left and TMHs from multi-pass proteins are on the right for different taxonomic datasets: a UniCress, b UniFungi, c UniEcoli, d UniBacilli, e UniArch, and different organelles: f UniER, g UniGolgi, h UniPM. As a trend, the negative-outside skew is more pronounced in TMHs from single-pass proteins than in those from multi-pass proteins (Tables 2 and 3). Another key observation is that in single-pass TMHs there is a propensity for leucine on the inner over the outer leaflet (Table 4)
Fig. 7
Comparing the amino acid relative percentage distributions of simple and complex TMHs from single-pass proteins and TMHs from multi-pass proteins. TMSOC was used to calculate which single-pass TMHs were complex and which were simple from ExpAll and UniHuman datasets. Simple TMHs are typically anchors without necessarily having other functions (Wong et al. [5]). The relative percentages from single-pass simple (shown in light blue), single-pass complex (red), and multi-pass protein TMHs (black) were plotted for (a, c, e, g, i and k) UniHuman and (b, d, f, h, j and l) ExpAll for (a and b) positive residues, (c and d) negative residues, (e and f) tyrosine, (g and h) tryptophan, (i and j) leucine and (k and l) cysteine. The slopes are statistically compared in Tables 5 and 6, and as a trend, the profiles of complex TMHs are more similar to multi-pass TMH profiles than simple TMHs are to multi-pass TMHs
Fig. 8
Residue distributions of transmembrane anchors. A view showing additional residue distribution features that TMHs with an anchorage function display. a The more classic model of a TMH showing the "positive-inside" rule [12], the hydrophobic core [52], the polar enrichment that flanks the hydrophobic stretch [13] and the aromatic belt [14]. b Simple anchors may display additional features that conform to the membrane biophysical constraints: further suppression of charge in the hydrophobic core (Table 1), intra-membrane leucine asymmetry that likely causes hydrophobic skew [9] (Table 4, Fig. 5), a higher preference for cysteine on the inside flanking region (Fig. 7k and l), a higher net "positive-inside" charge (Additional file 1: Figure S1), asymmetric skew of the hydrophobic belt favouring the inner leaflet interface (Fig. 7e, f, g and h) and a negative-outside bias via suppression on the inside flanking region or enrichment on the outside flanking region (Fig. 7c and d, Tables 2 and 3)


References

    1. Elofsson A, von Heijne G. Membrane protein structure: prediction versus reality. Annu Rev Biochem. 2007;76:125–40. doi: 10.1146/annurev.biochem.76.052705.163539.
    2. von Heijne G. Membrane-protein topology. Nat Rev Mol Cell Biol. 2006;7:909–18. doi: 10.1038/nrm2063.
    3. Cymer F, von Heijne G, White SH. Mechanisms of integral membrane protein insertion and folding. J Mol Biol. 2015;427:999–1022. doi: 10.1016/j.jmb.2014.09.014.
    4. Hessa T, Sharma A, Mariappan M, Eshleman HD, Gutierrez E, Hegde RS. Protein targeting and degradation are coupled for elimination of mislocalized proteins. Nature. 2011;475:394–7. doi: 10.1038/nature10181.
    5. Wong WC, Maurer-Stroh S, Eisenhaber F. More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology. PLoS Comput Biol. 2010;6:e1000867. doi: 10.1371/journal.pcbi.1000867.
