Motivation: Integral membrane proteins play important roles in living cells. Although these proteins are estimated to constitute about 25% of all proteins encoded in genomes, the Protein Data Bank (PDB) contains only a few hundred membrane protein structures, owing to the difficulties of determining them experimentally. Moreover, the transmembrane proteins that are present in the PDB are hard to find, because these entries are poorly annotated. Even when a protein is identified as transmembrane, the PDB does not indicate the likely location of the lipid bilayer, because these proteins are crystallized without their natural lipid bilayer. Currently, no publicly available method detects the likely membrane plane from the atomic coordinates of a membrane protein.
Results: Here, we present a new geometrical approach that uses structural information only to distinguish transmembrane from globular proteins and to locate the most likely position of the lipid bilayer. We describe an automated algorithm (TMDET) that determines the membrane planes relative to the atomic coordinates, together with a discrimination function that separates transmembrane from globular proteins even for low-resolution or incomplete structures, such as fragments or parts of large multichain complexes. The method can be used for the proper annotation of protein structures containing transmembrane segments, and it paves the way to an up-to-date, automatically updatable database of the structures of all known transmembrane proteins and fragments (PDB_TM). The algorithm is equally useful for constructing databases that contain only globular proteins.