Protein Remote Homology Detection by Combining Chou's Pseudo Amino Acid Composition and Profile-Based Protein Representation

Mol Inform. 2013 Oct;32(9-10):775-82. doi: 10.1002/minf.201300084. Epub 2013 Jul 24.

Abstract

Protein remote homology detection is a key problem in bioinformatics. Currently, discriminative methods such as Support Vector Machines (SVMs) achieve the best performance. The most effective way to improve SVM-based methods is to find a general protein representation that converts proteins of different lengths into fixed-length vectors while capturing the properties that discriminate between them. The main obstacle in designing such a representation is that native proteins vary in length. Motivated by the success of the pseudo amino acid composition (PseAAC) proposed by Chou, we applied this approach to protein remote homology detection. Several new indices derived from the amino acid index (AAIndex) database are incorporated into PseAAC to improve its generalization ability. Finally, performance is further improved by combining the modified PseAAC with a profile-based protein representation that contains evolutionary information extracted from frequency profiles. Experiments on a well-known benchmark show that this method achieves performance comparable or superior to that of current state-of-the-art methods.
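To illustrate how PseAAC turns a sequence of any length into a fixed-length vector, here is a minimal sketch of Chou's classic type-1 PseAAC: 20 normalized residue frequencies followed by λ sequence-order correlation factors, each averaged over a set of standardized physicochemical properties and down-weighted by w. It is a sketch under stated assumptions, not the paper's implementation. The function names `standardize` and `pseaac`, the defaults `lam=3` and `w=0.05`, and the use of the Kyte-Doolittle hydropathy scale are illustrative choices only; the AAIndex-derived indices actually selected in the paper are not listed in this abstract.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here ONLY as an illustrative property;
# it is not claimed to be one of the indices chosen in the paper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale a property to zero mean and unit population std over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mu, sd = mean(values), pstdev(values)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq, properties, lam=3, w=0.05):
    """Hypothetical helper: type-1 PseAAC vector with 20 + lam entries."""
    # Keep only the 20 standard residues.
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]

    # Normalized occurrence frequency of each standard residue.
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]

    def theta_pair(r1, r2):
        # Squared property difference, averaged over all supplied properties.
        return sum((p[r2] - p[r1]) ** 2 for p in props) / len(props)

    # Tier-j correlation factor: mean over all residue pairs j positions apart.
    thetas = [
        sum(theta_pair(residues[i], residues[i + j]) for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


if __name__ == "__main__":
    vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [KYTE_DOOLITTLE], lam=3)
    print(len(vec))  # 23 features, independent of sequence length
```

Because the output always has 20 + λ entries and sums to 1, sequences of different lengths become directly comparable inputs for an SVM. How the paper merges this vector with the profile-based representation is not specified in the abstract.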

Keywords: Frequency profile; Protein remote homology; Pseudo amino acid composition; Support Vector Machine.