The computational analysis of enzymes that participate in lipid metabolism poses challenges, some shared with the rest of the protein universe and some more specific to this group. Hurdles to functional annotation that lipid metabolic enzymes share with enzymes of other pathways include the definition of proper starting datasets, the construction of reliable multiple sequence alignments, the choice of appropriate evolutionary models, and the reconstruction of phylogenetic trees with high statistical support, particularly for large datasets. Most enzymes that take part in lipid metabolism belong to complex superfamilies, many of whose members are not involved in lipid metabolism. In addition, some enzymes that share no sequence similarity catalyze similar or even identical reactions. Challenges that, although not unique, are more specific to lipid metabolism include the high compartmentalization of the pathways, catalysis in hydrophobic environments and, related to this, function near or within biological membranes. In this work, we provide guidelines intended to assist in the proper functional annotation of lipid metabolic enzymes, based on previous experience with the phospholipase D superfamily and with the annotation of the triglyceride synthesis pathway in algae. We describe a pipeline that starts with the definition of an initial set of sequences to be used in similarity-based searches and ends with the reconstruction of phylogenies. We also discuss the main issues to consider when using tools that analyze subcellular localization, hydrophobicity patterns, or the presence of transmembrane domains in lipid metabolic enzymes.
Keywords: Data-mining; Functional annotation; Hydrophobicity; Multiple sequence alignment; Phylogenetic tree; Superfamily.