Motivation: Selenoproteins are a group of proteins that contain selenocysteine (Sec), a rare amino acid inserted co-translationally into the protein chain. The Sec codon is UGA, which normally functions as a stop codon. In selenoproteins, UGA is recoded to Sec in the presence of specific features on the selenoprotein gene transcripts. Because of this dual role of the UGA codon, selenoprotein prediction and annotation are difficult tasks, and even known selenoproteins are often misannotated in genome databases.
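The practical consequence of UGA's dual role can be shown with a toy translator. This is a minimal sketch, not part of selenoprofiles: the parameter `sec_positions` is a hypothetical stand-in for the transcript features that trigger recoding, and simply lists the codon indices at which TGA (UGA in the mRNA) should be read as Sec (one-letter code `U`). Every other TGA is treated as a stop.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds, sec_positions=()):
    """Translate a DNA coding sequence until the first stop codon.

    sec_positions (hypothetical) holds 0-based codon indices where an
    in-frame TGA is recoded to selenocysteine ('U') instead of stopping.
    """
    sec_positions = set(sec_positions)
    protein = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = cds[start:start + 3].upper()
        if codon == "TGA" and start // 3 in sec_positions:
            protein.append("U")  # recoded UGA: selenocysteine
            continue
        aa = CODON_TABLE.get(codon, "X")  # 'X' for ambiguous codons
        if aa == "*":
            break  # ordinary stop codon ends translation
        protein.append(aa)
    return "".join(protein)


cds = "ATGTGCTGAGGTTAA"
print(translate(cds))                    # read as a stop: truncated protein
print(translate(cds, sec_positions={2})) # read as Sec: full-length protein
```

Without the recoding signal, the same sequence yields the truncated product `MC`; with it, the product is `MCUG`. A gene predictor that ignores Sec will therefore tend to report truncated or split genes, which is one way selenoproteins end up misannotated.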
Results: We present selenoprofiles, a homology-based in silico method to scan genomes for members of the known eukaryotic selenoprotein families. The core of the method is a set of manually curated, highly reliable multiple sequence alignments of selenoprotein families, which are used as queries to scan genomic sequences. The results of the scan are processed through a number of steps to produce highly accurate predictions of selenoprotein genes with little or no human intervention. Selenoprofiles is a valuable tool for the bioinformatic characterization of eukaryotic selenoproteomes and can complement genome annotation pipelines.
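One idea behind using curated alignments as queries is that the position of Sec in a family is known, so a candidate gene can be judged by what it carries at that position. The sketch below illustrates this under stated assumptions and is not the selenoprofiles implementation: the candidate is assumed to be already aligned to the family profile (same width, `-` for gaps), an in-frame UGA is assumed to appear as `U` or `*`, and the functions `sec_columns` and `classify` and their output labels are hypothetical.

```python
def sec_columns(profile):
    """Return the columns of a gapped alignment where any member has Sec ('U')."""
    width = len(profile[0])
    if any(len(seq) != width for seq in profile):
        raise ValueError("all aligned sequences must have the same length")
    return [col for col in range(width) if any(seq[col] == "U" for seq in profile)]


def classify(candidate_aligned, profile):
    """Label a candidate by the residue(s) it aligns to the Sec column(s).

    Labels are illustrative only: a U or '*' (in-frame UGA) at a Sec column
    suggests a selenoprotein; a C suggests a cysteine-containing homologue.
    """
    if len(candidate_aligned) != len(profile[0]):
        raise ValueError("candidate must be aligned to the profile width")
    cols = sec_columns(profile)
    if not cols:
        return "no Sec column in profile"
    residues = {candidate_aligned[col] for col in cols}
    if residues & {"U", "*"}:
        return "putative selenoprotein"
    if "C" in residues:
        return "Cys homologue"
    return "other homologue"


profile = ["MKUG-A", "MRUGTA", "MKCG-A"]  # toy family alignment, Sec in column 2
print(classify("MKUGTA", profile))
print(classify("MKCGTA", profile))
```

A real pipeline would also have to weigh alignment quality, gene structure and the presence of a SECIS element before trusting such a label; the sketch only shows why a curated Sec-aware profile makes the decision tractable.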
Availability and implementation: Selenoprofiles is a pipeline written in Python that internally runs psitblastn, exonerate, genewise, SECISearch and a number of custom-made scripts and programs. The program is available at http://big.crg.cat/services/selenoprofiles. The predictions presented in this article are available through DAS at http://genome.crg.cat:9000/das/Selenoprofiles_ensembl.