Motivation: Most protein tyrosine kinases found in bacteria have recently been classified in a new family, termed BY-kinases. Indeed, they share no sequence homology with their eukaryotic counterparts and have no known eukaryotic homologues. They are involved in several biological functions (e.g. capsule biosynthesis, antibiotic resistance, virulence mechanisms) and can therefore be considered interesting therapeutic targets for the development of new drugs against infectious diseases. However, their identification is difficult: progress in their structural characterization has been slow, and identification most often relies on biochemical experiments. Moreover, BY-kinase sequences are related to many other bacterial proteins involved in various biological functions (e.g. ParA family proteins). Accordingly, their annotation in generalist databases, their sequence analysis and their classification remain partial and inconsistent, and there is no bioinformatics resource dedicated to these proteins.
Results: We identified BY-kinase sequences in the UniProt Knowledgebase by combining similarity search with sequence-profile alignment, pattern matching and a sliding-window computation to detect the tyrosine cluster. The results were cross-validated by keyword searches, pattern matching with several patterns and checks of motif conservation in multiple sequence alignments. Our pipeline identified 640 sequences as BY-kinases and allowed the definition of a PROSITE pattern that is the signature of the BY-kinases. The sequences identified by our pipeline share good sequence similarity with BY-kinases that have already been biochemically characterized, and they all bear the characteristic motifs of the catalytic domain, including the three Walker-like motifs followed by a tyrosine cluster.
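The sliding-window step mentioned above can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: the function names, the window length (10 residues), the tyrosine threshold (3) and the size of the C-terminal region scanned (30 residues) are all hypothetical values chosen for illustration, since the abstract does not state them.

```python
def find_tyrosine_clusters(sequence, window=10, min_tyr=3):
    """Return (start, end, count) for every window holding >= min_tyr tyrosines.

    window and min_tyr are illustrative assumptions, not published parameters.
    Coordinates are 0-based, with end exclusive.
    """
    seq = sequence.upper()
    hits = []
    if len(seq) < window:
        return hits
    # Count tyrosines (Y) in the first window, then update incrementally.
    count = seq[:window].count("Y")
    if count >= min_tyr:
        hits.append((0, window, count))
    for start in range(1, len(seq) - window + 1):
        if seq[start - 1] == "Y":           # residue leaving the window
            count -= 1
        if seq[start + window - 1] == "Y":  # residue entering the window
            count += 1
        if count >= min_tyr:
            hits.append((start, start + window, count))
    return hits


def has_c_terminal_tyrosine_cluster(sequence, tail=30, window=10, min_tyr=3):
    """Check only the last `tail` residues (hypothetical region size)."""
    return bool(find_tyrosine_clusters(sequence[-tail:], window, min_tyr))


if __name__ == "__main__":
    # Toy sequence, not a real protein: poly-alanine with a Y-rich C-terminus.
    toy = "A" * 20 + "YGYYAYGYYA"
    print(has_c_terminal_tyrosine_cluster(toy))
```

In a pipeline of the kind described, a check like this would only be one filter among several, applied alongside the similarity search and the pattern matching for the Walker-like motifs.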