In 1998, Wilkins et al. (J. Mol. Biol. 1998, 278, 599-608) reported high specificity in the terminal regions (terminal tags) of 15 519 proteins from five organisms and proposed a methodology for identifying proteins by their terminal tags. However, the sequence data they examined were not based on complete genome sequences. Here, we examined current proteome data (217 249 entries from the UniProt 2013_6 complete/reference proteomes of nine organisms, including human) in terms of the specificity of terminal tags and their computational annotation. For human, for example, the specificity of N-terminal tags plateaued at 28% at a length of six residues; even when both N- and C-terminal tags were used, specificity was only 66%. To determine the cause of these low specificities, we examined the annotation of proteins that share terminal tags with other proteins. The results suggested that a large majority of these proteins were phylogenetically or functionally related, whereas unrelated proteins sharing terminal tags made up less than 1% of the human proteome data. On the basis of these findings, we constructed the terminal tag sequence database ProteinCarta (http://ms3d.jp/software/proteincarta/), which includes all terminal tags of the proteomes of the nine organisms analyzed here, so that the specificity of a terminal tag can be confirmed and its parent protein identified.
Keywords: Protein terminal specificity; bioinformatics; database; protein identification; terminal sequence tag.
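
The notion of terminal tag specificity used above can be illustrated with a minimal sketch. It assumes that specificity means the fraction of proteins in a proteome whose terminal tag (or N/C tag pair) is shared with no other entry; this definition, the function names, and the toy sequences are illustrative assumptions and are not taken from the ProteinCarta implementation.

```python
from collections import Counter

def terminal_tag(seq, length, end="N"):
    """Return the first (end="N") or last (end="C") `length` residues."""
    return seq[:length] if end == "N" else seq[-length:]

def tag_specificity(sequences, length, ends=("N",)):
    """Fraction of sequences whose tag (or tag pair) occurs exactly once.

    Assumed definition: a protein counts as specific when no other
    sequence in the set has the same terminal tag(s).
    """
    if not sequences:
        return 0.0
    keys = [tuple(terminal_tag(s, length, e) for e in ends) for s in sequences]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(sequences)

# Hypothetical toy "proteome"; real input would be a full UniProt proteome.
toy_seqs = ["MKTAYIAAAW", "MKTAYIGGGW", "MSEQNNTEMT", "MKTAYICAAW"]

n_only = tag_specificity(toy_seqs, 3)                      # N-terminal tags only
n_and_c = tag_specificity(toy_seqs, 3, ends=("N", "C"))    # N- and C-terminal pair
print(n_only, n_and_c)
```

In this toy set, three of the four sequences share the N-terminal tag MKT, so N-terminal specificity is low; adding the C-terminal tag separates some, but not all, of them, which mirrors the qualitative pattern reported for human.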