Two Blast-independent tools, CyPerl and CyExcel, for harvesting hundreds of novel cyclotides and analogues from plant genomes and protein databases

Planta. 2015 Apr;241(4):929-40. doi: 10.1007/s00425-014-2229-5. Epub 2014 Dec 21.

Abstract

Two high-throughput tools harvest hundreds of novel cyclotides and analogues in plants. Cyclotides are gene-encoded, backbone-cyclized polypeptides that display a diverse range of bioactivities associated with plant defense. However, genome-scale or database-scale evaluations of cyclotides have so far been rare. Here, a novel time-efficient Perl program, CyPerl, was developed to search for cyclotides in the predicted ORFs of 34 available plant genomes and in existing plant protein sequences from the GenBank databases. CyPerl-isolated sequences were further analyzed in a user-friendly Excel (Microsoft Office) template, CyExcel, by removing repeats, evaluating their cysteine-distributed regions (CDRs), and comparing them with CyBase-collected cyclotides. After genome screening, CyPerl and CyExcel identified 186 ORFs containing 145 unique cyclotide analogues from 30 of the plant genomes tested, representing 10 plant families. Phaseolus vulgaris and Zea mays were the two species richest in cyclotide analogues among the plants tested. After screening of the protein databases, 266 unique cyclotides and analogues were identified from seven plant families. Merging these with 288 unique CyBase-listed cyclotides yielded 510 unique cyclotides and analogues from 13 plant families. In total, this study identified seven novel plant families containing cyclotide analogues and 202 novel cyclotide analogues. It establishes two BLAST-independent tools for screening cyclotides from plant genomes and protein databases, and significantly widens the known plant distribution and sequence diversity of cyclotides and their analogues.
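The screening steps summarized in the abstract (pattern-based search, removal of repeats, evaluation of cysteine-distributed regions, comparison against known entries) can be illustrated with a minimal Python sketch. This is not the CyPerl or CyExcel implementation: the motif used here (six cysteines separated by 1 to 10 non-cysteine residues) and the function names `find_cdrs`, `loop_lengths`, and `flag_novel` are assumptions made purely for illustration, and the actual search criteria are those described in the full paper.

```python
import re

# Hypothetical motif: six cysteines, each pair separated by 1-10 non-cysteine
# residues. This spacing is an illustrative assumption, not CyPerl's criterion.
CDR_PATTERN = re.compile(r"C(?:[^C]{1,10}C){5}")


def find_cdrs(sequences):
    """Return unique cysteine-spanning regions, in first-seen order."""
    seen = set()
    unique = []
    for seq in sequences:
        for match in CDR_PATTERN.finditer(seq.upper()):
            cdr = match.group(0)
            if cdr not in seen:  # drop repeats across and within sequences
                seen.add(cdr)
                unique.append(cdr)
    return unique


def loop_lengths(cdr):
    """Lengths of the five segments between consecutive cysteines."""
    # The region starts and ends with C, so the first and last pieces are empty.
    return [len(segment) for segment in cdr.split("C")[1:-1]]


def flag_novel(cdrs, known):
    """Keep only regions absent from a reference collection of known regions."""
    known_set = set(known)
    return [cdr for cdr in cdrs if cdr not in known_set]
```

In this sketch, `find_cdrs` would be run over predicted ORF translations or database protein sequences, `loop_lengths` gives a simple per-hit summary of cysteine spacing, and `flag_novel` stands in for the comparison against a reference set such as the CyBase entries.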

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Cyclotides / genetics*
  • Databases, Nucleic Acid*
  • Databases, Protein*
  • Genome, Plant / genetics*
  • Magnoliopsida / genetics*
  • Magnoliopsida / metabolism
  • Models, Molecular
  • Molecular Sequence Data
  • Plant Proteins / genetics
  • Sequence Alignment
  • Sequence Analysis

Substances

  • Cyclotides
  • Plant Proteins