mycoCLAP, the database for characterized lignocellulose-active proteins of fungal origin: resource and text mining curation support

Database (Oxford). 2015 Mar 8;2015:bav008. doi: 10.1093/database/bav008. Print 2015.


Enzymes active on components of lignocellulosic biomass are used for industrial applications ranging from food processing to biofuels production. These include a diverse array of glycoside hydrolases, carbohydrate esterases, polysaccharide lyases and oxidoreductases. Fungi are prolific producers of these enzymes, spurring fungal genome sequencing efforts to identify and catalogue the genes that encode them. To facilitate the functional annotation of these genes, biochemical data on over 800 fungal lignocellulose-degrading enzymes have been collected from the literature and organized into the searchable database mycoCLAP. First implemented in 2011, and updated as described here, mycoCLAP can rank search results according to their closest biochemically characterized homologues; this improves the quality of the annotation and significantly decreases the time required to annotate novel sequences. The database is freely available to the scientific community, as are the open source applications based on natural language processing developed to support the manual curation of mycoCLAP.

Publication types

  • Dataset
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Curation
  • Data Mining*
  • Databases, Genetic*
  • Enzymes* / genetics
  • Enzymes* / metabolism
  • Fungal Proteins* / genetics
  • Fungal Proteins* / metabolism
  • Genes, Fungal*
  • Lignin / metabolism*
  • Natural Language Processing*


Substances

  • Enzymes
  • Fungal Proteins
  • lignocellulose
  • Lignin