Subtree selection in kernels for graph classification

Int J Data Min Bioinform. 2013;8(3):294-310. doi: 10.1504/ijdmb.2013.056080.

Abstract

Classification of structured data is essential for a wide range of problems in bioinformatics and cheminformatics. One such problem is in silico prediction of small-molecule properties such as toxicity, mutagenicity and activity. In this paper, we propose a new feature selection method for graph kernels that use the subtrees of graphs as their feature sets. For this purpose, we propose a masking procedure that amounts to feature selection. We present experiments on several data sets and compare our method with several frequent-subgraph-based approaches.
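
The abstract does not spell out the masking procedure, so the following is only a minimal sketch of the general idea of masking features in a kernel, not the authors' method. It assumes each graph has already been reduced to a bag of subtree-pattern counts, and that the mask is simply the set of patterns to keep. The function name `masked_kernel`, the pattern strings, and the counts are all hypothetical and chosen for illustration.

```python
from collections import Counter


def masked_kernel(feats_a, feats_b, mask):
    """Dot product of two subtree-count vectors, restricted to masked-in features.

    feats_a, feats_b: mappings from a subtree pattern to its count in a graph.
    mask: set of subtree patterns that are kept (assumed binary mask).
    """
    return sum(
        count * feats_b[pattern]
        for pattern, count in feats_a.items()
        if pattern in feats_b and pattern in mask
    )


# Hypothetical subtree-pattern counts for two molecules (toy values, not real data).
g1 = Counter({"C(C)": 2, "C(N,O)": 1, "N(C)": 1})
g2 = Counter({"C(C)": 1, "C(N,O)": 3, "O(C)": 2})

# Keeping every feature gives the unmasked kernel value.
all_features = set(g1) | set(g2)
full = masked_kernel(g1, g2, all_features)

# An assumed selected subset; how the subset is chosen is not stated in the abstract.
selected_features = {"C(N,O)", "N(C)"}
selected = masked_kernel(g1, g2, selected_features)

print(full, selected)
```

With a binary mask of this kind, dropping a pattern from the set removes its contribution from every kernel evaluation, which is why masking can be read as feature selection.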

MeSH terms

  • Algorithms*
  • Animals
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Chemical
  • Drug Discovery / methods
  • Female
  • Male
  • Mice
  • Organic Chemicals / chemistry
  • Organic Chemicals / toxicity
  • Quantitative Structure-Activity Relationship
  • Rats

Substances

  • Organic Chemicals