Precision assessment of some supervised and unsupervised algorithms for genotype discrimination in the genus Pisum using SSR molecular data

J Theor Biol. 2015 Mar 7;368:122-32. doi: 10.1016/j.jtbi.2015.01.001. Epub 2015 Jan 12.

Abstract

For the first time, the prediction accuracies of several supervised and unsupervised algorithms were evaluated in an SSR-based DNA fingerprinting study of a pea collection comprising 20 cultivars and 57 wild samples. Overall, according to the 10 attribute weighting models, the SSR alleles PEAPHTAP-2 and PSBLOX13.2-1 were the two most important attributes for discriminating among eight species and subspecies of the genus Pisum. K-Medoids unsupervised clustering run on the Chi-squared dataset gave the best prediction accuracy (83.12%), while the lowest accuracy (25.97%) was obtained when the K-Means model was run on the FCdb database. Despite some fluctuations, the overall accuracies of the tree induction models were significantly high for many algorithms, and when two selected decision trees were considered, the attributes PSBLOX13.2-3 and PEAPHTAP successfully separated Pisum fulvum accessions and cultivars from the others. The other supervised algorithms also showed reliable overall accuracies, although in a few cases their accuracies were low. Altogether, our results demonstrate that both supervised and unsupervised algorithms are promising data mining tools for accurate fingerprinting of the species and subspecies of the genus Pisum, a fundamental priority in breeding programs for this crop.
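The abstract ranks SSR alleles with attribute weighting models (a Chi-squared-weighted dataset is mentioned) and scores clustering runs by prediction accuracy against known taxa. The minimal Python sketch below illustrates both ideas under stated assumptions: each allele is coded as a 0/1 presence/absence attribute, and a cluster is credited with its majority class. The function names and toy data are hypothetical; this is not the authors' pipeline or software.

```python
from collections import Counter


def chi_squared_weight(allele, labels):
    """Pearson chi-squared statistic between one binary allele attribute and the class labels.

    Assumes each SSR allele is coded as presence (1) / absence (0) per accession.
    A larger value means the allele's distribution differs more across classes,
    i.e. it is more useful for discriminating species/subspecies.
    """
    n = len(labels)
    observed = Counter(zip(allele, labels))  # counts per (allele state, class) cell
    state_totals = Counter(allele)           # row totals (allele absent / present)
    class_totals = Counter(labels)           # column totals (accessions per taxon)
    chi2 = 0.0
    for state in (0, 1):
        for cls, cls_total in class_totals.items():
            expected = state_totals[state] * cls_total / n
            if expected > 0:  # skip empty rows (allele fixed across all samples)
                chi2 += (observed[(state, cls)] - expected) ** 2 / expected
    return chi2


def majority_cluster_accuracy(clusters, labels):
    """Share of samples whose true class equals the majority class of their cluster.

    Hypothetical scoring rule: one common way to compare unsupervised clusters
    with known labels, not necessarily the one used in the study.
    """
    per_cluster = {}
    for cluster_id, cls in zip(clusters, labels):
        per_cluster.setdefault(cluster_id, Counter())[cls] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return correct / len(labels)


# Hypothetical toy data (not from the study): six accessions, three taxa.
toy_labels = ["fulvum", "fulvum", "sativum", "sativum", "elatius", "elatius"]
informative_allele = [1, 1, 0, 0, 0, 0]    # present only in the "fulvum" accessions
uninformative_allele = [1, 0, 1, 0, 1, 0]  # spread evenly across taxa
toy_clusters = [0, 0, 1, 1, 1, 2]          # e.g. cluster IDs from a K-Medoids run

print(chi_squared_weight(informative_allele, toy_labels))
print(chi_squared_weight(uninformative_allele, toy_labels))
print(majority_cluster_accuracy(toy_clusters, toy_labels))
```

On the toy data the informative allele scores about 6.0, the uninformative one scores 0.0, and the clustering is credited with 5 of 6 accessions. Majority mapping lets two clusters share a class, so it can overstate accuracy when there are many clusters; a one-to-one mapping (Hungarian assignment) is a stricter alternative. If accuracy is simply the fraction of the study's 77 accessions (20 cultivars plus 57 wild samples) assigned correctly, the reported 83.12% and 25.97% correspond to 64 and 20 correctly assigned accessions, respectively.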

Keywords: DNA fingerprinting; Data mining; Genus Pisum; Machine learning; SSR markers.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Cluster Analysis
  • DNA Fingerprinting / methods
  • DNA, Plant / genetics
  • Decision Trees
  • Genes, Plant*
  • Genetic Markers
  • Genotype
  • Microsatellite Repeats
  • Models, Genetic*
  • Peas / genetics*
  • Species Specificity
  • Support Vector Machine

Substances

  • DNA, Plant
  • Genetic Markers