Wisdom of artificial crowds feature selection in untargeted metabolomics: An application to the development of a blood-based diagnostic test for thrombotic myocardial infarction

J Biomed Inform. 2018 May:81:53-60. doi: 10.1016/j.jbi.2018.03.007. Epub 2018 Mar 22.


Introduction: Heart disease remains a leading cause of global mortality. Although acute myocardial infarction (MI; colloquially, heart attack) has multiple proximate causes, the proximate etiology cannot be determined by a blood-based diagnostic test. We enrolled a suitable patient cohort and conducted a non-targeted quantification of plasma metabolites by mass spectrometry to develop a test that can differentiate between thrombotic MI, non-thrombotic MI, and stable disease. A significant challenge in developing such a diagnostic test is solving the NP-hard problem of feature selection for constructing an optimal statistical classifier.

Objective: We employed a Wisdom of Artificial Crowds (WoAC) strategy for solving the feature selection problem and evaluated the accuracy and parsimony of downstream classifiers in comparison with traditional feature selection techniques including the Lasso and selection using Random Forest variable importance criteria.
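
For readers unfamiliar with the Lasso baseline named above, the following is a minimal sketch of how an L1 penalty performs feature selection by driving some coefficients exactly to zero. It is an illustration only: it fits L1-penalized least squares by coordinate descent on a numeric response, assumes centered predictors and response (no intercept), and uses toy data. The function names, the penalty value `lam`, and the data are assumptions, and the abstract does not state which Lasso formulation or software the study used.

```python
def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam`; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iters):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_value = soft_threshold(rho, lam) / z
            delta = new_value - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(col, residual)]
                beta[j] = new_value
    return beta


# Toy data (hypothetical): the response depends only on the first column.
X = [[-2.0, 1.0, -1.0], [-1.0, -1.0, 1.0], [0.0, 1.0, 1.0],
     [1.0, -1.0, -1.0], [2.0, 0.0, 0.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]
beta = lasso_coordinate_descent(X, y, lam=1.0)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

Features whose coefficients remain nonzero form the selected set. A larger `lam` yields a sparser, more parsimonious model at the cost of more shrinkage on the retained coefficients.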

Materials and methods: Artificial Crowd Wisdom was generated via aggregation of the best solutions from independent and diverse genetic algorithm populations that were initialized with bootstrapping and a random subspaces constraint.
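
The strategy described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the nearest-centroid fitness, the parsimony penalty, truncation selection, one-point crossover, the majority-vote aggregation rule, and all names and parameter values (`woac_select`, `subspace_frac`, `vote_threshold`, and so on) are hypothetical choices made to keep the example self-contained. The abstract specifies only that the best solutions of independent genetic algorithm populations, initialized with bootstrapping and a random subspaces constraint, were aggregated.

```python
import random


def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier using only `features`."""
    if not features:
        return 0.0
    sums, counts = {}, {}
    for row, label in zip(X, y):
        total = sums.setdefault(label, [0.0] * len(features))
        for k, j in enumerate(features):
            total[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(row):
        return min(centroids, key=lambda c: sum(
            (row[j] - m) ** 2 for j, m in zip(features, centroids[c])))

    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def fitness(mask, X, y, penalty=0.01):
    """Accuracy minus a small per-feature penalty that rewards parsimony."""
    features = [j for j, bit in enumerate(mask) if bit]
    return centroid_accuracy(X, y, features) - penalty * len(features)


def run_ga(X, y, allowed, rng, pop_size=20, generations=30, mutation_rate=0.05):
    """Evolve one population of feature masks restricted to the `allowed` subspace."""
    p = len(X[0])
    population = [[int(j in allowed and rng.random() < 0.5) for j in range(p)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = population[:pop_size // 2]          # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, p)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if j in allowed and rng.random() < mutation_rate else bit
                     for j, bit in enumerate(child)]  # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, X, y))


def woac_select(X, y, n_crowds=15, subspace_frac=0.5, vote_threshold=0.5, seed=0):
    """Aggregate the best mask of each independent GA run by voting per feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes, eligible = [0] * p, [0] * p
    for _ in range(n_crowds):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap resample of subjects
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        allowed = set(rng.sample(range(p), max(1, int(subspace_frac * p))))
        best = run_ga(Xb, yb, allowed, rng)
        for j in allowed:                             # count votes only where eligible
            eligible[j] += 1
            votes[j] += best[j]
    return [j for j in range(p)
            if eligible[j] and votes[j] / eligible[j] >= vote_threshold]


# Synthetic data (hypothetical): feature 0 separates three classes, the rest is noise.
data_rng = random.Random(1)
y = [c for c in range(3) for _ in range(10)]
X = [[5.0 * c + data_rng.gauss(0, 0.5)] + [data_rng.gauss(0, 1) for _ in range(5)]
     for c in y]
selected = woac_select(X, y)
```

Bootstrapping and the random subspace restriction make the populations diverse, so a feature that keeps winning across many independent runs is less likely to be an artifact of a single run's search path. Votes are normalized by how often a feature was eligible, because the subspace constraint hides each feature from some populations.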

Results/conclusions: Strong evidence was observed that a statistical classifier utilizing WoAC feature selection can discriminate between human subjects presenting with thrombotic MI, non-thrombotic MI, and stable coronary artery disease given abundances of selected plasma metabolites. Using the abundances of twenty selected metabolites, the misclassification rate estimated by leave-one-out cross-validation was 2.6%. However, the WoAC feature selection strategy did not perform better than the Lasso in the current study.
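
As a reminder of how a leave-one-out cross-validation misclassification rate is computed, here is a minimal sketch. The nearest-centroid classifier, the function names, and the toy data are assumptions for illustration; the abstract does not specify the downstream classifier in enough detail to reproduce it here.

```python
def nearest_centroid_predict(train_X, train_y, row):
    """Predict the label whose class centroid is closest to `row`."""
    groups = {}
    for features, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(features)
    centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                 for label, rows in groups.items()}
    return min(centroids, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(row, centroids[label])))


def loocv_misclassification_rate(X, y, predict=nearest_centroid_predict):
    """Hold out each subject once, train on the rest, and count wrong predictions."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


# Toy data (hypothetical): one subject labeled "a" sits among the "b" subjects.
X = [[0.0], [0.2], [0.1], [5.0], [5.0], [5.1], [4.9]]
y = ["a", "a", "a", "a", "b", "b", "b"]
rate = loocv_misclassification_rate(X, y)
```

In this toy set only the atypical "a" subject is misclassified when held out, so the estimated rate is one in seven. In general, an estimate of this kind is less optimistic when feature selection is repeated inside each held-out fold; the abstract does not say which arrangement was used.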

Keywords: Classification; Diagnostic test; Evolutionary computation; Feature selection; Metabolomics; Myocardial infarction; Wisdom of artificial crowds.

Publication types

  • Multicenter Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cohort Studies
  • Computer Graphics
  • Coronary Artery Disease / diagnosis*
  • Diagnostic Tests, Routine
  • Hematologic Tests / methods*
  • Humans
  • Kentucky
  • Metabolomics / methods*
  • Models, Statistical
  • Myocardial Infarction / blood*
  • Myocardial Infarction / diagnosis*
  • Pattern Recognition, Automated / methods
  • Reproducibility of Results
  • Software
  • Thrombosis / blood*
  • Thrombosis / diagnosis*