Analyzing omics data by pair-wise feature evaluation with horizontal and vertical comparisons

J Pharm Biomed Anal. 2018 Aug 5;157:20-26. doi: 10.1016/j.jpba.2018.04.052. Epub 2018 May 1.

Abstract

Feature relationships are complex and may contain important information. The k top-scoring pairs (k-TSP) method studies feature relationships through horizontal comparison. This study examines feature relationships and proposes vertical and horizontal k-TSP (VH-k-TSP), which identifies discriminative feature pairs by evaluating them with both vertical and horizontal comparisons. A complexity measure is introduced to compute the discriminative ability of feature pairs under these two comparisons. VH-k-TSP was compared with support vector machine-recursive feature elimination, relative simplicity-support vector machine, k-TSP, and M-k-TSP on nine public genomics datasets. For multi-class problems, the one-to-one method was used. In the experiments, VH-k-TSP outperformed the four other methods in most cases. VH-k-TSP was then applied to a metabolomics dataset of liver disease. It achieved an accuracy of 88.11 ± 3.30% in discriminating between cirrhosis and hepatocellular carcinoma, better than the 77.39 ± 4.10% and 79.28 ± 3.73% obtained by k-TSP and M-k-TSP, respectively. Hence, combining the vertical and horizontal comparisons could define more discriminative feature pairs.
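
For readers unfamiliar with k-TSP, the horizontal comparison can be illustrated with the pair score as it is usually defined in the TSP literature: for features i and j, the score is |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|, where the two features are compared within each sample. The sketch below is a minimal illustration under that assumption, not the authors' implementation. The function names `tsp_scores` and `top_k_pairs` are hypothetical, the selection step does not enforce disjoint pairs, and the vertical comparison and the complexity measure of VH-k-TSP are not specified in the abstract and are therefore not sketched.

```python
import numpy as np


def tsp_scores(X, y):
    """Score every feature pair (i, j) by the classic TSP criterion.

    X: (n_samples, n_features) array; y: binary labels (0 or 1).
    Returns an (n_features, n_features) matrix whose entry [i, j] is
    |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[s, i, j] is True when feature i < feature j within sample s
    less = X[:, :, None] < X[:, None, :]
    p0 = less[y == 0].mean(axis=0)  # frequency of X_i < X_j in class 0
    p1 = less[y == 1].mean(axis=0)  # frequency of X_i < X_j in class 1
    return np.abs(p0 - p1)


def top_k_pairs(scores, k):
    """Return the k highest-scoring pairs (i, j) with i < j.

    Simplification: pairs may share features here.
    """
    i_idx, j_idx = np.triu_indices(scores.shape[0], k=1)
    order = np.argsort(scores[i_idx, j_idx])[::-1][:k]
    return [(int(i_idx[o]), int(j_idx[o])) for o in order]


# Toy data (made up for illustration): the ordering of features 0 and 1
# flips between the two classes, so the pair (0, 1) scores highest.
X_demo = np.array([[1, 2, 5], [1, 3, 4], [2, 1, 6], [3, 1, 7]])
y_demo = np.array([0, 0, 1, 1])
demo_scores = tsp_scores(X_demo, y_demo)
best_pair = top_k_pairs(demo_scores, 1)
```

A new sample would then be classified by checking, for each selected pair, which ordering of the two features it shows and voting across the k pairs.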

Keywords: Classification; Feature relationship; Feature selection; Hepatocellular carcinoma.

MeSH terms

  • Carcinoma, Hepatocellular / genetics*
  • Carcinoma, Hepatocellular / metabolism*
  • Genomics / methods
  • Humans
  • Liver Diseases / genetics*
  • Liver Diseases / metabolism*
  • Liver Neoplasms / genetics*
  • Liver Neoplasms / metabolism*
  • Metabolomics / methods
  • Support Vector Machine