Group analysis of distance matrices

Genet Epidemiol. 2020 Sep;44(6):620-628. doi: 10.1002/gepi.22329. Epub 2020 Jun 21.

Abstract

The distance-based regression model has become a powerful approach for identifying phenotypic associations in many fields. It is particularly useful for high-dimensional biological and genetic data when a suitable distance or similarity measure is available. The pseudo F statistic used in this model accumulates information and is effective when the signals, that is, the variations represented by the eigenvalues of the similarity matrix, are spread evenly along its eigenvectors. However, it may lose power when the signals are uneven. To address this issue, we propose a group analysis of the signal variations along the eigenvalues of the similarity matrix, taking the maximum over the groups. The new procedure automatically chooses an optimal grouping point from a set of given thresholds and can thereby improve power. Extensive computer simulations and applications to a prostate cancer dataset and an aging human brain dataset illustrate the effectiveness of the proposed method.
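The abstract gives no formulas, so the following is only a sketch of one plausible reading of the idea, not the authors' procedure. It assumes a single quantitative covariate x (plus an intercept); the Gower-centred matrix G built from the distances; the "given thresholds" read as cumulative shares of the positive eigenvalues (0.5, 0.8 and 1.0, chosen arbitrarily); each group taken as the leading k eigen-components; and non-positive eigenvalues dropped. With eigenpairs (λ_j, u_j) of G and c_j = (u_j' x_c)^2 / (x_c' x_c) for the centred covariate x_c, the group statistic is F_k = (n - 2) Σ_{j≤k} λ_j c_j / Σ_{j≤k} λ_j (1 - c_j). Taking k as all positive components recovers the usual one-covariate pseudo F, (n - 2) tr(HG) / tr((I - H)G). The maximum over groups is calibrated by permutation. All function names are hypothetical, and a Jacobi eigensolver is used so the sketch needs no third-party library.

```python
import math
import random


def pairwise_euclidean(points):
    """Euclidean distance matrix for a list of coordinate tuples (demo helper)."""
    return [[math.dist(p, q) for q in points] for p in points]


def gower_center(dist):
    """G = -1/2 (I - 11'/n) D2 (I - 11'/n), where D2 holds squared distances."""
    n = len(dist)
    a = [[-0.5 * d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in a]
    col_means = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[a[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def jacobi_eigh(a, max_sweeps=100):
    """Cyclic Jacobi eigensolver for a symmetric matrix.

    Returns (eigenvalues, eigenvectors); eigenvectors[k] pairs with eigenvalues[k]."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    norm2 = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * norm2:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J; columns of V converge to eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], [[v[k][i] for k in range(n)] for i in range(n)]


def eigen_components(dist, rel_tol=1e-10):
    """Positive eigenpairs of G, largest first; non-positive ones are dropped (assumption)."""
    vals, vecs = jacobi_eigh(gower_center(dist))
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
    keep = [i for i in order if vals[i] > rel_tol * vals[order[0]]]
    return [vals[i] for i in keep], [vecs[i] for i in keep]


def cutoffs_from_thresholds(lams, thresholds):
    """Smallest k whose leading eigenvalues reach each cumulative-share threshold."""
    total, cuts = sum(lams), set()
    for tau in thresholds:
        acc, k = 0.0, 0
        while k < len(lams) and acc < tau * total:
            acc += lams[k]
            k += 1
        cuts.add(max(k, 1))
    return sorted(cuts)


def group_stats(lams, vecs, x, cuts):
    """Pseudo F restricted to the leading k eigen-components, for each k in cuts."""
    n = len(x)
    mean = sum(x) / n
    xc = [xi - mean for xi in x]
    xx = sum(xi * xi for xi in xc)
    # c_j = (u_j . xc)^2 / (xc . xc): share of component j explained by the covariate
    proj = [sum(ui * xi for ui, xi in zip(u, xc)) ** 2 / xx for u in vecs]
    stats = []
    for k in cuts:
        num = sum(lams[j] * proj[j] for j in range(k))  # 1 numerator df
        den = sum(lams[j] * (1.0 - proj[j]) for j in range(k)) / (n - 2)
        stats.append(num / den if den > 0 else math.inf)
    return stats


def grouped_pseudo_f_test(dist, x, thresholds=(0.5, 0.8, 1.0), n_perm=999, seed=0):
    """Max of the grouped statistics, with a permutation p-value for that maximum."""
    lams, vecs = eigen_components(dist)
    cuts = cutoffs_from_thresholds(lams, thresholds)
    observed = group_stats(lams, vecs, x, cuts)
    t_obs = max(observed)
    rng, xp, exceed = random.Random(seed), list(x), 0
    for _ in range(n_perm):
        rng.shuffle(xp)  # G is fixed, so only the covariate is permuted
        if max(group_stats(lams, vecs, xp, cuts)) >= t_obs:
            exceed += 1
    return cuts, observed, (exceed + 1) / (n_perm + 1)
```

Because G does not depend on x, the eigendecomposition is done once and each permutation only re-projects the shuffled covariate. Maximizing the raw group statistics is a simplification: permuting keeps the test of the maximum valid, but the paper may standardize or weight the groups differently before taking the maximum.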

Keywords: distance-based regression; eigenvalue decomposition; pseudo F test statistic.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms
  • Brain / physiology
  • Computer Simulation
  • Female
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Male
  • Middle Aged
  • Models, Genetic*
  • Models, Statistical
  • Prostatic Neoplasms / genetics
  • Regression Analysis
  • Time Factors