Group analysis of distance matrices

Jinjuan Wang; Jialu Li; Wenjun Xiong; Qizhai Li

doi:10.1002/gepi.22329

Genet Epidemiol. 2020 Sep;44(6):620-628. doi: 10.1002/gepi.22329. Epub 2020 Jun 21.

Authors

Jinjuan Wang¹,², Jialu Li³, Wenjun Xiong⁴, Qizhai Li¹,²

Affiliations

¹ LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
² University of Chinese Academy of Sciences, Beijing, China.
³ School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China.
⁴ Guangxi Normal University, Guangxi, China.

PMID: 32567118
DOI: 10.1002/gepi.22329

Abstract

Distance-based regression has become a powerful approach for identifying phenotypic associations in many fields. It is particularly useful for high-dimensional biological and genetic data when a proper distance or similarity measure is available. The pseudo F statistic used in this model accumulates information and is effective when the signals, that is, the variations represented by the eigenvalues of the similarity matrix, are scattered evenly along the eigenvectors of the similarity matrix. However, it may lose power when the signals are uneven. To deal with this issue, we propose a group analysis of the variations of signals along the eigenvalues of the similarity matrix and take the maximum among them. The new procedure automatically chooses an optimal grouping point among some given thresholds and can thus improve power. Extensive computer simulations and applications to a prostate cancer data set and an aging human brain data set illustrate the effectiveness of the proposed method.
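For orientation, the conventional pseudo F statistic that the abstract refers to can be sketched as below. This is a minimal illustration of the standard distance-based regression test only, not the grouped procedure proposed in the paper, whose details the abstract does not give. The function names (`gower_center`, `pseudo_f`, `permutation_pvalue`) are hypothetical, and the sketch assumes a symmetric distance matrix `D`, a full-rank covariate array `X`, and a permutation-based p-value.

```python
import numpy as np

def gower_center(D):
    """Turn an n x n distance matrix into a Gower-centered similarity matrix."""
    n = D.shape[0]
    A = -0.5 * D ** 2
    C = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return C @ A @ C

def pseudo_f(G, X):
    """Pseudo F statistic for similarity matrix G and covariates X.

    Assumes X (plus the intercept added here) has full column rank.
    """
    n = G.shape[0]
    X1 = np.column_stack([np.ones(n), X])   # add an intercept column
    m = X1.shape[1]
    H = X1 @ np.linalg.pinv(X1)             # hat (projection) matrix
    R = np.eye(n) - H                       # residual projection
    explained = np.trace(H @ G @ H) / (m - 1)
    residual = np.trace(R @ G @ R) / (n - m)
    return explained / residual

def permutation_pvalue(G, X, n_perm=999, seed=0):
    """Permutation p-value: permute rows and columns of G together."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(G, X)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(G.shape[0])
        if pseudo_f(G[np.ix_(idx, idx)], X) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

With Euclidean distances on a single outcome, this statistic reduces to the classical regression F statistic, which gives a convenient sanity check for the sketch.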

Keywords: distance-based regression; eigenvalue decomposition; pseudo F test statistic.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Aged
Aged, 80 and over
Algorithms
Brain / physiology
Computer Simulation
Female
Gene Expression Regulation, Neoplastic
Humans
Male
Middle Aged
Models, Genetic*
Models, Statistical
Prostatic Neoplasms / genetics
Regression Analysis
Time Factors