Disentangling genetic feature selection and aggregation in transcriptome-wide association studies

Chen Cao; Pathum Kossinna; Devin Kwok; Qing Li; Jingni He; Liya Su; Xingyi Guo; Qingrun Zhang; Quan Long

doi:10.1093/genetics/iyab216

Genetics. 2022 Feb 4;220(2):iyab216. doi: 10.1093/genetics/iyab216.

Authors

Chen Cao¹, Pathum Kossinna¹, Devin Kwok², Qing Li¹, Jingni He¹, Liya Su³, Xingyi Guo⁴, Qingrun Zhang¹,², Quan Long¹,²,⁵,⁶

Affiliations

¹ Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada.
² Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada.
³ Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA 19107, USA.
⁴ Division of Epidemiology, Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN 37203, USA.
⁵ Department of Medical Genetics, University of Calgary, Calgary, AB T2N 4N1, Canada.
⁶ Hotchkiss Brain Institute, O'Brien Institute for Public Health, University of Calgary, Calgary, AB T2N 4N1, Canada.

Abstract

The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component, genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expression) and the linear combination of genotypes (as the predicted expression) for association tests. From the perspective of machine learning (considering SNPs as features), these are two separable steps, feature selection and feature aggregation, which can be conducted independently. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.
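The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`select_snps`, `kernel_score_stat`, `permutation_pvalue`), the simulated data, the linear kernel, and the permutation-based p-value are all assumptions chosen for illustration. SNPs are first selected by a marginal test against expression in a reference panel, and the selected SNPs are then aggregated with a kernel-style score statistic against the phenotype in a separate cohort.

```python
import numpy as np

def select_snps(G_ref, expr, top_k=5):
    """Feature selection (assumed form): rank SNPs by squared marginal
    correlation with expression and keep the top_k indices."""
    Gc = G_ref - G_ref.mean(axis=0)
    ec = expr - expr.mean()
    denom = np.sqrt((Gc ** 2).sum(axis=0) * (ec ** 2).sum())
    denom[denom == 0] = np.inf          # monomorphic SNPs get correlation 0
    r = Gc.T @ ec / denom
    return np.argsort(-r ** 2)[:top_k]

def kernel_score_stat(G, y):
    """Feature aggregation (assumed form): linear-kernel score statistic
    Q = r' G G' r, where r is the mean-centered phenotype (no covariates)."""
    Gc = G - G.mean(axis=0)
    r = y - y.mean()
    s = Gc.T @ r
    return float(s @ s)

def permutation_pvalue(G, y, n_perm=199, seed=0):
    """Permutation p-value for Q; an illustrative stand-in for the
    analytic null distributions used by kernel-machine tests."""
    rng = np.random.default_rng(seed)
    q_obs = kernel_score_stat(G, y)
    q_null = np.array(
        [kernel_score_stat(G, rng.permutation(y)) for _ in range(n_perm)]
    )
    return (1 + np.sum(q_null >= q_obs)) / (n_perm + 1)

# Hypothetical simulated data: a reference panel (genotype + expression)
# and a separate association cohort (genotype + phenotype).
rng = np.random.default_rng(42)
n_ref, n_gwas, n_snps = 500, 1000, 50
causal = [3, 17, 29]                     # SNPs that regulate expression
beta = np.ones(len(causal))

G_ref = rng.binomial(2, 0.3, size=(n_ref, n_snps))
expr = G_ref[:, causal] @ beta + rng.normal(size=n_ref)

G_gwas = rng.binomial(2, 0.3, size=(n_gwas, n_snps))
genetic_expr = G_gwas[:, causal] @ beta  # genetic component of expression
pheno = 0.5 * genetic_expr + rng.normal(size=n_gwas)

# Step 1: select features in the reference panel.
selected = select_snps(G_ref, expr, top_k=5)
# Step 2: aggregate the selected SNPs against the phenotype.
p_value = permutation_pvalue(G_gwas[:, selected], pheno)

print("selected SNPs:", sorted(selected.tolist()))
print("permutation p-value:", p_value)
```

Because the two steps are separate functions, either can be swapped independently, for example replacing the marginal test with a penalized regression or the linear kernel with another kernel, which is the kind of flexibility the abstract argues for.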

Keywords: feature selection; kernel machine; statistical genetics; statistical power; transcriptome-wide association studies.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Genetic Predisposition to Disease
Genome-Wide Association Study* / methods
Humans
Phenotype
Polymorphism, Single Nucleotide
Quantitative Trait Loci
Transcriptome*