Flexible Modeling of Genetic Effects on Function-Valued Traits

J Comput Biol. 2017 Jun;24(6):524-535. doi: 10.1089/cmb.2016.0174. Epub 2017 Jan 5.

Abstract

Genome-wide association studies commonly examine one trait at a time. Occasionally they examine several related traits in the hope of increasing power; in such a setting, the traits do not generally vary smoothly along any axis such as time or space. Function-valued traits, by contrast, often do vary smoothly along the axis of interest. For instance, for longitudinal traits such as growth curves, the axis of interest is time; for spatially varying traits such as chromatin accessibility, it is position along the genome. Although there have been efforts to perform genome-wide association studies with such function-valued traits, the statistical approaches developed for this purpose often have limitations, such as requiring the trait to behave linearly in time or space, or constraining the genetic effect itself to be constant or linear in time. Herein, we present a flexible model for this problem, the Partitioned Gaussian Process, which removes many such limitations and is especially effective as the number of time points increases. The theoretical basis of this model provides machinery for handling missing and unaligned function values, such as would occur when not all individuals are measured at the same time points. Furthermore, we make use of algebraic refactorizations to substantially reduce the time complexity of our model relative to the naive implementation. Finally, we apply our approach and several others to synthetic data before closing with some directions for improved modeling and statistical testing.
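
As a rough illustration of why a Gaussian process can accommodate unaligned measurements, the following minimal sketch fits a generic zero-mean Gaussian process with a radial basis function kernel to two hypothetical individuals observed at different time points and evaluates both on a shared grid. It is not the Partitioned Gaussian Process described in the abstract; the function names, hyperparameter values, and toy data are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(t1, t2, length_scale=1.0, variance=1.0):
    """Radial basis function covariance between two 1-D arrays of time points."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior_mean(t_obs, y_obs, t_new, noise=1e-2, length_scale=1.0):
    """Posterior mean of a zero-mean GP at t_new, given noisy observations."""
    # Covariance among observed points, plus independent observation noise.
    K = rbf_kernel(t_obs, t_obs, length_scale) + noise * np.eye(len(t_obs))
    # Cross-covariance between the new points and the observed points.
    K_star = rbf_kernel(t_new, t_obs, length_scale)
    return K_star @ np.linalg.solve(K, y_obs)

# Two hypothetical individuals measured at different, unaligned time points.
t_a = np.array([0.0, 1.0, 2.5, 4.0])
t_b = np.array([0.5, 1.5, 3.0])
y_a = np.sin(t_a)          # toy trait values (assumed, not real data)
y_b = np.sin(t_b) + 0.3    # same toy curve with a constant shift

# Both individuals can be evaluated on one shared grid, because the kernel
# only needs the time stamps, not a common measurement schedule.
grid = np.linspace(0.0, 4.0, 9)
mean_a = gp_posterior_mean(t_a, y_a, grid)
mean_b = gp_posterior_mean(t_b, y_b, grid)
```

Because the covariance is defined for any pair of time points, individuals with missing or differently timed measurements need no imputation or alignment step in this sketch. The direct linear solve costs cubic time in the number of observations, which is the kind of naive cost that algebraic refactorization aims to reduce.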

Keywords: Gaussian process regression; function-valued traits; functional traits; genome-wide association study; linear mixed models; longitudinal traits; radial basis function; time series traits.

MeSH terms

  • Computer Simulation
  • Genome-Wide Association Study / methods*
  • Humans
  • Models, Genetic*
  • Models, Statistical*
  • Normal Distribution
  • Quantitative Trait, Heritable*
  • Sequence Analysis, DNA / methods*
  • Statistics, Nonparametric