Localizing and Classifying Adaptive Targets with Trend Filtered Regression

Mehreen R Mughal; Michael DeGiorgio

doi:10.1093/molbev/msy205

Mol Biol Evol. 2019 Feb 1;36(2):252-270. doi: 10.1093/molbev/msy205.

Authors

Mehreen R Mughal¹, Michael DeGiorgio²,³

Affiliations

¹ Bioinformatics and Genomics at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA.
² Departments of Biology and Statistics, Pennsylvania State University, University Park, PA.
³ Institute for CyberScience, Pennsylvania State University, University Park, PA.

Abstract

Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which they are calculated. However, due to linkage disequilibrium, summary statistics calculated at nearby genomic positions are highly correlated. We introduce an approach termed Trendsetter that accounts for the similarity of statistics calculated from adjacent genomic regions through trend filtering, while reducing the effects of multicollinearity through regularization. Our penalized regression framework has high power to detect sweeps, is capable of classifying sweep regions as either hard or soft, and can be applied to other selection scenarios as well. We find that Trendsetter is robust to both extensive missing data and strong background selection, and has comparable power to similar current approaches. Moreover, the model learned by Trendsetter can be viewed as a set of curves modeling the spatial distribution of summary statistics in the genome. Application to human genomic data revealed positively selected regions previously discovered such as LCT in Europeans and EDAR in East Asians. We also identified a number of novel candidates and show that populations with greater relatedness share more sweep signals.
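
To illustrate the idea of combining trend filtering with regularization, the following is a minimal sketch of a penalty of that general kind. It is not taken from the paper: the function names, the default weights `lam_trend` and `lam_sparse`, and the exact form of the penalty are assumptions made for illustration. Each row of `beta` is assumed to hold the regression coefficients of one summary statistic across adjacent genomic windows.

```python
def differences(values, order=1):
    """Return the order-th discrete differences of a sequence."""
    out = list(values)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def trend_filter_penalty(beta, lam_trend=1.0, lam_sparse=0.1, order=1):
    """Illustrative penalty on a (statistics x windows) coefficient table.

    Assumed form (hypothetical, not the published objective):
      lam_trend  * sum of |order-th differences between adjacent windows|
    + lam_sparse * sum of |coefficients|
    """
    # Trend term: discourages abrupt changes between neighbouring windows.
    trend = sum(abs(d) for row in beta for d in differences(row, order))
    # Sparsity term: shrinks coefficients of redundant, correlated statistics.
    sparse = sum(abs(b) for row in beta for b in row)
    return lam_trend * trend + lam_sparse * sparse


# Two hypothetical coefficient curves for a single summary statistic.
smooth = [[0.0, 1.0, 2.0, 1.0, 0.0]]
jagged = [[0.0, 2.0, 0.0, 2.0, 0.0]]

print(trend_filter_penalty(smooth))
print(trend_filter_penalty(jagged))
```

Under these assumed weights the jagged curve is penalized more heavily than the smooth one, which is the sense in which such a penalty favors coefficients that vary gradually along the genome.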

Publication types

Evaluation Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Computer Simulation
Genetic Techniques*
Genetics, Population / methods*
Genome, Human*
Humans
Machine Learning*
Models, Genetic*
Regression Analysis
Software
