Detecting Population-Differentiation Copy Number Variants in Human Population Tree by Sparse Group Selection

IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):538-549. doi: 10.1109/TCBB.2017.2779481. Epub 2017 Dec 6.

Abstract

Copy-number variants (CNVs) account for a substantial proportion of human genetic variation. Understanding CNV diversity across populations is a computational challenge because CNV patterns are often present in several related populations and occur only in a subgroup of individuals within each population. This paper introduces a tree-guided sparse group selection algorithm (treeSGS) to detect population-differentiation CNV markers of subgroups across populations organized by a phylogenetic tree of human populations. The treeSGS algorithm detects CNV markers of populations associated with nodes at all levels of the tree, so that the evolutionary relations among the populations are incorporated for more accurate detection of population-differentiation CNVs. We applied the treeSGS algorithm to the 1,179 samples from the 11 populations in the HapMap3 CNV data. The treeSGS algorithm accurately identifies CNV markers of each population and of the collections of populations organized under the branches of the human population tree, as validated by consistency among family trios and by SNP characterizations of the CNV regions. Further comparison between the detected CNV markers and other population-differentiation CNVs reported in the 1000 Genomes data and other recent studies also shows that treeSGS can significantly improve the current annotations of population-differentiation CNV markers. The treeSGS package is available at https://github.com/kuanglab/treeSGS.
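
The idea of attaching markers to nodes at every level of a population tree can be illustrated with a small sketch. It is not the treeSGS formulation, which the abstract does not spell out. It shows only a generic tree-structured group penalty, in which every tree node defines a group containing the leaf populations below it. The toy tree, the internal node names (`root`, `african`, `european`), the coefficient values, and all function names are assumptions made for illustration.

```python
import math

# Toy tree (assumed for illustration): internal node -> list of children.
# Leaves are population codes; internal node names are hypothetical.
TREE = {
    "root": ["african", "european"],
    "african": ["YRI", "LWK"],
    "european": ["CEU", "TSI"],
}

def leaves_under(tree, node):
    """Return the leaf populations below `node` (the node itself if it is a leaf)."""
    children = tree.get(node)
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(leaves_under(tree, child))
    return leaves

def node_groups(tree, root="root"):
    """Map every node of the tree to the group of leaf populations it covers."""
    groups = {}
    stack = [root]
    while stack:
        node = stack.pop()
        groups[node] = leaves_under(tree, node)
        stack.extend(tree.get(node, []))
    return groups

def tree_group_penalty(beta, tree, root="root"):
    """Sum, over all node groups, of the L2 norm of the per-population coefficients."""
    total = 0.0
    for leaves in node_groups(tree, root).values():
        total += math.sqrt(sum(beta[p] ** 2 for p in leaves))
    return total

# Hypothetical per-population effect of one CNV: present only in the African branch.
beta = {"YRI": 3.0, "LWK": 4.0, "CEU": 0.0, "TSI": 0.0}
groups = node_groups(TREE)
penalty = tree_group_penalty(beta, TREE)
```

In this sketch, a CNV whose effect is shared by sibling populations contributes through the group of their common ancestor as well as through the individual leaves, which is one common way to let a tree guide group-wise selection.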

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • DNA Copy Number Variations / genetics*
  • Genetics, Population / methods*
  • Genome, Human / genetics
  • Humans
  • Machine Learning