Inference of kinship using spatial distributions of SNPs for genome-wide association studies

BMC Genomics. 2016 May 20;17:372. doi: 10.1186/s12864-016-2696-0.

Abstract

Background: Genome-wide association studies (GWASs) are a powerful tool for identifying genetic loci that contribute to complex traits of common diseases. However, it is well known that failing to account properly for pedigree or population structure leads to spurious associations. GWASs have often encountered inflated type I error rates because of the correlated genotypes of cryptically related individuals or subgroups. Accurate pedigree information is therefore crucial for a successful GWAS.

Results: We propose KIND, a distance-based method for estimating kinship coefficients among individuals. The method uses the spatial distribution of SNPs in the genome, which describes how far each minor-allele variant lies from its neighboring minor-allele variants. The SNP distribution of each individual is represented as a feature vector in Euclidean space, and the kinship coefficient of each pair of individuals is inferred from their two vectors. We demonstrate that this distance information measures the similarity of individuals' genetic variants accurately and efficiently. We applied the method to a synthetic data set and two real data sets (the HapMap phase III and 1000 Genomes data), and we investigated the estimation accuracy of kinship coefficients not only within homogeneous populations but also for a population with extreme stratification.
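The idea of comparing individuals through distance-based feature vectors can be illustrated with a minimal Python sketch. It is not the KIND algorithm itself: the abstract does not specify how the feature vector is built or how a kinship coefficient is derived from it. The sketch assumes that each feature is the distance from a panel SNP site to the nearest minor-allele variant the individual carries, and it reports only the raw Euclidean distance between two vectors, not a calibrated kinship coefficient. The function names, positions, and genotypes are hypothetical.

```python
from bisect import bisect_left
from math import sqrt


def distance_features(positions, genotypes):
    """Return one feature per panel SNP site (illustrative assumption).

    positions: sorted SNP coordinates shared by all individuals.
    genotypes: minor-allele counts (0, 1 or 2) at those sites.
    Each feature is the distance from the site to the nearest site where
    this individual carries at least one minor allele (0 if carried here).
    Dosage (1 vs 2 copies) is ignored in this sketch.
    """
    carried = [p for p, g in zip(positions, genotypes) if g > 0]
    if not carried:
        raise ValueError("individual carries no minor alleles")
    features = []
    for p in positions:
        i = bisect_left(carried, p)
        left = p - carried[i - 1] if i > 0 else float("inf")
        right = carried[i] - p if i < len(carried) else float("inf")
        features.append(min(left, right))
    return features


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy panel: five SNP sites and three individuals.
positions = [100, 250, 400, 900, 1500]
ind_a = [1, 0, 1, 0, 2]
ind_b = [1, 0, 1, 0, 1]  # carries minor alleles at the same sites as ind_a
ind_c = [0, 1, 0, 2, 0]  # carries minor alleles at different sites

fa = distance_features(positions, ind_a)
fb = distance_features(positions, ind_b)
fc = distance_features(positions, ind_c)

print(euclidean(fa, fb))  # identical carrier patterns give distance 0.0
print(euclidean(fa, fc))  # different carrier patterns give a larger distance
```

In this toy example a smaller distance indicates more similar minor-allele placement. Turning such a distance into a kinship coefficient would need a mapping that the abstract does not describe.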

Conclusions: KIND usually produces more accurate and more robust kinship coefficient estimates than existing methods, especially for populations with extreme stratification. It can serve as an important and very efficient tool for GWASs.

Keywords: 1000 Genomes; GWAS; HapMap; Kinship; Population structure; SNP.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Alleles
  • Genetic Predisposition to Disease*
  • Genetics, Population
  • Genome, Human
  • Genome-Wide Association Study* / methods
  • Genomics / methods
  • HapMap Project
  • Humans
  • Models, Genetic*
  • Pedigree
  • Polymorphism, Single Nucleotide*