Genotype-Frequency Estimation from High-Throughput Sequencing Data

Genetics. 2015 Oct;201(2):473-86. doi: 10.1534/genetics.115.179077. Epub 2015 Jul 29.

Abstract

Rapidly improving high-throughput sequencing technologies provide unprecedented opportunities for carrying out population-genomic studies with various organisms. To take full advantage of these methods, it is essential to correctly estimate allele and genotype frequencies, and here we present a maximum-likelihood method that accomplishes these tasks. The proposed method fully accounts for uncertainties resulting from sequencing errors and biparental chromosome sampling and yields essentially unbiased estimates with minimal sampling variances with moderately high depths of coverage regardless of a mating system and structure of the population. Moreover, we have developed statistical tests for examining the significance of polymorphisms and their genotypic deviations from Hardy-Weinberg equilibrium. We examine the performance of the proposed method by computer simulations and apply it to low-coverage human data generated by high-throughput sequencing. The results show that the proposed method improves our ability to carry out population-genomic analyses in important ways. The software package of the proposed method is freely available from https://github.com/Takahiro-Maruki/Package-GFE.

Keywords: Hardy–Weinberg test; genotype frequency; inbreeding coefficient; polymorphism detection; population genomics.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Data Mining / methods*
  • Data Mining / statistics & numerical data
  • Genetics, Population
  • Genomics
  • Genotype*
  • High-Throughput Nucleotide Sequencing / methods*
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Humans
  • Likelihood Functions
  • Software*