Sci Data. 2015 Mar 25;2:150011.
doi: 10.1038/sdata.2015.11. eCollection 2015.

Sequence Variants From Whole Genome Sequencing a Large Group of Icelanders

Free PMC article

Daniel F Gudbjartsson et al. Sci Data. 2015.


We have accumulated considerable data on the genetic makeup of the Icelandic population by sequencing the whole genomes of 2,636 Icelanders to a depth of at least 10X and by chip genotyping 101,584 more. The sequencing was done with Illumina technology. The median sequencing depth was 20X, and 909 individuals were sequenced to a depth of at least 30X. We found 20 million single nucleotide polymorphisms (SNPs) and 1.5 million insertions/deletions (indels) that passed stringent quality control. Almost all the common SNPs (derived allele frequency (DAF) over 2%) that we identified in Iceland have been observed in either dbSNP (build 137) or the Exome Sequencing Project (ESP), whereas only 60% and 20% of rare (DAF<0.5%) SNPs and indels, respectively, in coding regions, the most heavily studied parts of the genome, have been observed in these public databases. Features of our variant data, such as the transition/transversion ratio and the length distribution of indels, are similar to published reports.
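As an illustration of two quantities mentioned in the abstract, the following minimal Python sketch computes a transition/transversion ratio from a list of SNP alleles and bins a variant by its derived allele frequency. The function names, the toy input and the "intermediate" label for the 0.5% to 2% range are assumptions made for this example; only the two DAF thresholds come from the abstract, and this is not the authors' pipeline.

```python
# Illustrative sketch only; names and the toy data are hypothetical.

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref, alt):
    """Return True if the single-base change ref -> alt is a transition."""
    return (ref.upper(), alt.upper()) in TRANSITIONS


def ti_tv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) SNP pairs."""
    snps = list(snps)
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ti / tv


def daf_class(daf):
    """Bin a derived allele frequency using the thresholds in the abstract.

    rare: DAF < 0.5%; common: DAF > 2%. The label for the range in between
    is an assumption of this sketch.
    """
    if daf < 0.005:
        return "rare"
    if daf > 0.02:
        return "common"
    return "intermediate"


if __name__ == "__main__":
    toy_snps = [("A", "G"), ("C", "T"), ("A", "C")]
    print(ti_tv_ratio(toy_snps))
    print(daf_class(0.001), daf_class(0.01), daf_class(0.05))
```

The input is assumed to contain only biallelic SNPs with differing reference and alternate bases; anything else would be counted as a transversion by this sketch.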

Conflict of interest statement

The authors declare no competing financial interests.


Figure 1. Sequencing depth by individual.
A histogram of the individual mean sequencing depth of the 2,636 whole-genome sequenced Icelanders. Figure reproduced from Supplementary Fig. 1 of ref. .
Figure 2. Overview of sequence alignment and variant calling.
Figure reproduced from Supplementary Fig. 2 of ref. .
Figure 3. Validation of sequencing data.
Distribution of indel length inside (a) and outside (b) protein coding regions; panel (a) covers the 4,001 indels inside protein coding regions. Insertions have a positive length and deletions have a negative length. Indels whose length is not a multiple of three are colored grey; those whose length is a multiple of three are colored black. (c) The fraction of SNPs and indels identified in 2,636 Icelanders that are present in dbSNP (build 137) or the Exome Sequencing Project (ESP), by consequence. The analysis was restricted to the 16,587,813 SNPs and 1,191,089 indels for which the ancestral allele could be inferred. Shown is the overlap with the union of dbSNP and ESP as a function of derived allele frequency (DAF), by annotation and variant type. (d) Comparison of imputed and chip genotypes. Shown is the fraction of the 28,204 SNPs identified in exons and splice regions and present on SNP chips that have r² > 0.8, 0.9 and 0.99 between imputed and chip genotypes, as a function of their DAF. Figure reproduced from Figs 1, 5 and Supplementary Fig. 5 of ref. .
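The conventions used in the Figure 3 caption can be sketched in a few lines of Python: a signed indel length (positive for insertions, negative for deletions), a check for lengths that are multiples of three, and an r² value between imputed and chip genotypes. This is a hedged illustration, not the authors' code. It assumes that r² denotes the squared Pearson correlation and that genotypes are coded as allele counts or dosages; all function names are hypothetical.

```python
# Illustrative sketch only; names are hypothetical and r^2 is assumed to be
# the squared Pearson correlation between imputed and chip genotypes.


def signed_indel_length(ref, alt):
    """Positive for insertions, negative for deletions (caption convention)."""
    return len(alt) - len(ref)


def is_multiple_of_three(length):
    """True for non-zero lengths divisible by three (drawn black in panels a, b)."""
    return length != 0 and length % 3 == 0


def r_squared(imputed, chip):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(imputed)
    if n != len(chip) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_i = sum(imputed) / n
    mean_c = sum(chip) / n
    sxx = sum((x - mean_i) ** 2 for x in imputed)
    syy = sum((y - mean_c) ** 2 for y in chip)
    sxy = sum((x - mean_i) * (y - mean_c) for x, y in zip(imputed, chip))
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when a vector has zero variance")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    print(signed_indel_length("A", "ACGT"))    # 3-base insertion
    print(signed_indel_length("ACG", "A"))     # 2-base deletion
    print(r_squared([0.1, 0.9, 1.8, 1.0], [0, 1, 2, 1]))
```

In coding sequence, an indel whose length is a multiple of three preserves the reading frame, which is why the caption distinguishes the two groups.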

Dataset use reported in

  • Sci Data. doi: 10.1038/ng.3247


Data Citations

    1. Gudbjartsson D. F., Sulem P., Stefansson K. European Variation Archive. 2015 PRJEB8636


References

    1. Frazer K. A. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
    2. Marchini J., Howie B., Myers S., McVean G. & Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genet. 39, 906–913 (2007).
    3. Hindorff L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
    4. Abecasis G. R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
    5. Tennessen J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
