Mol Ecol Resour. 2017 Sep;17(5):1009-1024.
doi: 10.1111/1755-0998.12665. Epub 2017 Apr 6.

Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond

Jisca Huisman. Mol Ecol Resour. 2017 Sep.

Abstract

Data on hundreds or thousands of single nucleotide polymorphisms (SNPs) provide detailed information about the relationships between individuals, but currently few tools can turn this information into a multigenerational pedigree. I present the R package sequoia, which assigns parents, clusters half-siblings sharing an unsampled parent and assigns grandparents to half-sibships. Assignments are made after consideration of the likelihoods of all possible first-, second- and third-degree relationships between the focal individuals, as well as the traditional alternative of being unrelated. This careful exploration of the local likelihood surface is implemented in a fast, heuristic hill-climbing algorithm. Distinction between the various categories of second-degree relatives is possible when likelihoods are calculated conditional on at least one parent of each focal individual. Performance was tested on simulated data sets with realistic genotyping error rate and missingness, based on three different large pedigrees (N = 1000-2000). This included a complex pedigree with overlapping generations, occasional close inbreeding and some unknown birth years. Parentage assignment was highly accurate down to about 100 independent SNPs (error rate <0.1%) and fast (<1 min), as most pairs can be excluded from being parent-offspring based on opposite homozygosity. For full pedigree reconstruction, 40% of parents were assumed nongenotyped. Reconstruction resulted in low error rates (<0.3%) and high assignment rates (>99%) within limited computation time (typically <1 h) when at least 200 independent SNPs were used. In three empirical data sets, relatedness estimated from the inferred pedigree was strongly correlated to genomic relatedness.

Keywords: sequoia; parentage assignment; pedigree; sibship clustering; single nucleotide polymorphism.
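
The abstract notes that most pairs can be excluded as parent-offspring based on opposite homozygosity. A minimal Python sketch of that idea follows; it is an illustration only, not the sequoia implementation. It assumes genotypes are coded as 0, 1 or 2 copies of the minor allele. The function names, the missing-value code `-9` and the `max_mismatch` tolerance for genotyping errors are hypothetical choices made for this example.

```python
def count_opposite_homozygous(geno_a, geno_b, missing=-9):
    """Count loci where one individual is 0 and the other is 2.

    geno_a, geno_b: equal-length sequences of genotypes coded 0/1/2.
    missing: hypothetical code for a missing genotype; such loci are skipped.
    """
    count = 0
    for a, b in zip(geno_a, geno_b):
        if a == missing or b == missing:
            continue
        if {a, b} == {0, 2}:  # opposite homozygotes share no allele
            count += 1
    return count


def can_exclude_parent_offspring(geno_a, geno_b, max_mismatch=0, missing=-9):
    """Return True if the pair has more opposite-homozygous loci than tolerated.

    max_mismatch is an assumed tolerance for genotyping errors; a parent and
    offspring cannot be opposite homozygotes in the absence of errors.
    """
    return count_opposite_homozygous(geno_a, geno_b, missing) > max_mismatch
```

For example, `can_exclude_parent_offspring([0, 1, 2], [2, 1, 0])` excludes the pair, because two loci are opposite homozygous and no mismatches are tolerated by default.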


Figures

Figure 1
Example part pedigree with only paternal links shown. Abbreviations indicate when the link is inferred: during (1) parentage assignment, (2) sibship clustering (assignment of a dummy parent), (3a) assignment of genotyped grandparents to sibships, (3b) assignment of dummy individuals as grandparents to other sibships, or (dashed) based on nongenetic data only (not by sequoia). Note that links 3a and 3b are not inferred by other programs, which would result in four unconnected pedigree fragments.
Figure 2
Overview of program use. Input consists of a numeric matrix with genotypes, either converted from standard plink format or simulated from a pedigree, and a dataframe with life‐history data (ID, sex and birth year). Output is an R list with the pedigree and various other elements. A detailed manual is given in the R vignette.
Figure 3
Single‐locus probability of observing genotypes A and B (0, 1 or 2 copies of the minor allele) as a function of the minor allele frequency q, under the hypotheses U (solid grey line), PO (dashed black), FS (dotted black), HS, GG or FA (solid black, indistinguishable from each other), or HA or GGG (dashed grey) (Equations S2–S11 in Appendix S1, Supporting information).
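
The quantities plotted in Figure 3 can be illustrated for the two simplest hypotheses, U and PO. The Python sketch below is a simplified illustration under stated assumptions, not the paper's Equations S2–S11: it assumes Hardy-Weinberg proportions and no genotyping error, whereas the paper's likelihoods account for a genotyping error rate. All function names are hypothetical.

```python
def hwe_probs(q):
    """Genotype probabilities for 0, 1, 2 minor-allele copies under
    Hardy-Weinberg proportions with minor allele frequency q."""
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]


def offspring_given_parent(q):
    """Rows: parent genotype (0/1/2). Columns: offspring genotype (0/1/2).

    The offspring receives one allele from the parent and one allele drawn
    at random from the population (assumes no genotyping error)."""
    p = 1.0 - q
    return [
        [p, q, 0.0],
        [p / 2.0, 0.5, q / 2.0],
        [0.0, p, q],
    ]


def joint_prob(a, b, q, relationship):
    """Single-locus probability of observing genotypes a and b.

    relationship: "U" (unrelated) or "PO" (parent-offspring)."""
    prior = hwe_probs(q)
    if relationship == "U":
        return prior[a] * prior[b]
    if relationship == "PO":
        return prior[a] * offspring_given_parent(q)[a][b]
    raise ValueError("only 'U' and 'PO' are covered in this sketch")
```

Under these assumptions, `joint_prob(0, 2, q, "PO")` is zero for any q, which is the single-locus basis of exclusion by opposite homozygosity.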
Figure 4
Mating scheme in Pedigree II, showing a subset of individuals selected to breed in G1, their parents (in G0) and their offspring (in G2), some of which are selected at random (larger symbols) to become parents of G3. Note that by chance, two full‐siblings are selected as mates (2nd and 3rd individual from the left in G1).
Figure 5
Pairs truly related according to a focal relationship (headers, solid outline) are more clearly distinguished from other related pairs (dashed outline) using ΛR/∨ (bottom row) than when using ΛR/U (top). Likelihoods are not conditional on any parental genotypes for PO (left) and FS (middle), and conditional on the genotypes of one parent each for HS (right) (not shown: ΛHS/∨ for true FS is around −170). Vertical lines indicate the values of T_filter = −2 (top) and T_assign = 0.5 (bottom) used throughout the Results. Based on 10 000 simulations of a simple pedigree with unrelated founders and 400 SNPs with MAF 0.3–0.5 and ε = 0.005.
Figure 6
Parent assignment using franz, sequoia (without sibship clustering) or opposite‐homozygosity‐based exclusion (OH‐Excl) in simulated data sets based on three different pedigree structures, with all parental genotypes assumed known. Each point denotes the average over 20 simulations; values are given in Table S4 (Supporting information). Note log scale and broken y‐axes for 1‐AR and ER.
Figure 7
As Fig. 6, for clustering of FS families with no genotyped parents, assuming a polygamous or monogamous breeding system. Averages over 10 replicates (sequoia) or three replicates (colony) were used; colony was not run for 800 SNPs.
Figure 8
AR of parentage assignment (open circles) is necessarily strongly correlated with the proportion of genotyped parents, but this dependence is much weaker for full pedigree reconstruction (filled circles). Results shown for L = 400 SNPs; see Fig. S10 in Appendix S1 (Supporting information) for ER and runtimes.
Figure 9
Pairwise relatedness in an empirical red deer data set, as estimated from 40 000 polymorphic SNPs using gcta (y‐axes), and (a) a previous microsatellite‐based pedigree, (b) from the pedigree inferred using sequoia on 440 SNPs with high MAF and in low LD, or (c) from these same 440 SNPs using gcta. n denotes the number of pairwise relationships, related to the number of unique individuals i as n = i × (i − 1)/2.
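
As a small worked example of the relation n = i × (i − 1)/2 in the Figure 9 caption, the Python sketch below counts unordered pairs among i individuals. The function name is hypothetical and the input values are illustrative only, not taken from the red deer data set.

```python
def n_pairs(i):
    """Number of unordered pairs among i unique individuals."""
    return i * (i - 1) // 2


# Illustrative values only: 4 individuals give 6 pairs,
# and 1000 individuals give 499500 pairs.
small = n_pairs(4)
large = n_pairs(1000)
```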
Figure 10
Examples of double relationships between genotyped individuals A and B, where D_B and S_AB may or may not be genotyped, and D_A is not genotyped. Description and likelihood equations in Methods in Appendix S1 (Supporting information).
