Am J Hum Genet. 2011 Sep 9;89(3):382-97.
doi: 10.1016/j.ajhg.2011.07.023.

Chromosomal Haplotypes by Genetic Phasing of Human Families

Free PMC article

Jared C Roach et al. Am J Hum Genet.

Abstract

Assignment of alleles to haplotypes for nearly all the variants on all chromosomes can be performed by genetic analysis of a nuclear family with three or more children. Whole-genome sequence data enable deterministic phasing of nearly all sequenced alleles by permitting assignment of recombinations to precise chromosomal positions and specific meioses. We demonstrate this process of genetic phasing on two families each with four children. We generate haplotypes for all of the children and their parents; these haplotypes span all genotyped positions, including rare variants. Misassignments of phase between variants (switch errors) are nearly absent. Our algorithm can also produce multimegabase haplotypes for nuclear families with just two children and can handle families with missing individuals. We implement our algorithm in a suite of software scripts (Haploscribe). Haplotypes and family genome sequences will become increasingly important for personalized medicine and for fundamental biology.

Figures

Figure 1
Sextet Pedigrees and Representation of Inheritance States. (A) Pedigree A and (B) pedigree B (CEPH 1463). Genomes for only the individuals in generations II and III were used for genetic phasing (nuclear-family sextets). The displayed grandparents in generation I of pedigrees A and B have been sequenced, but the data were not used for haplotyping. Grandparental data were used to confirm the phasing of the nuclear families composed of generations II and III. (C) Inheritance states are represented by binary vectors indicating the result of Mendel's first law of segregation at a given aligned position of all the genomes in a pedigree. For example, at this hypothetical tetra-allelic position, the first child has received the first allele from the first parent and the first allele from the second parent; these are indicated as “00.” The other children receive the other alleles, indicated as “11.” Combined, the binary inheritance-state vector for this pedigree at this position is “00111111.” Because the labeling of the parental genotypes is arbitrary, the first two bits in a two-generation nuclear-family inheritance-state vector can always be set to 0. Most variant positions in the genome are biallelic, and so inheritance state must be deduced from sets of adjacent variants.
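The encoding described in panel (C) can be sketched in a few lines of Python. This is an illustrative sketch only, not Haploscribe code: the function names (`inheritance_state`, `normalize`) and the allele labels A, B, C, D are assumptions made for the example, and it handles only the idealized tetra-allelic case in which each child's alleles can be traced unambiguously to one allele of each parent.

```python
def inheritance_state(father, mother, children):
    """Encode which parental allele each child received (hypothetical helper).

    father, mother: 2-tuples of distinct alleles, e.g. ("A", "B").
    children: list of unordered 2-tuples, one per child.
    Returns a bit string with two bits per child: paternal bit, maternal bit.
    """
    bits = []
    for a, b in children:
        # Work out which allele came from which parent.
        if a in father and b in mother:
            pat, mat = a, b
        elif b in father and a in mother:
            pat, mat = b, a
        else:
            raise ValueError("genotype is not consistent with the parents")
        bits.append(str(father.index(pat)))
        bits.append(str(mother.index(mat)))
    return "".join(bits)


def normalize(state):
    """Relabel so the first two bits are 0 (parental labeling is arbitrary)."""
    pat_flip = state[0] == "1"
    mat_flip = state[1] == "1"
    out = []
    for i, c in enumerate(state):
        flip = pat_flip if i % 2 == 0 else mat_flip
        out.append(("1" if c == "0" else "0") if flip else c)
    return "".join(out)


# The hypothetical tetra-allelic position from the caption:
# the first child receives the first allele of each parent, the rest the second.
state = inheritance_state(
    ("A", "B"), ("C", "D"),
    [("A", "C"), ("B", "D"), ("B", "D"), ("B", "D")],
)
print(state)                   # 00111111
print(normalize("11000000"))   # 00111111 (same state, parents relabeled)
```

At biallelic positions this direct lookup is not possible, which is why the caption notes that the state must be deduced from sets of adjacent variants.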
Figure 2
Constructing a Higher-Dimensional Inheritance State from Tiled Quartet States. (A) Initially, the inheritance states of each quartet pair are independently labeled. Considering the pedigree shown, at some particular position of the reference genome individuals B, C, and D have all received identical alleles from the two parents, and so are genetically identical. Individual A received distinct alleles from both parents and so is nonidentical with respect to each of the other three. The binary representations of each quartet state are inconsistent when placed in register with respect to each other. (B) After enumerating all arbitrary reassignments of the first two indicators, the best consistent matching of all six indicators produces a consensus binary representation of the sextet inheritance state. At this position, the first two indicators of each of the B and C, B and D, and C and D quartets are flipped, requiring that the second two indicators in each quartet also be flipped in order to maintain the consistency of the inheritance state.
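One way to picture the tiling step is as a small search: each of the six sibling pairs reports whether the two siblings share the paternal allele and whether they share the maternal allele, and the sextet state is the 8-bit vector (first two bits fixed at 0) that disagrees with the fewest pairwise reports. The sketch below uses brute-force enumeration under that assumption; the name `consensus_state` and the pairwise encoding (0 = identical by descent, 1 = not) are hypothetical, and the authors' actual matching procedure may differ.

```python
from itertools import product


def consensus_state(pair_states, n_children=4):
    """Find the sextet state most consistent with pairwise quartet states.

    pair_states: dict mapping (i, j) child-index pairs to (pat, mat), where
    0 means the pair shares that parent's allele and 1 means it does not.
    Returns (state_string, number_of_disagreements).
    """
    best, best_cost = None, None
    # The first child's bits are fixed at 0 0; enumerate everything else.
    for tail in product((0, 1), repeat=2 * (n_children - 1)):
        bits = (0, 0) + tail
        cost = 0
        for (i, j), (pat, mat) in pair_states.items():
            cost += int((bits[2 * i] ^ bits[2 * j]) != pat)
            cost += int((bits[2 * i + 1] ^ bits[2 * j + 1]) != mat)
        if best_cost is None or cost < best_cost:
            best, best_cost = bits, cost
    return "".join(map(str, best)), best_cost


# Children A, B, C, D are indexed 0..3. As in the caption, B, C, and D are
# identical to each other, and A differs from each of them at both alleles.
pairs = {
    (0, 1): (1, 1), (0, 2): (1, 1), (0, 3): (1, 1),
    (1, 2): (0, 0), (1, 3): (0, 0), (2, 3): (0, 0),
}
print(consensus_state(pairs))   # ('00111111', 0)
```

In the consensus, the B, C, and D bits all read "11", which corresponds to the flipped labeling of the B-C, B-D, and C-D quartets that the caption describes.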
Figure 3
Phasing Inheritance-State Blocks by Parsimony. An inheritance-state vector for four children of a sextet consists of 8 bits. The first, third, fifth, and seventh bits relate the paternal alleles of each of the four children, and the second, fourth, sixth, and last bits relate the maternal alleles. If two bits are identical (i.e., 0 and 0 or 1 and 1), the alleles are IBD. If the bits are not identical (e.g., 1 and 0), the alleles are not IBD. If one of the bits is the ambiguity character (•), then IBD is not determined between that pair of individuals. By convention, the first two bits of an inheritance-state vector are always set to 0. Inheritance-state vectors can be converted to meiosis-indicator vectors by relabeling the bits for each block so that they consistently correspond to the meiotic origin of each allele, rather than simply relating IBD status between individuals. There are four possible meiosis-indicator vectors for each inheritance-state vector. Adjacent blocks of the genome are separated by short distances between informative variants that localize recombinations, and so the parsimonious choice of the four labelings is the one that minimizes the number of recombinations between adjacent states. If there has been a single recombination, there is exactly one choice of labeling that represents a single recombination from the previous block (blue arrows). If there are two or more recombinations, then there could be more than one parsimonious choice and ambiguity results (purple arrows). The set of meiosis-indicator vectors in red corresponds to the parsimonious labelings that reflect one recombination each between blocks 1 and 3, 3 and 5, and 5 and 7. Blocks 2, 4, and 6 are intervals in which recombinations have occurred and so contain an ambiguity character.
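The parsimony rule can be illustrated with a short sketch: generate the four relabelings of a block's inheritance-state vector (flip all paternal bits, all maternal bits, both, or neither) and keep whichever differs from the previous block's meiosis-indicator vector at the fewest positions. The function names `labelings` and `choose_labeling` are assumptions for illustration, and positions holding the ambiguity character are simply skipped when counting differences; this is not the published implementation.

```python
def labelings(state):
    """Return the four relabelings of a state string over '0', '1', '•'."""
    out = []
    for pat_flip in (False, True):
        for mat_flip in (False, True):
            chars = []
            for i, c in enumerate(state):
                flip = pat_flip if i % 2 == 0 else mat_flip
                if flip and c in "01":
                    c = "1" if c == "0" else "0"
                chars.append(c)
            out.append("".join(chars))
    return out


def recombinations(prev, cur):
    """Count positions that differ, ignoring ambiguity characters."""
    return sum(a != b for a, b in zip(prev, cur) if a in "01" and b in "01")


def choose_labeling(prev_meiosis, state):
    """Return (most parsimonious labelings, recombination count).

    More than one labeling in the result means the choice is ambiguous.
    """
    candidates = labelings(state)
    costs = [recombinations(prev_meiosis, c) for c in candidates]
    best = min(costs)
    return [c for c, k in zip(candidates, costs) if k == best], best


# One recombination (first child's paternal meiosis): a unique labeling.
print(choose_labeling("00111111", "00010101"))
# (['10111111'], 1)

# Two recombinations (paternal meioses of children 1 and 2): a tie.
print(choose_labeling("00111111", "00110101"))
# (['00110101', '10011111'], 2)
```

The second call shows how two recombinations between blocks can leave two equally parsimonious labelings, which is the ambiguity the caption marks with purple arrows.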
Figure 4
Example of Haplotype Inference. Upper-case alleles are phased genotypes; lower-case alleles are unphased. Haplotyping can be performed as a series of steps. The first step, default or trivial phasing, assigns phase to all homozygous positions. The second step phases alleles in children or siblings that are identical by descent to alleles phased in the first step. For nuclear families with more than one child, a third step phases parental alleles. (A) Trios permit phasing in the child, but not at positions heterozygous in all three individuals. (B) Quartets permit phasing in the children, as well as within inheritance-state blocks in the parents, but not at positions heterozygous in all four individuals. Phasing in blocks of the parental chromosomes is possible because it is known that no meiotic recombinations occur within a block. Haploscribe performs all of these phasing steps simultaneously by matching all possible phased genotypes to meiosis-indicator vectors. Phasing between inheritance-state blocks requires data from additional children, as described in the text.
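The idea of matching phased genotypes to a meiosis-indicator vector can be sketched for a single site as follows. This is a simplified illustration under stated assumptions, not Haploscribe itself: the function name `phase_site`, the tuple representation of genotypes, and the two-bits-per-child string format are all hypothetical. The sketch tries each ordering of the parental alleles, predicts each child's genotype from the vector, and keeps the orderings that reproduce the observed genotypes.

```python
def phase_site(meiosis, father, mother, children):
    """Return the set of parental phasings consistent with one site.

    meiosis: bit string, two bits per child (paternal bit, maternal bit);
             bit b means the child inherited haplotype b of that parent.
    father, mother: unordered genotypes as 2-tuples, e.g. ("A", "G").
    children: list of unordered 2-tuple genotypes, one per child.
    Each result is (father_haplotypes, mother_haplotypes) as ordered tuples.
    """
    consistent = set()
    # A set removes the duplicate ordering when a parent is homozygous.
    for f in {father, father[::-1]}:
        for m in {mother, mother[::-1]}:
            ok = True
            for k, child in enumerate(children):
                pat = f[int(meiosis[2 * k])]
                mat = m[int(meiosis[2 * k + 1])]
                if sorted((pat, mat)) != sorted(child):
                    ok = False
                    break
            if ok:
                consistent.add((f, m))
    return consistent


# Quartet with both parents heterozygous and homozygous children:
# exactly one phasing fits, so the parents are phased at this site.
print(phase_site("0011", ("A", "G"), ("A", "G"), [("A", "A"), ("G", "G")]))
# {(('A', 'G'), ('A', 'G'))}

# Everyone heterozygous: two phasings fit, so the site stays unphased.
print(len(phase_site("0011", ("A", "G"), ("A", "G"), [("A", "G"), ("A", "G")])))
# 2
```

A single consistent phasing phases parents and children together, since each child's paternal and maternal alleles follow from the vector. The second example reproduces the limitation noted in panel (B) for positions heterozygous in all four individuals.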
Figure 5
The High Density of Variants Determined by Whole-Genome Sequence Data Permits Full-Genome Haplotyping. (A) Haplotypes of all the autosomes for the four children of pedigree A. Blue and orange shades represent the two paternal and maternal chromosomes, respectively; dark and light shades represent segments inherited from the corresponding grandfather or grandmother, respectively. (B) Expanded view of chromosome 1 showing the density of variants supporting the meiotic origins of each haplotype. Red, blue, magenta, and green represent regions inherited from the paternal grandfather, paternal grandmother, maternal grandfather, and maternal grandmother, respectively. The height of the gray bracket to the right of each graph corresponds to 1000 variants/Mb.
