
Abstract

Recombination is one of the main forces shaping genome diversity, but the information it generates is often overlooked. A recombination event creates a junction between two parental sequences that may be transmitted to subsequent generations. Just like mutations, these junctions carry evidence of the shared past of the sequences. We present the IRiS algorithm, which detects past recombination events from extant sequences, locates each recombination event, and identifies the recombinant sequences. We have validated and calibrated IRiS for the human genome using coalescent simulations replicating standard human demographic history and a variable recombination rate model, and we have fine-tuned IRiS parameters to simultaneously optimize for false discovery rate, sensitivity, and accuracy in placing the recombination events in the sequence. Newer recombinations overwrite traces of past ones, and our results indicate that more recent recombinations are detected by IRiS with greater sensitivity. IRiS analysis of the MS32 region, previously studied using sperm typing, showed good concordance with estimated recombination rates. We also applied IRiS to haplotypes for 18 X-chromosome regions in HapMap Phase III populations. Recombination events detected for each individual were recoded as binary allelic states and combined into recotypes. Principal component analysis and multidimensional scaling based on recotypes reproduced the relationships between the eleven HapMap Phase III populations that can be expected from known human population history, thus further validating IRiS. We believe that our new method will contribute to the study of the distribution of recombination events across genomes and that, for the first time, it will allow the use of recombination as a genetic marker to study human genetic variation.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Scheme of the recombination detection process for one run of the algorithm.
(A) Input dataset of 10 sequences and 83 SNPs. Colors on sequences represent similar patterns of SNPs, and a change of color along a sequence represents the signal of a past recombination event. (B) Recoded matrix. The patterns of SNPs within a column of grain size n (10 SNPs in this example) have been recoded into numbers. Sequences having the same pattern within a column are assigned the same number. Between columns, numbers represent completely different patterns. Unique patterns are assigned the number zero and are not considered. (C) Trees one, two and three, constructed from the recoded matrix. Going from left to right, the recoded matrix is segmented into sets of compatible columns of patterns. Compatibility of columns is checked using a variant of the four-gamete test for multi-allelic markers. Each segment is represented as a tree in which the leaf nodes contain the sequences analyzed and the edges contain the patterns inherited, similar to point mutations. Recurrence is not allowed. (D) Networks 1–2 and 2–3, constructed by merging consecutive trees one, two and three pairwise. All the information contained in the two original trees is present in the compatible network. Recombinant sequences are leaf nodes descending from nodes having two parents, which means that they have inherited patterns from two different nodes (similar to an Ancestral Recombination Graph). (E) Information saved for each detected recombination event: the recombinant sequences and the starting and ending positions of the network. For a more detailed description of the algorithm see . The recombination event shown in red is studied further in Figure 2.
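Steps (B) and (C) of this caption can be illustrated with a minimal Python sketch. It is not the IRiS implementation: the function names `recode_columns` and `compatible` are hypothetical, and because the caption does not specify which variant of the four-gamete test is used, the sketch assumes one common generalization to multi-state characters, in which two columns are compatible when the graph linking co-occurring states is acyclic. For two-state columns this reduces to the classical four-gamete test.

```python
from collections import Counter


def recode_columns(seqs, grain):
    """Recode each block of `grain` SNPs into an integer pattern label.

    Sequences sharing a pattern within a block get the same label;
    a pattern carried by a single sequence is recoded as 0 (ignored later).
    """
    n_snps = len(seqs[0])
    recoded = [[] for _ in seqs]
    for start in range(0, n_snps, grain):
        blocks = [s[start:start + grain] for s in seqs]
        counts = Counter(blocks)
        labels = {}
        for row, block in zip(recoded, blocks):
            if counts[block] == 1:
                row.append(0)  # unique pattern: not considered
            else:
                row.append(labels.setdefault(block, len(labels) + 1))
    return recoded


def compatible(col_a, col_b):
    """Assumed multi-allelic compatibility check (not taken from the paper).

    Builds a bipartite graph whose edges are the observed (state_a, state_b)
    pairs, skipping zeros, and reports incompatibility if it contains a cycle.
    """
    pairs = {(a, b) for a, b in zip(col_a, col_b) if a != 0 and b != 0}
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(("a", a)), find(("b", b))
        if root_a == root_b:
            return False  # this edge closes a cycle
        parent[root_a] = root_b
    return True


# Toy input: 4 sequences, 4 SNPs, grain size 2.
recoded = recode_columns(["0011", "0011", "0110", "1111"], grain=2)
```

In this toy input the first block has one shared pattern and two unique ones, so the first recoded column is 1, 1, 0, 0. Segmenting the matrix would then amount to extending a segment column by column for as long as `compatible` holds.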
Figure 2. Scheme of the recombination detection process integrating 10 runs of the algorithm.
The analyzed dataset is the one shown in Figure 1. (A) Integration of the information from 10 runs regarding the recombination event of sequence 5. For each run of the algorithm, the starting and ending positions of the network in which the recombination is detected are saved. In each run the size of the first column varies (10, 1, 2, 3… up to 9), and therefore the number of runs corresponds to the grain size. At the end, for each recombination event, we have a set of intervals in which it was detected, which can be represented graphically as a distribution. The maximum interval is the region in which the recombination has been seen the maximum number of times. The midpoint of the maximum interval is defined as the estimated breakpoint position. The threshold indicates the number of times a recombination has to be detected to be considered true. The intersection between the threshold and the detection distribution defines the threshold interval, in which the algorithm guarantees that the recombination event is located. (B) Integration of the information of all detections for the 10 runs of the algorithm. Each line represents a set of sequences in which the same recombination event has been detected; the distribution of the line shows the number of times the event has been detected along the sequence. (C) Final output of the algorithm: breakpoint positions in the first row, the recotypes in rows and the recombination events detected in columns. The presence of a particular recombination event in a particular sequence is represented as a 1, and its absence as a 0. Note that the recotypes represent exactly the coloring of the sequences in Figure 1 and that only recombinations whose distribution exceeded the threshold are represented in the recotypes.
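The integration in panel (A) can be sketched as a per-position detection count. This is a minimal illustration rather than the authors' code: the function name `integrate_runs`, the use of inclusive SNP indices, and the assumption that the detection intervals of one event overlap (so the peak region is contiguous) are all choices made here for the example.

```python
def integrate_runs(intervals, n_positions, threshold):
    """Combine per-run detection intervals for one recombination event.

    intervals   : list of (start, end) SNP indices, inclusive, one per run
    n_positions : number of SNPs in the analyzed region
    threshold   : minimum number of detections for the event to count as true
    Returns None if the event never reaches the threshold.
    """
    # Detection distribution: how many runs cover each position.
    coverage = [0] * n_positions
    for start, end in intervals:
        for pos in range(start, end + 1):
            coverage[pos] += 1

    peak = max(coverage)
    if peak < threshold:
        return None

    # Maximum interval: positions detected the maximum number of times.
    at_peak = [i for i, c in enumerate(coverage) if c == peak]
    max_interval = (at_peak[0], at_peak[-1])

    # Threshold interval: positions detected at least `threshold` times.
    above = [i for i, c in enumerate(coverage) if c >= threshold]
    threshold_interval = (above[0], above[-1])

    # Estimated breakpoint: midpoint of the maximum interval.
    breakpoint_pos = (max_interval[0] + max_interval[1]) / 2
    return {
        "max_interval": max_interval,
        "threshold_interval": threshold_interval,
        "breakpoint": breakpoint_pos,
    }


# Four runs whose networks shift by one SNP each time (toy values).
result = integrate_runs([(10, 29), (11, 30), (12, 31), (13, 32)],
                        n_positions=40, threshold=3)
```

With these toy intervals all four runs overlap on positions 13 to 29, so the maximum interval is (13, 29) and the estimated breakpoint is its midpoint, 21.0; the threshold interval at 3 detections is the wider (12, 30).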
Figure 3. Distribution of the number of detections using the optimal method.
Each line represents the distribution of detections for a particular recombination event. The dataset corresponds to one COSI simulation. Only those recombinations reaching the threshold are considered true events. The peak of each distribution locates the breakpoint position of the corresponding recombination event along the sequence. The optimal method (grains 20, 10 and 5, forward and reverse, and a threshold of 42) creates narrower maximal intervals in the detection distributions than using grain 10 alone.
Figure 4. Values of the aggregate Z scores for different settings.
Z scores were calculated over the mean values, across 100 simulations, of the false discovery rate, the sensitivity and the 90th percentile of the distance between the inferred breakpoint and the true position. Different colored lines represent different methods; the numbers in the legend indicate the grain size used and whether more than one grain size was combined. All methods were run using a sliding window in both forward and reverse directions. Different thresholds are represented along the X axis. The threshold is defined as the number of detections required for an event to be considered true, divided by the number of runs of the algorithm.
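One way to combine the three criteria into a single score is sketched below. The caption does not state how the Z scores were aggregated, so the sign convention (sensitivity counted positively, false discovery rate and distance negatively), the use of the population standard deviation, and the names `zscores` and `aggregate_z` are assumptions of this example. The input numbers are made-up placeholders, not results from the paper.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one metric across the compared settings."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def aggregate_z(fdr, sensitivity, distance):
    """Assumed aggregation: higher is better.

    Sensitivity contributes positively; false discovery rate and the
    breakpoint distance contribute negatively (lower values are better).
    """
    return [
        z_sens - z_fdr - z_dist
        for z_fdr, z_sens, z_dist in zip(
            zscores(fdr), zscores(sensitivity), zscores(distance)
        )
    ]


# Placeholder mean values for three hypothetical settings.
scores = aggregate_z(
    fdr=[0.1, 0.2, 0.3],
    sensitivity=[0.5, 0.4, 0.3],
    distance=[1000, 2000, 3000],
)
best = scores.index(max(scores))  # index of the preferred setting
```

Standardizing each metric first puts the three criteria on the same scale, so no single one dominates the sum only because of its units.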
Figure 5. Sensitivity of the optimal method to detect recombinations depending on age.
Plotted results are averages over 100 simulations. The black curve depicts how the sensitivity of IRiS varies with the age of the recombination events (in bins of 500 generations) and follows the left axis. The two gray curves represent the number of recombination events generated by COSI and the number detected by IRiS, and follow the right axis.
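The binning behind the black curve can be sketched as follows, assuming each simulated recombination event is available as a pair of its age in generations and a flag saying whether it was detected. The function name `sensitivity_by_age` and this input layout are hypothetical; the example events are toy values.

```python
from collections import defaultdict


def sensitivity_by_age(events, bin_size=500):
    """Sensitivity per age bin: detected events / generated events.

    events   : iterable of (age_in_generations, was_detected) pairs
    bin_size : width of each age bin in generations
    Returns a dict mapping the lower edge of each bin to its sensitivity.
    """
    generated = defaultdict(int)
    detected = defaultdict(int)
    for age, was_detected in events:
        bin_start = int(age // bin_size) * bin_size
        generated[bin_start] += 1
        detected[bin_start] += int(was_detected)
    return {b: detected[b] / generated[b] for b in sorted(generated)}


# Toy events: two in the 0-499 bin, one each in the next two bins.
curve = sensitivity_by_age([(120, True), (480, False), (700, True), (1400, False)])
```

The two dictionaries built inside the function correspond to the two gray curves (events generated and events detected per bin), and their ratio to the black curve.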
Figure 6. Recombination rates inferred from sperm typing, LD-based methods and IRiS on the MS32 region.
(A) Recombination rates inferred from sperm typing information; figure adapted from the figures of a previous study in which recombination rates were calculated through sperm typing. (B) Recombination rate inferred by LDhat. (C) Number of recombination events detected by IRiS using the optimal method. Recombination rates inferred in (A) are based on a single individual, whereas those inferred in (B) and (C) are based on the same population data. Position zero marks the location of the minisatellite MS32.
Figure 7. Sensitivity of the optimal method evaluated in silico.
The plot shows the number of times that in silico recombination events along the sequence were detected by IRiS, depending on the breakpoint location. Different colors, from light gray to black, indicate different ways of producing the recombinant sequence: “random” indicates that parental haplotypes were taken at random; “1dif near bkp” indicates that parental sequences had to be different near the breakpoint region (±10 SNPs); “2 dif near bkp” indicates that parental sequences had to be different near the breakpoint region on both sides of the breakpoint; and “unique” indicates that the parental sequences had to be different near the breakpoint region and the recombinant sequence had to be unique within the breakpoint region. Below, the recombination rate estimated by LDhat is shown, following the right axis.
Figure 8. Nucleotide and recombination diversity.
Values were calculated for each of the populations based both on haplotypes and recotypes for the 18 regions. Values of recombination diversity have been multiplied by 100 to make them comparable.
Figure 9. First and second components of the Principal Components Analysis.
Only recombinations present in at least two individuals were included in the analysis. The first component explained 18.03% of the variance and the second component 14.53%.
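The filtering step and the notion of variance explained can be illustrated on a toy recotype matrix. This sketch uses only the Python standard library and finds the first component by power iteration instead of a full eigendecomposition; the software and settings actually used for the published analysis are not stated here, and the names `filter_shared` and `first_component` are hypothetical.

```python
def filter_shared(recotypes, min_carriers=2):
    """Keep recombination events (columns) carried by >= min_carriers rows."""
    n_events = len(recotypes[0])
    keep = [j for j in range(n_events)
            if sum(row[j] for row in recotypes) >= min_carriers]
    return [[row[j] for j in keep] for row in recotypes]


def first_component(matrix, iterations=200):
    """First principal component by power iteration on the covariance matrix.

    Returns (scores per individual, fraction of variance explained).
    Assumes the matrix has non-zero variance.
    """
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
           for a in range(m)]

    v = [1.0 / (k + 1) for k in range(m)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    cov_v = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
    eigenvalue = sum(x * y for x, y in zip(v, cov_v))
    explained = eigenvalue / sum(cov[a][a] for a in range(m))
    scores = [sum(c * x for c, x in zip(row, v)) for row in centered]
    return scores, explained


# Toy recotypes: 4 individuals x 4 events; the last event has one carrier.
toy = [[1, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
scores, explained = first_component(filter_shared(toy))
```

In this toy matrix the three retained events separate the individuals into two groups along a single axis, so the first component explains essentially all of the variance. Real recotype data spread variance over many components, as the percentages in the caption show.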


