Nature. 2018 Jul;559(7714):350-355.
doi: 10.1038/s41586-018-0321-x. Epub 2018 Jul 11.

Insights Into Clonal Haematopoiesis From 8,342 Mosaic Chromosomal Alterations

Free PMC article

Po-Ru Loh et al. Nature.


The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.


Extended Data Figure 1. Examples of mosaic events called using phased genotyping intensities
(a–c) UK Biobank mCA sample 2791 has a mosaic deletion of chr13 from ∼31–53Mb that cannot be confidently called from unphased B allele frequency (BAF) and log2 R ratio (LRR) data (a,c). However, the existence of an event is evident in the phased BAF data (b), and the regional decrease in LRR indicates that this event is a deletion. In panel (b), mean phased BAF is plotted for SNPs aggregated into bins spanning n=25 heterozygous sites; the same bins are used for panel (c). Error bars, s.e.m. (d–f) Sample 1645 has a mosaic CNN-LOH on chr9p from the 9p telomere to ∼26Mb that cannot be confidently called from unphased BAF data (d) but is evident in phased BAF data (e). A phase switch error causes a sign flip in phased BAF at ∼20Mb. The lack of a shift in LRR in the region (f) indicates that this event is a CNN-LOH. In panel (e), mean phased BAF is plotted for SNPs aggregated into bins spanning n=50 heterozygous sites; the same bins are used for panel (f). Error bars, s.e.m. (g–i) Sample 2464 has a full-chromosome mosaic event on chr12 that cannot be confidently called from unphased BAF and LRR data (g,i) but is evident in phased BAF data (h). Several phase switch errors cause sign flips in phased BAF across chr12. The slight positive shift in mean LRR (i) indicates that this event is most likely a mosaic gain of chr12. In panel (h), mean phased BAF is plotted for SNPs aggregated into bins spanning n=50 heterozygous sites; the same bins are used for panel (i). Error bars, s.e.m.
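The binning described in this caption can be illustrated with a short sketch. It is not the authors' pipeline. It assumes that the phased BAF at a heterozygous site is the deviation of BAF from 0.5, signed by the haplotype phase (+1 or -1). The function names `phased_baf_deviation` and `bin_mean_sem` and the toy values are hypothetical.

```python
import math
import statistics


def phased_baf_deviation(baf, phase):
    """Sign each BAF deviation from 0.5 by its haplotype phase (+1 or -1)."""
    return [(b - 0.5) * p for b, p in zip(baf, phase)]


def bin_mean_sem(values, sites_per_bin):
    """Return (mean, s.e.m.) for consecutive bins of heterozygous sites.

    A trailing partial bin is dropped (an assumption of this sketch).
    """
    if sites_per_bin < 2:
        raise ValueError("need at least 2 sites per bin to estimate s.e.m.")
    bins = []
    for start in range(0, len(values) - sites_per_bin + 1, sites_per_bin):
        chunk = values[start:start + sites_per_bin]
        mean = statistics.fmean(chunk)
        sem = statistics.stdev(chunk) / math.sqrt(len(chunk))
        bins.append((mean, sem))
    return bins


# Toy data: a small allelic imbalance whose direction alternates with phase.
baf = [0.52, 0.48, 0.52, 0.48]
phase = [1, -1, 1, -1]
unphased_mean = statistics.fmean([b - 0.5 for b in baf])
phased_mean = statistics.fmean(phased_baf_deviation(baf, phase))
```

In the toy data the unphased deviations cancel while the phased deviations all point the same way, which is the intuition behind events that are invisible in unphased BAF but evident after phasing. A phase switch error would flip the sign of the phased values downstream of the error, as the caption notes.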
Extended Data Figure 2. Estimation of true FDR using age distributions of individuals with mCA calls
We generated age distributions for (i) “high-confidence” detected events passing a permutation-based FDR threshold of 0.01 (bright red), (ii) “medium-confidence” events below the FDR threshold of 0.01 but passing an FDR threshold of 0.05 (darker red), and (iii) “low-confidence” events below the FDR threshold of 0.05 but passing an FDR threshold of 0.10 (darkest red; not analyzed but plotted for context). We compared these distributions to the overall age distribution of UK Biobank participants (grey). Based on the numbers of events in each category, ≈20% of medium-confidence detected events are expected to be false positives. To estimate our true FDR, we regressed the medium-confidence age distribution on the high-confidence and overall age distributions, reasoning that the medium-confidence age distribution should be a mixture of (a) correctly-called events with age distribution similar to that of the high-confidence events and (b) spurious calls with age distribution similar to the overall cohort. We observed a regression weight of 0.31 for the component corresponding to spurious calls, in good agreement with expectation, and implying a true FDR of 7.5% (6.2–8.8%, 95% CI based on regression fit on n=6 age bins).
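The mixture regression described above can be sketched as an ordinary least-squares fit with two regressors and no intercept. This is an illustration only: the age-bin fractions below are invented, the function name `fit_two_component_mixture` is hypothetical, and the caption does not say whether the authors constrained or normalized the weights.

```python
def fit_two_component_mixture(target, comp_a, comp_b):
    """Least-squares weights (w_a, w_b) with target ~ w_a*comp_a + w_b*comp_b.

    No intercept; the 2x2 normal equations are solved directly.
    """
    saa = sum(a * a for a in comp_a)
    sbb = sum(b * b for b in comp_b)
    sab = sum(a * b for a, b in zip(comp_a, comp_b))
    sat = sum(a * t for a, t in zip(comp_a, target))
    sbt = sum(b * t for b, t in zip(comp_b, target))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("components are collinear")
    w_a = (sat * sbb - sbt * sab) / det
    w_b = (sbt * saa - sat * sab) / det
    return w_a, w_b


# Invented fractions over six age bins (each distribution sums to 1).
high_conf = [0.05, 0.10, 0.15, 0.20, 0.25, 0.25]  # skewed toward older ages
overall = [0.15, 0.15, 0.20, 0.20, 0.15, 0.15]    # whole-cohort ages
# Construct a "medium-confidence" distribution with 30% spurious calls.
medium_conf = [0.7 * h + 0.3 * o for h, o in zip(high_conf, overall)]

w_true, w_spurious = fit_two_component_mixture(medium_conf, high_conf, overall)
```

With these constructed inputs the fit recovers the spurious-call weight used to build `medium_conf`. Converting such a weight into an overall FDR additionally requires the event counts in each confidence category, which the caption does not list.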
Extended Data Figure 3. Clonal cell fractions of co-occurring events generally suggest co-existence within the same cell population
For each pair of significantly co-occurring events (Fig. 2b), we compared the clonal fractions of the two events within each individual that carried both events. Each point in the plots corresponds to an individual carrying the pair of events under consideration; individuals are color-coded by the total number of events they carry. For nearly all pairs of events, the clonal fractions of the two events were very similar in most individuals carrying both events, suggesting that the events occurred in the same clonal cell population. A few exceptions do seem to exist, e.g., 22q– vs. 13q CNN-LOH cell fraction; here, the cell fractions suggest that 13q CNN-LOH events may be present in a subclone. This observation is consistent with acquired uniparental disomy of 13q providing a second hit within a del(13q14) clonal expansion, as we see in Extended Data Fig. 8. (We did not include del(13q14) vs. 13q CNN-LOH in this plot because inference of clonal fractions is complex for these overlapping events; see Extended Data Fig. 8.)
Extended Data Figure 4. Replication of previous association between JAK2 46/1 haplotype and 9p CNN-LOH in cis due to clonal selection
The common JAK2 46/1 haplotype has previously been shown to confer risk of somatic JAK2 V617F mutation such that subsequent 9p CNN-LOH produces a strong proliferative advantage (right side of figure). In our analysis, CNN-LOH on 9p is strongly associated with JAK2 46/1 (P=1.6×10⁻¹³, OR = 2.7 (2.1–3.5); Fisher's exact test on n=120,664 individuals), with the risk haplotype predominantly duplicated by CNN-LOH in heterozygotes (52 of n=61 heterozygous cases; binomial P=1.8×10⁻⁸). In the left side of this figure, the genomic modification is illustrated in the top panel and association signals are plotted in the bottom. The lead associated variant is labeled, and variants are colored according to linkage disequilibrium with the lead variant (scaled for readability).
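The binomial test on the 52 of 61 heterozygous cases can be checked with a few lines of arithmetic. The sketch assumes a two-sided exact test against a null probability of 0.5 (either haplotype equally likely to be duplicated); the caption does not state the sidedness, and the function name `binomial_two_sided_p` is hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes, trials):
    """Exact two-sided binomial P-value under a null probability of 0.5.

    The null distribution is symmetric, so the two-sided P-value is
    twice the tail probability of the more extreme side.
    """
    k = max(successes, trials - successes)
    if 2 * k == trials:
        return 1.0  # observation sits exactly at the centre of the null
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1))
    return min(1.0, 2 * upper_tail / 2 ** trials)


# Counts taken from the caption: risk haplotype duplicated in 52 of 61 cases.
p_value = binomial_two_sided_p(52, 61)
```

Under these assumptions the result is on the order of 10⁻⁸, close to the value reported in the caption.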
Extended Data Figure 5. Evidence of multiple causal variants for 10q25.2 breakage and 1p CNN-LOH associations
(a) Multiple expanded repeats at FRA10B drive breakage at 10q25.2. We identified 12 distinct primary repeat motifs at FRA10B in 26 whole-genome-sequenced individuals from 14 families (labeled VNTR-N-x, where N denotes length in base pairs); carriers of these repeats exhibit varying degrees of FRA10B repeat expansion (Supplementary Note 8). The repeat motifs are AT-rich and are similar to FRA10B repeats previously reported by Hewett et al. The alignment provided here includes the repeat motifs most frequently observed in FRA10B expanded alleles in previous work (E8, E13, E17, and E19) along with a few other closely related expanded repeat motifs (E10, E11, and E12). (b) Carriers of the 10q terminal deletion in UK Biobank share long haplotypes at 10q25.2 identical-by-descent. Square nodes in the IBD graph correspond to males and circles to females. Node size is proportional to cell fraction and edge weight increases with IBD length. Colored nodes indicate imputed carriers of variable number tandem repeats (VNTRs) at FRA10B (Supplementary Table 7); color intensity scales with imputed dosage. (c) Identity-by-descent graph at MPL locus (chr1:43.8Mb) on individuals with mCAs on chr1 extending to the p-telomere. Colored nodes indicate imputed carriers of SNPs independently associated with mosaic 1p CNN-LOH (Fig. 4a).
Extended Data Figure 6. Germline CNVs at 15q26.3
(a) Read depth profile plot of WGS samples in the terminal 700kb of chr15q. Three individuals in one family carry a ∼70kb deletion at 15q26.3, and a fourth carries the same deletion along with a ∼290kb duplication (probably on the same haplotype based on population frequencies of these events; see Extended Data Fig. 7). These four individuals (highlighted in blue) segregate with the rs182643535:T allele in the WGS cohort. Inset: The parental carrier in the family, individual 10921, has detectable mosaicism in two distinct 15q CNN-LOH subclones (one starting at 41.64Mb with 4.6% cell fraction, the other starting at 71.64Mb with an additional 2.0% cell fraction). (b) Zoomed-in read depth profile plot, with deletion-only individuals highlighted in blue and the del+dup individual highlighted in green. Breakpoint analysis indicates that the ∼70kb deletion spans chr15:102151467–102222161 and contains a 1139bp mid-segment (chr15:102164897–102166035) that is retained in inverted orientation. The ∼290kb duplication spans chr15:102026997–102314016.
Extended Data Figure 7. Mosaic chromosomal alterations and germline CNVs at 15q26.3
Using identified breakpoints of the germline ∼70kb deletion and ∼290kb duplication (Extended Data Fig. 6), we computed mean genotyping intensity (LRR) in UK Biobank samples within the ∼70kb deletion region (24 probes) and within the flanking ∼220kb region (97 probes). Individuals are plotted by flanking 220kb mean LRR vs. 70kb mean LRR and colored by mosaic status for somatic 15q mCAs. UK Biobank samples carrying the 70kb deletion, 290kb duplication, and del+dup are all easily identifiable in distinct clusters. The plot also appears to contain clusters with higher copy number. Of the three CNV-carrying alleles, the simple 70kb deletion is the only one that predisposes to mCAs. Most mosaic events containing the 70kb deletion are CNN-LOH events that make cells homozygous for the 70kb deletion; two individuals have somatic loss of the homologous (normal) chromosome, making cells hemizygous for the 70kb deletion.
Extended Data Figure 8. Phased BAF plots of chromosomes with multiple CNN-LOH subclones
All of the plots exhibit step functions of increasing |ΔBAF| toward a telomere, which is the hallmark of multiple clonal cell populations containing distinct CNN-LOH events that affect different spans of a chromosomal arm (all extending to the telomere). Distinct |ΔBAF| values (called using an HMM) are indicated with different colors. Flips in the sign of phased BAF usually correspond to phase switch errors. Two samples exhibit high switch error rates: 14q individual 3067 (explained by non-European ethnicity), and 1p individual 23 (explained by very high |ΔBAF|; extreme shifts in genotyping intensities result in poor genotyping quality). All five individuals with multiple CNN-LOH events on chr13q appear to contain switch errors over 13q14, but these switches are actually explained by overlapping 13q14 deletions; see Supplementary Note 1 for detailed discussion.
Extended Data Figure 9. CLL prediction accuracy: ROC and precision-recall curves
CLL prediction benchmarks using 10-fold stratified cross-validation on: (a,b) only individuals with lymphocyte counts in the normal range (1×10⁹/L to 3.5×10⁹/L), as in our primary analyses (n=36 cases, 113,923 controls); and (c,d) individuals with any lymphocyte count (n=78 cases, 118,481 controls). Panel (a) matches Fig. 5b, and panel (b) shows the precision-recall curve from the same analysis. Panels (c) and (d) correspond to an analogous analysis in which we removed the restriction on lymphocyte count and also used additional mosaic event variables for prediction (11q–, 14q–, 22q–, and total number of autosomal events). In both benchmarks, individuals with previous cancer diagnoses or CLL diagnoses within 1 year of assessment are excluded; however, some individuals with very high lymphocyte counts pass this filter (and probably already had CLL at assessment despite being undiagnosed for >1 year), hence the difference in apparent prediction accuracy between the two benchmarks.
Extended Data Figure 10. Mosaic chromosomal alterations detected in CLL cases sorted by lymphocyte count
Individuals are stratified by cancer status at DNA collection (no previous diagnosis vs. any previous diagnosis), and mCAs (loss=red, CNN-LOH=green, gain=blue, undetermined=grey) are plotted per chromosome as colored rectangles (with height increasing with BAF deviation).
Figure 1. Mosaic chromosomal alterations detected in 151,202 UK Biobank participants
Each horizontal line corresponds to an mCA; a total of 5,562 autosomal events in 4,889 unique individuals are displayed. We detected an additional 2,780 chromosome X events in females (mostly whole-chromosome losses). Detected events are color-coded by copy number. Focal deletions are labeled in red with names of putative target genes. Loci containing inherited variants influencing somatic events in cis are labeled in the color of the mCA (red for del(10q)-associated FRA10B, green for CNN-LOH-associated loci). Enlarged per-chromosome plots are provided in Supplementary Note 2.
Figure 2. Distributional properties of detected mCAs
(a) Log2 R ratio (LRR), measuring total allelic intensity, scales roughly linearly with B-allele frequency (BAF) deviation, measuring relative allelic intensity, among events with each copy number. (b) Most individuals with a detected autosomal mCA have only one event, although a larger number than expected (441 vs. 100) have multiple events. Several pairs of mCA types co-occur much more frequently than expected by chance; edge weights in the co-occurrence graph scale with enrichment. (c) Autosomes with more gain events tend to have fewer loss events (excluding deletions involving V(D)J recombination on chromosomes 14 and 22); Spearman's test on n=22 autosomes. (d) Fractions of individuals with at least one detected autosomal event increase steadily with age, and this trend is even more pronounced for X chromosome events in females. Error bars, 95% CI. (e) Carriers of different mCA types have different age and sex distributions. Error bars, s.e.m. (f) Different mCAs are significantly enriched (FDR 0.05) among individuals with anomalous blood counts in different blood lineages (adjusted for age, sex, and smoking status; Methods). Numeric data including exact sample sizes used to compute error bars are provided in Supplementary Tables 1–6.
Figure 3. Repeat expansions at fragile site FRA10B driving breakage at 10q25.2
(a) Germline variants at 10q25.2 associate strongly with terminal 10q mosaic deletion (Fisher's exact test, n=120,664 individuals). Left boundaries of the deletions are called with error; true breakpoints are probably near-identical (Supplementary Note 4). (b) UK Biobank carriers of terminal 10q deletion are predominantly female (51 of n=60 individuals; error bars, 95% CI) with age distribution similar to the overall study population (violin plot centers, means; error bars, 95% CI). (c) WGS samples with terminal 10q deletion (two parent-child duos) carry inherited expanded repeats at FRA10B.
Figure 4. Novel loci associated with mCAs in cis due to clonal selection
(a) MPL, (b) ATM, (c) TM2D3/TARSL2. At each locus, one or more inherited genetic variants predispose chromosomal mutations to create a proliferative advantage. Bottom: genomic modifications; top: association P-values (Fisher's exact test, n=120,664 individuals). Independent lead associated variants are labeled, and variants are colored according to linkage disequilibrium with lead variants (in shades of red, gold, or green; variants in grey are not in LD with lead variants). In panel (c), the differing arrow weights to CNN-LOH and loss events indicate that CNN-LOH is the more common scenario (both in the population and among carriers of the risk variant).
Figure 5. Associations between mCAs and incident cancers and mortality
(a) Multiple mCA types confer increased risk of incident blood cancers diagnosed >1 year after DNA collection in n=109,819 individuals with normal blood counts at assessment (Cochran-Mantel-Haenszel test adjusting for age and sex; error bars, 95% CI). (b) A logistic model including mosaic status for 13q and trisomy 12 events along with other risk factors achieves high out-of-sample prediction accuracy for incident CLL (n=36 cases and 113,923 controls with no cancer history). (c) Time to malignancy tracks inversely with clonal cell fraction in n=46 individuals with detectable clonality (of any mCA) who received CLL diagnoses after assessment (one-sided Pearson's test). (d) Loss, gain, and CNN-LOH events (on any autosome) all confer increased mortality risk in n=128,854 individuals with no cancer history and n=15,782 with prevalent cancers (error bars, 95% CI). Sample exclusions are detailed in Methods. Numeric data are provided in Supplementary Tables 12 and 13.
