The mechanisms underlying de novo insertion/deletion (indel) genesis, such as polymerase slippage, have been hypothesized but not well characterized in the human genome. We implemented two methodological improvements and used them to dissect indel mutagenesis. First, we assigned de novo variants to their parent of origin (i.e., phasing) with low-coverage long-read whole-genome sequencing, achieving better phasing than with short-read sequencing (medians of 84% and 23%, respectively). Second, we wrote an application programming interface to classify indels into three subtypes according to sequence context. Across three cohorts with different phasing methods (N_trios = 540, all cohorts), we observed that one de novo indel subtype, change in copy count (CCC), was significantly correlated with the father's (p = 7.1 × 10⁻⁴) but not the mother's (p = .45) age at conception. We replicated this effect in three cohorts without de novo phasing (p_paternal = 1.9 × 10⁻⁹, p_maternal = .61; N_trios = 3,391, all cohorts). Although this is consistent with polymerase slippage during spermatogenesis, the percentage of variance explained by paternal age was low, and we did not observe an association with replication timing. These results suggest that spermatogenesis-specific events have a minor role in CCC indel mutagenesis, one not observed for other indel subtypes nor for maternal age in general. These results have implications for indel modeling in evolution and disease.
Keywords: de novo variants; indels; long-read technology; parent-of-origin phasing; whole-genome sequencing.
© 2020 Wiley Periodicals, Inc.