Am J Hum Genet. 2018 Oct 4;103(4):522-534.
doi: 10.1016/j.ajhg.2018.08.016. Epub 2018 Sep 27.

Burden Testing of Rare Variants Identified Through Exome Sequencing via Publicly Available Control Data

Free PMC article

Michael H Guo et al. Am J Hum Genet.


The genetic causes of many Mendelian disorders remain undefined. Factors such as lack of large multiplex families, locus heterogeneity, and incomplete penetrance hamper efforts to identify these causes for many disorders. Previous work suggests that gene-based burden testing, in which the aggregate burden of rare, protein-altering variants in each gene is compared between case and control subjects, might overcome some of these limitations. The increasing availability of large-scale public sequencing databases such as the Genome Aggregation Database (gnomAD) can enable burden testing using these databases as controls, obviating the need for additional control sequencing for each study. However, using public databases as controls poses several challenges, including lack of individual-level data, differences in ancestry, and differences in sequencing platforms and data processing. To illustrate the approach of using public data as controls, we analyzed whole-exome sequencing data from 393 individuals with idiopathic hypogonadotropic hypogonadism (IHH), a rare disorder with significant locus heterogeneity and incomplete penetrance, against control subjects from gnomAD (n = 123,136). We leveraged presumably benign synonymous variants to calibrate our approach. Through iterative analyses, we systematically addressed and overcame various sources of artifact that can arise when using public control data. In particular, we introduce an approach for highly adaptable variant quality filtering that leads to well-calibrated results. Our approach "re-discovered" genes previously implicated in IHH (FGFR1, TACR3, GNRHR). Furthermore, we identified a significant burden in TYRO3, a gene implicated in hypogonadotropic hypogonadism in mice. Finally, we developed a user-friendly software package, TRAPD (Test Rare vAriants with Public Data), for performing gene-based burden testing against public databases.
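As a rough illustration of the gene-based comparison described above, the sketch below computes a one-sided p value for a single gene from carrier counts in case and control subjects. The abstract does not name the test statistic, so Fisher's exact test is an assumption here, as are the function name `burden_p_value` and the example carrier counts; only the cohort sizes (393 and 123,136) come from the text.

```python
from math import lgamma, exp


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def burden_p_value(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for an excess of carriers among cases.

    Assumption: the test choice is illustrative, not confirmed by the abstract.
    Sums hypergeometric probabilities for tables with at least as many
    case carriers as observed, holding the margins fixed.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    log_denominator = log_comb(total, n_cases)
    p = 0.0
    for k in range(case_carriers, min(carriers, n_cases) + 1):
        p += exp(
            log_comb(carriers, k)
            + log_comb(total - carriers, n_cases - k)
            - log_denominator
        )
    return min(p, 1.0)


# Hypothetical carrier counts for one gene (not taken from the study).
p = burden_p_value(case_carriers=5, n_cases=393,
                   control_carriers=40, n_controls=123136)
print(f"one-sided p = {p:.3g}")
```

Working in log space keeps the binomial coefficients manageable at gnomAD-scale sample sizes.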

Keywords: TRAPD; gene-based burden analysis; hypogonadotropic hypogonadism.


Figure 1
Burden Testing Scheme. Case cohort sequencing (IHH) and control database sequencing (gnomAD) data are processed separately, and burden testing is performed in the final step. For each set of data, sequencing quality filters, predicted variant pathogenicity filters, and sample filters (e.g., ancestry) can be applied. Then, counts of qualifying variant carriers for each gene in the case and control subjects are generated. Finally, burden testing is performed.
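The carrier-counting step in this scheme can be sketched as follows. The two input layouts and both function names are hypothetical. For the control database, which lacks individual-level data, the sketch assumes each alternate allele belongs to a different individual, so the summed allele count is an upper bound on the number of carriers; the source does not state how control carriers are actually derived.

```python
from collections import defaultdict


def count_case_carriers(genotype_calls):
    """Count distinct case samples carrying >= 1 qualifying variant per gene.

    genotype_calls: iterable of (gene, sample_id) pairs, one pair per
    qualifying variant observed in a sample (hypothetical layout).
    """
    carriers = defaultdict(set)
    for gene, sample_id in genotype_calls:
        carriers[gene].add(sample_id)
    return {gene: len(samples) for gene, samples in carriers.items()}


def count_control_carriers(site_records):
    """Approximate control carriers per gene from summary-level data.

    site_records: iterable of (gene, allele_count) pairs, one per
    qualifying site. Assumption: every alternate allele is carried by a
    different individual, so the sum can only overestimate carriers.
    """
    totals = defaultdict(int)
    for gene, allele_count in site_records:
        totals[gene] += allele_count
    return dict(totals)


# Hypothetical inputs with placeholder gene and sample names.
cases = [("GENE_A", "s1"), ("GENE_A", "s1"), ("GENE_A", "s2"), ("GENE_B", "s3")]
controls = [("GENE_A", 3), ("GENE_A", 2), ("GENE_B", 7)]
print(count_case_carriers(cases))
print(count_control_carriers(controls))
```

Overestimating control carriers makes a test for case excess more conservative, which is one reason this approximation may be acceptable.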
Figure 2
Effect of Coverage on Distribution of Synonymous Variants. (A) Quantile-quantile plot of initial burden testing results using synonymous SNVs. Synonymous variants were used as they are likely mostly benign and can be used to test the null distribution. The x axis represents the expected –log10(p value) under the uniform distribution of p values. The y axis shows the observed –log10(p value) from the burden testing data. Each point is a single gene. Red dots represent the 35 genes previously implicated in IHH, while black dots represent the remaining genes in the genome. The black solid line shows the relationship between expected and observed p values under the uniform p value distribution. The dotted blue line shows the observed fit line between the 50th and 95th percentile of genes; the slope of this line is λΔ95. (B) Coverage at HRNR in case sequencing data and gnomAD control database. Exons are shown in yellow boxes below the plot, with wider boxes representing coding regions and narrower boxes representing UTRs. Introns (not drawn to scale) are shown as connecting lines between exons. Red dots represent coverage (as proportion of individuals with read depth >10×) in case cohort sequencing, while blue dots represent coverage in gnomAD control database. Each dot represents a single base. The dashed line represents the threshold for 90% of samples having sequencing read depth >10×. (C) Repeat QQ plot from (A), except considering only bases for which more than 90% of samples had sequencing read depth >10× in both gnomAD and case sequencing data.
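One way to compute the λΔ95 slope described in panel (A) is sketched below. The caption gives only the definition (slope of the fit line between the 50th and 95th percentile of genes), so the use of ordinary least squares, the rank/(n + 1) expected quantiles, and the function name `lambda_delta_95` are all assumptions.

```python
from math import log10


def lambda_delta_95(p_values):
    """Slope of observed vs. expected -log10(p) between the 50th and
    95th percentile of genes, fitted by ordinary least squares.

    Assumptions: expected p values are rank / (n + 1), and all
    p values are strictly greater than zero.
    """
    ordered = sorted(p_values, reverse=True)  # least significant first
    n = len(ordered)
    xs, ys = [], []
    for i, p in enumerate(ordered):
        percentile = (i + 1) / n
        if 0.50 <= percentile <= 0.95:
            xs.append(-log10((n - i) / (n + 1)))  # expected under uniform
            ys.append(-log10(p))                  # observed
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Perfectly uniform p values lie on the identity line.
uniform = [i / 1001 for i in range(1, 1001)]
print(lambda_delta_95(uniform))
```

A slope near 1 corresponds to the solid line in the plot; values well above 1 indicate that observed p values are smaller than expected across the bulk of genes.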
Figure 3
Effect of Variant Quality Filters on Distribution of Synonymous Variants. (A) Effect of adding pass/fail filters for variant quality. QQ plot of burden testing results following filtering for sites that passed GATK quality filters in the case and control sequencing data. (B) Burden testing using QD scores to filter for sites. Only the top 95% of sites in gnomAD based on QD scores and the top 85% of sites in the case cohort sequencing based on QD scores are used. Only sites where more than 90% of samples had sequencing read depth >10× in both gnomAD and the case cohort sequencing were considered (same as Figure 2B). QQ plots show burden testing results for synonymous variants.
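The percentile-based QD filter in panel (B) could be expressed roughly as below. The site record layout (`qd`, `covered_fraction`) and both function names are hypothetical, and the caption does not say how ties or rounding are handled, so those details are assumptions.

```python
def top_fraction_threshold(qd_scores, keep_fraction):
    """Return the QD cutoff that keeps roughly the top keep_fraction of sites."""
    ordered = sorted(qd_scores, reverse=True)
    n_keep = max(1, round(len(ordered) * keep_fraction))
    return ordered[n_keep - 1]


def filter_sites(sites, keep_fraction, min_covered_fraction=0.90):
    """Keep sites passing both the QD percentile and the coverage filter.

    sites: list of dicts with hypothetical keys 'qd' (quality by depth)
    and 'covered_fraction' (fraction of samples with read depth > 10x).
    Sites tied at the cutoff are all kept (an assumption).
    """
    cutoff = top_fraction_threshold([s["qd"] for s in sites], keep_fraction)
    return [
        s for s in sites
        if s["qd"] >= cutoff and s["covered_fraction"] > min_covered_fraction
    ]


# Hypothetical sites; 0.85 mirrors the case-cohort fraction in the caption.
case_sites = [{"qd": float(q), "covered_fraction": 0.95} for q in range(1, 21)]
kept = filter_sites(case_sites, keep_fraction=0.85)
print(len(kept), "of", len(case_sites), "sites kept")
```

Because the cutoff is a fraction of each dataset's own QD distribution rather than a fixed score, the same function can be applied to case and control data with different fractions (0.85 and 0.95 in the caption).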
Figure 4
Selection of Damaging Variants to Improve the Power of Rare Variant Burden Testing. (A) Burden testing using all protein-altering variants. (B) Distribution of PolyPhen2 (PP2), SIFT, and CADD scores among missense variants observed in IHH-affected case subjects as compared to gnomAD. (C) Burden testing considering only PTVs (essential splice site, frameshift, and nonsense) and missense variants computationally predicted to be damaging. (D) Burden testing using only PTVs. For (A), (C), and (D), the same filters for coverage as in Figure 2B and variant quality as in Figure 3B were applied.
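The three variant sets compared in panels (A), (C), and (D) can be written as a single qualifying-variant predicate, sketched below. The consequence labels, mode names, and function name are hypothetical. The `predicted_damaging` flag is taken as an input because the caption does not give the PP2, SIFT, or CADD cutoffs, and treating "protein-altering" as PTV or missense is a simplification.

```python
# Hypothetical consequence labels for protein-truncating variants (PTVs).
PTV_CONSEQUENCES = {"splice_site", "frameshift", "nonsense"}


def qualifies(consequence, predicted_damaging, mode):
    """Decide whether a variant counts toward the gene burden.

    mode (hypothetical names):
      'all_protein_altering'       -> panel (A)
      'ptv_plus_damaging_missense' -> panel (C)
      'ptv_only'                   -> panel (D)
    predicted_damaging: result of an upstream in silico prediction step
    whose thresholds are not specified in the caption.
    """
    is_ptv = consequence in PTV_CONSEQUENCES
    is_missense = consequence == "missense"
    if mode == "all_protein_altering":
        return is_ptv or is_missense
    if mode == "ptv_plus_damaging_missense":
        return is_ptv or (is_missense and predicted_damaging)
    if mode == "ptv_only":
        return is_ptv
    raise ValueError(f"unknown mode: {mode}")


# A benign-predicted missense variant qualifies only under the broadest set.
for mode in ("all_protein_altering", "ptv_plus_damaging_missense", "ptv_only"):
    print(mode, qualifies("missense", False, mode))
```

Narrowing the qualifying set trades sensitivity for specificity: fewer variants are counted, but a larger share of them are expected to be damaging.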
Figure 5
Addition of Indels to Rare Variant Burden Testing. For case cohort sequencing, SNVs in the top 85% of QD scores and indels in the top 75% were considered. For gnomAD, SNVs in the top 95% of QD scores and indels in the top 85% were considered. QQ plot shows burden testing using all nonsynonymous variants (A), PTVs (splice site, frameshift, and nonsense) plus missense variants computationally predicted to be damaging (B), or PTVs only (C).


