Annu Rev Genet, 45, 203-26

Human Copy Number Variation and Complex Genetic Disease

Santhosh Girirajan et al. Annu Rev Genet.

Abstract

Copy number variants (CNVs) play an important role in human disease and population diversity. Advances in technology have allowed the analysis of CNVs in thousands of individuals with disease and in thousands of controls. These studies have identified rare CNVs associated with neuropsychiatric diseases such as autism, schizophrenia, and intellectual disability. In addition, copy number polymorphisms (CNPs) are present at higher frequencies in the population, show high diversity in copy number, sequence, and structure, and have been associated with multiple phenotypes, primarily related to immune or environmental response. However, the landscape of copy number variation remains largely unexplored, especially for smaller CNVs and those embedded within complex regions of the human genome. An integrated approach that characterizes single nucleotide variants and CNVs in a large number of disease and normal genomes holds the promise of thoroughly elucidating the genetic basis of human disease and diversity.

Figures

Figure 1
Size and frequency of major categories of genetic variants. Genetic variants of different sizes are shown as a function of frequency. Overall, single nucleotide polymorphisms (SNPs) occur at a higher frequency and can be assayed by high-throughput SNP genotyping. Copy number variants are intermediate-sized variants (>50 bp) and can be assayed by SNP microarrays or array comparative genomic hybridization. Large chromosomal aberrations are rarer, are microscopically visible after G-banding, and are often associated with major congenital abnormalities (e.g., Down syndrome associated with trisomy 21).
Figure 2
Prevalence of genomic disorders in individuals with developmental delay. The prevalence of rare deletions and duplications associated with neurodevelopmental disorders, also termed genomic disorders, is shown. The data were generated from an analysis of 15,767 individuals assessed for developmental delay and associated phenotypes (20a). Note that deletions are more prevalent than their reciprocal duplications for most of the disorders, and that nonrecurrent rearrangements are generally rarer than recurrent rearrangements mediated by segmental duplications. The candidate genes within each genomic disorder region are shown in parentheses.
Figure 3
Sequence identity and size of segmental duplications mediating disease-associated genomic rearrangements. Properties of the segmental duplications participating in nonallelic homologous recombination (NAHR) events are shown. Note that direct orientation of the segmental duplications is also a general requirement, and individuals possessing such architecture are predisposed to an NAHR event. Operationally, segmental duplications are defined as blocks of repeated sequence longer than 10 kbp with >95% sequence identity.
Figure 4
Methods for associating rare copy number variants (CNVs) with neurodevelopmental disease. (a) Pathogenicity has classically been associated with a de novo, or new mutation, model. Pathogenic variants are expected to be strongly selected against, so the prevalence of these CNVs is essentially maintained by de novo occurrence. (b) Case-control association study to infer pathogenicity for a CNV. Locus-specific CNV frequency is compared between cases and controls under the assumption that a pathogenic CNV is enriched in cases that manifest the disease. This comparison is valid only when both cohorts are matched for age, sex, and ethnicity and assayed on comparable CNV detection platforms. (c) Sliding window or segment-based approach to identify pathogenic regions in the genome. Such analysis can identify a specific genic region or locus enriched in cases compared with controls. (d) Comparison of CNV data by size as a function of frequency gives a good estimate of the selective pressure on CNVs. This method provides an estimate of the odds ratio for variants of a particular size. (e) Pathway-based analysis for assessing the pathogenicity of individually rare but collectively common CNVs. This model is generally applicable to the study of complex neuropsychiatric disease, wherein related genes are thought to interact in a common neurological pathway. An altered homeostatic state resulting in disease is inferred when two or more genes within the same pathway are disrupted. (f) The global CNV rate and gene disruptions as a function of pathogenic association. The total number of rare CNVs and the number of genes disrupted by deletion or duplication can also be considered when testing pathogenicity. Such a method was recently used by Pinto and colleagues in a large-scale study of individuals with autism (98).
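
The case-control comparison in panel (b) can be sketched as a 2 x 2 test of carrier counts. The following is a minimal illustration, not the authors' analysis: the function names are hypothetical, the choice of a one-sided Fisher's exact test and of a 0.5 continuity correction for empty cells are assumptions, and the counts in the usage lines are invented purely for demonstration.

```python
from math import comb


def odds_ratio(case_cnv, case_total, ctrl_cnv, ctrl_total):
    """Odds ratio for carrying the CNV in cases versus controls.

    Assumption: when any cell of the 2 x 2 table is zero, 0.5 is added to
    every cell (Haldane-Anscombe correction) so the ratio stays finite.
    """
    a, b = case_cnv, case_total - case_cnv   # cases: carriers, non-carriers
    c, d = ctrl_cnv, ctrl_total - ctrl_cnv   # controls: carriers, non-carriers
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_one_sided(case_cnv, case_total, ctrl_cnv, ctrl_total):
    """One-sided Fisher's exact p-value for enrichment of the CNV in cases.

    Sums the hypergeometric probabilities of seeing at least `case_cnv`
    carriers among the cases, given the table margins.
    """
    carriers = case_cnv + ctrl_cnv
    n = case_total + ctrl_total
    denom = comb(n, case_total)
    upper = min(carriers, case_total)
    tail = sum(
        comb(carriers, k) * comb(n - carriers, case_total - k)
        for k in range(case_cnv, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 10 carriers in 1,000 cases, 2 carriers in 1,000 controls.
print(odds_ratio(10, 1000, 2, 1000))
print(fisher_one_sided(10, 1000, 2, 1000))
```

As the caption notes, such a comparison is only meaningful when the cohorts are matched and assayed on comparable platforms; the arithmetic above does not check for either.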
Figure 5
Methods for associating copy number polymorphisms (CNPs) with disease. (a) For a CNP with discrete copy number genotypes, the counts of each copy number genotype can be compared across cases and controls. (b) For a CNP where discrete copy number genotypes cannot be assigned, the distribution of copy numbers can be compared between cases and controls. (c) CNPs associated with disease can be identified indirectly through the association of a single nucleotide polymorphism in linkage disequilibrium.
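
Panels (a) and (c) can be illustrated with two small calculations. This is a sketch under stated assumptions rather than the method used in any cited study: a Pearson chi-square statistic is assumed for comparing genotype counts, and the squared correlation between per-individual SNP genotype and copy number is used as a simple stand-in for linkage disequilibrium. Function names and all numbers are hypothetical.

```python
def chi_square_stat(case_counts, ctrl_counts):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    Each list holds the number of individuals with each copy number
    genotype (for example 0, 1, and 2 copies), in the same order.
    """
    if len(case_counts) != len(ctrl_counts):
        raise ValueError("case and control lists must have the same length")
    n_case, n_ctrl = sum(case_counts), sum(ctrl_counts)
    total = n_case + n_ctrl
    stat, used_columns = 0.0, 0
    for obs_case, obs_ctrl in zip(case_counts, ctrl_counts):
        column = obs_case + obs_ctrl
        if column == 0:
            continue  # genotype absent from both cohorts: no information
        used_columns += 1
        exp_case = n_case * column / total
        exp_ctrl = n_ctrl * column / total
        stat += (obs_case - exp_case) ** 2 / exp_case
        stat += (obs_ctrl - exp_ctrl) ** 2 / exp_ctrl
    return stat, used_columns - 1


def r_squared(snp_genotypes, copy_numbers):
    """Squared Pearson correlation between SNP genotype and copy number."""
    n = len(snp_genotypes)
    mean_x = sum(snp_genotypes) / n
    mean_y = sum(copy_numbers) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(snp_genotypes, copy_numbers))
    sxx = sum((x - mean_x) ** 2 for x in snp_genotypes)
    syy = sum((y - mean_y) ** 2 for y in copy_numbers)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable: correlation undefined
    return sxy * sxy / (sxx * syy)


# Hypothetical genotype counts for copy numbers 0, 1, 2.
print(chi_square_stat([10, 20, 30], [30, 20, 10]))
# Hypothetical per-individual SNP genotypes (0/1/2) and copy numbers.
print(r_squared([0, 1, 2, 1], [2, 3, 4, 3]))
```

A high correlation would suggest the SNP can serve as a proxy for the CNP in an association study; the chi-square statistic still needs to be referred to a chi-square distribution with the returned degrees of freedom to obtain a p-value.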
Figure 6
The importance of assessing copy number and sequence content. A simplified diagram of the survival of motor neuron (SMN) locus is depicted based on the reference genome assembly; however, it is known that this region is highly variable and can exist in multiple configurations (103). Copy number variation at this locus is likely mediated by many pairs of paralogous segmental duplications (arrows). Spinal muscular atrophy is caused by homozygous deletion of SMN1. The phenotypic effects of this rare copy number variant are modified by a copy number polymorphism encompassing SMN2, a highly identical paralog of SMN1, which differs from SMN1 by a splice site variant that leads to skipping of exon 7. Different structures of the SMN locus are diagrammed with the expected result of hybridizing individuals with these structures against an individual with four total SMN copies (the reference assembly configuration). Note that array CGH can be used to estimate the total copy number of the SMN genes, and the sequence content of these copies is critical for phenotypic outcome.
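
The point about total copy number versus sequence content can be made concrete with the idealized array CGH signal. The sketch below assumes the expected signal is simply the log2 ratio of test to reference copies, with no noise or probe effects; the function names and the example configurations are hypothetical and are not taken from the figure.

```python
from math import log2

REFERENCE_SMN_COPIES = 4  # four total SMN copies, the reference configuration


def expected_log2_ratio(test_copies, reference_copies=REFERENCE_SMN_COPIES):
    """Idealized array CGH log2 ratio of test versus reference copy number."""
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("log2 ratio is undefined for zero or negative copies")
    return log2(test_copies / reference_copies)


def smn_log2_ratio(smn1_copies, smn2_copies):
    """Signal when probes cannot tell SMN1 from SMN2: only the total matters."""
    return expected_log2_ratio(smn1_copies + smn2_copies)


# Hypothetical configurations given as (SMN1 copies, SMN2 copies).
for smn1, smn2 in [(2, 2), (0, 4), (0, 2), (2, 4)]:
    print(smn1, smn2, round(smn_log2_ratio(smn1, smn2), 3))
```

Under these assumptions the (2, 2) and (0, 4) configurations give the same ratio of 0, even though the second lacks SMN1 entirely, which is why copy number alone cannot predict the phenotype.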
