The majority of single-nucleotide variants (SNVs) identified in genome-wide association studies (GWAS) fall within non-protein coding DNA and have the potential to alter gene expression. Non-protein coding DNA can control gene expression by acting as transcription factor (TF) binding sites or by regulating the organization of DNA into chromatin. SNVs in non-coding DNA sequences can disrupt TF binding and chromatin structure, which can result in pathology. Further, environmental health studies have shown that exposure to xenobiotics can disrupt the ability of TFs to regulate entire gene networks, which can also result in pathology. However, exposure-linked health outcomes vary considerably between individuals. One explanation for this heterogeneity is that genetic variation and exposure combine to disrupt gene regulation, which eventually manifests as disease. Many resources annotate common GWAS variants and integrate them with conservation, functional genomics, and TF binding data. These annotation tools provide clues about the biological implications of an SNV and help generate hypotheses about potentially disrupted target genes, epigenetic marks, pathways, and cell types. Collectively, this information can be used to predict how SNVs alter an individual's response to exposure and their disease risk. A basic understanding of the regulatory information contained within non-protein coding DNA is needed to predict the biological consequences of SNVs and to determine how these SNVs affect exposure-related disease. We hope that this review will aid in the characterization of disease-associated genetic variation in the non-protein coding genome.
Keywords: Gene-environment interactions; Gene expression; Gene regulation; Non-protein coding DNA.
Copyright © 2020 Elsevier B.V. All rights reserved.