PLoS One. 2015 Mar 18;10(3):e0119420.
doi: 10.1371/journal.pone.0119420. eCollection 2015.

In-silico analysis of inflammatory bowel disease (IBD) GWAS loci to novel connections


Md Mesbah-Uddin et al. PLoS One. 2015.

Abstract

Genome-wide association studies (GWASs) of many complex diseases, including inflammatory bowel disease (IBD), have identified hundreds of disease-associated loci, the majority of which are noncoding. The number of GWAS loci is increasing rapidly, but translating single nucleotide polymorphisms (SNPs) from these loci into genomic medicine is lagging behind. In this study, we investigated 4,734 variants from 152 IBD-associated GWAS loci: the 152 lead noncoding SNPs identified from pooled GWAS results plus 4,582 variants in strong linkage disequilibrium (LD) with them (r2 ≥0.8) in the EUR population of the 1K Genomes Project. We used four publicly available bioinformatics tools, namely dbPSHP, CADD, GWAVA, and RegulomeDB, to annotate and prioritize putative regulatory variants. Of the 152 lead noncoding SNPs, around 11% are under strong negative selection (GERP++ RS ≥2) and ~30% are under balancing selection (Tajima's D score >2) in the CEU population (1K Genomes Project), although these regions are positively selected (GERP++ RS <0) in mammalian evolution. Analysis of the 4,734 variants with three integrative annotation tools (CADD, GWAVA, and RegulomeDB) produced 929 putative functional SNPs, of which 18 SNPs (from 15 GWAS loci) are in concordance with all three classifiers. These prioritized noncoding SNPs may contribute to IBD pathogenesis by dysregulating the expression of nearby genes. This study shows the usefulness of integrative annotation for prioritizing a small number of functional variants from a large number of GWAS markers.
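The variant-set construction and the selection thresholds described above can be sketched as simple filters. This is a hypothetical illustration, not the authors' pipeline: it assumes LD proxies arrive as precomputed (lead, proxy, r2) tuples and that each lead SNP already carries a GERP++ RS score and a CEU Tajima's D value. The names `expand_with_ld`, `LeadSnp`, and `classify_selection`, the placeholder IDs, and all numbers are invented for illustration.

```python
from dataclasses import dataclass


def expand_with_ld(lead_ids, ld_pairs, r2_min=0.8):
    """Return lead SNPs plus every proxy in LD (r2 >= r2_min) with a lead SNP.

    ld_pairs is a hypothetical input format: (lead_id, proxy_id, r2) tuples,
    e.g. precomputed for the EUR panel of the 1K Genomes Project.
    """
    leads = set(lead_ids)
    variants = set(leads)
    for lead, proxy, r2 in ld_pairs:
        if lead in leads and r2 >= r2_min:
            variants.add(proxy)
    return variants


@dataclass
class LeadSnp:
    """Hypothetical record for one lead SNP with precomputed scores."""
    snp_id: str
    gerp_rs: float   # GERP++ rejected-substitution (RS) score
    tajima_d: float  # Tajima's D in the CEU population


def classify_selection(snp):
    """Labels implied by the abstract's GERP++ RS and Tajima's D thresholds."""
    labels = []
    if snp.gerp_rs >= 2:
        labels.append("strong negative selection")  # GERP++ RS >= 2
    elif snp.gerp_rs < 0:
        labels.append("positive selection")          # GERP++ RS < 0
    if snp.tajima_d > 2:
        labels.append("balancing selection")         # Tajima's D > 2
    return labels


# Placeholder data, not values from the study.
variants = expand_with_ld(
    ["snp1", "snp2"],
    [("snp1", "var_a", 0.95), ("snp1", "var_b", 0.50), ("snp2", "var_c", 0.80)],
)
print(sorted(variants))  # ['snp1', 'snp2', 'var_a', 'var_c']

snps = [
    LeadSnp("snp1", gerp_rs=3.1, tajima_d=0.4),
    LeadSnp("snp2", gerp_rs=-1.2, tajima_d=2.6),
]
for s in snps:
    print(s.snp_id, classify_selection(s))
```

The r2 cutoff is applied inclusively (r2 ≥0.8), matching the wording of the abstract; `var_b` is dropped because its r2 falls below it.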


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Selective constraints on IBD associated lead noncoding SNPs.
(a) Evolutionary conservation versus human-specific selection on lead SNPs. Evolutionary conservation is represented by GERP++ RS scores (x axis) and human-specific selection by Tajima's D scores (y axis). Each point represents a single SNP; n = 148 SNPs. Positive GERP++ RS scores indicate regions under negative (purifying) selection and negative GERP++ RS scores indicate positive selection; positive TD scores indicate balancing selection and negative TD scores suggest positive selection. SNPs in the top-right segment are under strong evolutionary constraint as well as human-specific selection. (b) Cross-population diversity of lead SNPs between the CEU population and the rest of the world. DAF is shown on the x axis and ΔDAF on the y axis. Each point represents a single SNP; n = 148 SNPs. GERP++ RS: Genomic Evolutionary Rate Profiling rejected substitutions; TD: Tajima's D; DAF: derived allele frequency; ΔDAF: difference in derived allele frequency; CEU: Utah residents with ancestry from Northern and Western Europe. TD, DAF and ΔDAF statistics are for the 1K Genomes Project CEU population.
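Panel (b) plots DAF against ΔDAF. The sketch below shows one way to compute both statistics from derived-allele counts. It assumes ΔDAF is the CEU DAF minus the DAF pooled over all other 1K Genomes populations; that definition, the helper names `daf` and `delta_daf`, and the counts are assumptions for illustration, not the authors' exact procedure.

```python
def daf(derived_count, n_individuals):
    """Derived allele frequency in a diploid sample: derived alleles / 2N."""
    return derived_count / (2 * n_individuals)


def delta_daf(ceu, others):
    """Assumed definition: DAF(CEU) minus DAF pooled over all other populations.

    Each population is a (derived_allele_count, n_individuals) tuple; pooling
    sums the counts, so larger populations carry more weight.
    """
    pooled_derived = sum(d for d, _ in others)
    pooled_n = sum(n for _, n in others)
    return daf(*ceu) - daf(pooled_derived, pooled_n)


# Placeholder counts for a single SNP, not data from the study.
ceu = (120, 100)                 # 120 derived alleles among 100 CEU individuals
others = [(40, 100), (80, 200)]  # two other hypothetical populations
print(daf(*ceu))                          # 0.6
print(round(delta_daf(ceu, others), 3))   # 0.4
```

Pooling raw counts (rather than averaging per-population frequencies) is a design choice here; averaging frequencies would weight every population equally regardless of sample size.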
Fig 2. Distribution of (a) CADD and (b) GWAVA scores.
Histograms show the CADD and GWAVA scores of all variants (lead GWAS SNPs and LD variants) after missing values were removed; n = 4,781 variants for CADD and n = 4,460 variants for GWAVA.
Fig 3. Comparing functional annotation scores of IBD associated lead noncoding GWAS SNPs versus LD variants.
(a) CADD scores for lead GWAS SNPs (n = 152) versus LD variants (n = 4,627). (b) GWAVA scores for lead GWAS SNPs (n = 152) versus LD variants (n = 4,308). In the boxplots, center lines show the medians and box limits indicate the 25th and 75th percentiles (as determined by R software). Whiskers extend to the 5th and 95th percentiles, and outliers are shown as open circles. Notches span median ± 1.58 × IQR (interquartile range) / √n and represent an approximate 95% confidence interval for each median (the default in R software). P-values are calculated using a two-sided Wilcoxon rank-sum test.
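The notch rule and the rank-sum test named in this caption can be reproduced in a few lines. The sketch below uses only Python's standard library and approximates, rather than reproduces, the authors' R analysis: quartiles follow R's default `quantile` type 7 (via `statistics.quantiles(method="inclusive")`), whereas R's `boxplot()` computes notches from Tukey hinges, and the test uses the plain normal approximation without R's continuity or tie correction, so p-values can differ slightly from `wilcox.test`. `notch_interval` and `rank_sum_test` are hypothetical helper names.

```python
import math
from statistics import NormalDist, median, quantiles


def notch_interval(values):
    """Median +/- 1.58 * IQR / sqrt(n), the notch rule quoted in the caption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # R type-7 quartiles
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    m = median(values)
    return m - half_width, m + half_width


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (W, p), where W is the rank sum of x minus n_x(n_x + 1)/2, the
    statistic R's wilcox.test reports. Tied values receive average ranks.
    """
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for a run of ties
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    n_x, n_y = len(x), len(y)
    w = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2  # first n_x indices belong to x
    mean = n_x * n_y / 2
    sd = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (w - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, p


# Placeholder scores, not data from the study.
print(notch_interval(list(range(1, 11))))
print(rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```

With samples as large as the LD-variant groups (n > 4,000), the normal approximation is usually adequate; for small samples an exact test is preferable.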
Fig 4. Bar diagram representing the number of SNPs in each RegulomeDB category.
Here n = 4734 variants. ND: No Data.
Fig 5. Concordance Analysis.
(a) Scatterplot of the top annotation scores from CADD, GWAVA and RegulomeDB. Each point represents a single SNP, with CADD scores on the x axis, GWAVA scores on the y axis and RegulomeDB annotation scores shown by color. The plot is divided into four segments by a horizontal line at GWAVA = 0.4 and a vertical line at CADD = 10. The top-right segment contains the 18 SNPs with the highest annotation scores from the three classifiers. (b) Venn diagram showing the number of SNPs in concordance with the three classifiers: CADD in blue, GWAVA in red and RegulomeDB in green. The intersection of the three circles represents SNPs in concordance with all three classifiers. Note: the cutoffs used for the concordance analysis are CADD ≥10, GWAVA ≥0.4 and RegulomeDB ≤2. Based on these cutoffs, 929 SNPs (each counted once, after removing the overlap between classifiers) are plotted in the scatterplot and Venn diagram. RegDB: RegulomeDB annotation score; LLAB: less likely to affect binding (categories 3a and 3b); MBE: minimal binding evidence (categories 4, 5 and 6); ND: no data.
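The concordance analysis reduces to three threshold filters followed by set operations: the union gives the deduplicated SNPs plotted, and the intersection gives the SNPs passing all three classifiers. The sketch below is a hypothetical illustration under stated assumptions: each SNP is assumed to carry a CADD score, a GWAVA score and a RegulomeDB category string (None for missing data), and "RegulomeDB ≤2" is read as categories 1a to 2c. The helpers `regdb_rank`, `legend_group` and `concordance`, and the placeholder scores, are invented for illustration.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

CADD_MIN = 10.0     # CADD >= 10
GWAVA_MIN = 0.4     # GWAVA >= 0.4
REGDB_MAX_RANK = 2  # RegulomeDB <= 2, read here as categories 1a-1f and 2a-2c

# Hypothetical input: SNP id -> (CADD, GWAVA, RegulomeDB category); None = no data.
Scores = Dict[str, Tuple[Optional[float], Optional[float], Optional[str]]]


def regdb_rank(category: Optional[str]) -> Optional[int]:
    """Numeric rank of a RegulomeDB category such as '2b' (None if no data)."""
    return int(category[0]) if category else None


def legend_group(category: Optional[str]) -> str:
    """Group categories as in the legend: LLAB (3a/3b), MBE (4-6), ND (no data)."""
    rank = regdb_rank(category)
    if rank is None:
        return "ND"
    if rank == 3:
        return "LLAB"
    if rank >= 4:
        return "MBE"
    return category  # categories 1x and 2x are kept as-is


def concordance(scores: Scores):
    """Return the three classifier sets, their union and their intersection."""
    cadd = {s for s, (c, _, _) in scores.items() if c is not None and c >= CADD_MIN}
    gwava = {s for s, (_, g, _) in scores.items() if g is not None and g >= GWAVA_MIN}
    regdb = {s for s, (_, _, r) in scores.items()
             if r and regdb_rank(r) <= REGDB_MAX_RANK}
    union = cadd | gwava | regdb      # each SNP counted once
    all_three = cadd & gwava & regdb  # in concordance with all three classifiers
    return cadd, gwava, regdb, union, all_three


# Placeholder scores, not data from the study.
scores: Scores = {
    "snp1": (15.2, 0.55, "2b"),  # passes all three cutoffs
    "snp2": (12.0, 0.10, "5"),   # CADD only
    "snp3": (3.4, 0.45, None),   # GWAVA only; no RegulomeDB data
    "snp4": (None, 0.20, "1f"),  # RegulomeDB only
    "snp5": (4.0, 0.10, "3a"),   # passes none
}
cadd, gwava, regdb, union, all_three = concordance(scores)
print(len(union), sorted(all_three))  # 4 ['snp1']
print(Counter(legend_group(r) for _, _, r in scores.values()))
```

Missing scores are treated as failing a cutoff rather than being imputed, which mirrors removing missing values before plotting.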
Fig 6. Gene interaction network using GeneMANIA webserver.
Genes are represented as nodes, and edges represent interactions between genes. Black circles are the query genes; edge colors indicate the type of interaction, as defined in the network legend.
Fig 7. Top MCODE clusters from the gene interaction network.
Highly connected gene clusters were identified from the network using MCODE v1.32 (a Cytoscape 2.8.2 plugin) and the top 11 clusters are presented (cutoff score ≥1).




Grants and funding

The authors have no support or funding to report.