RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding

Comput Struct Biotechnol J. 2019 Nov 7;17:1415-1428. doi: 10.1016/j.csbj.2019.09.009. eCollection 2019.


Gene regulatory regions contain short, degenerate transcription factor binding sites (TFBS). When a TFBS harbors a SNP, binding may be affected, thereby altering the transcriptional regulation of the target genes. Such regulatory SNPs have been implicated as causal variants in Genome-Wide Association Studies (GWAS). In this study, we describe improved versions of the Variation-tools programs, which are designed to predict regulatory variants, and present four case studies to illustrate their usage and applications. In brief, Variation-tools facilitate i) obtaining variation information, ii) interconversion of variation file formats, iii) retrieval of sequences surrounding variants, and iv) calculating the change in predicted transcription factor affinity scores between alleles, using motif scanning approaches. Notably, the tools support the analysis of haplotypes. The tools are included within the well-maintained Regulatory Sequence Analysis Tools suite (RSAT) and are accessible through a web interface that currently enables analysis of five metazoan and ten plant genomes. Variation-tools can also be used from the command line with any locally installed Ensembl genome. Users can input personal collections of variants and motifs, providing flexibility in the analysis.

Keywords: Binding motifs; Position specific scoring matrix; Regulatory variants; SNPs; Transcription factors.

Abbreviations: CEU, Northern Europeans from Utah; CRM, Cis-Regulatory Module; GWAS, Genome Wide Association Studies; LD, Linkage Disequilibrium; MPRA, Massively Parallel Reporter Assays; PSSM, Position Specific Scoring Matrix; ROC, Receiver Operating Characteristic; RSAT, Regulatory Sequence Analysis Tools; SNP, Single Nucleotide Polymorphism; SOIs, SNPs of Interest; TF, Transcription Factor; TFBS, Transcription Factor Binding Site; eQTL, Expression Quantitative Trait Loci; rsID, Reference SNP Identifier.
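
The allele comparison described in point iv) of the abstract can be illustrated with a minimal Python sketch. It is not the RSAT implementation: the function names, the toy count matrix, the pseudocount and the uniform background are all assumptions made for illustration, and only the forward strand is scanned. Each allele is placed in its flanking sequence, every motif-length window overlapping the variant is scored with a log-odds PSSM, and the best score per allele is compared.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudo=1.0, background=0.25):
    """Turn per-position count dicts into log-odds weights (hypothetical helper)."""
    matrix = []
    for col in counts:
        total = sum(col[b] for b in BASES) + pseudo
        matrix.append({
            b: math.log(((col[b] + pseudo * background) / total) / background)
            for b in BASES
        })
    return matrix

def best_overlapping_score(seq, var_pos, matrix):
    """Best forward-strand score among windows that cover position var_pos."""
    width = len(matrix)
    first = max(0, var_pos - width + 1)
    last = min(var_pos, len(seq) - width)
    best = -math.inf  # stays -inf if no full window fits
    for start in range(first, last + 1):
        score = sum(matrix[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best

def allele_score_difference(left, ref, alt, right, matrix):
    """Score both alleles in the same flanks; return (ref, alt, alt - ref)."""
    var_pos = len(left)
    ref_score = best_overlapping_score(left + ref + right, var_pos, matrix)
    alt_score = best_overlapping_score(left + alt + right, var_pos, matrix)
    return ref_score, alt_score, alt_score - ref_score

# Toy 4-bp motif with consensus TGAC (illustrative data, not a real TF motif).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
pssm = log_odds_matrix(toy_counts)

# Hypothetical SNP A>T inside a TGAC site, with made-up flanks.
ref_score, alt_score, delta = allele_score_difference("CCTG", "A", "T", "CGG", pssm)
print(f"ref={ref_score:.2f} alt={alt_score:.2f} delta={delta:.2f}")
```

In this toy case the reference allele completes the consensus site, so the alternative allele scores lower and the difference is negative. A fuller analysis would also scan the reverse strand and assess the significance of each score rather than relying on the raw difference.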