DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants

Nucleic Acids Res. 2018 Jun 20;46(11):e69. doi: 10.1093/nar/gky215.

Abstract

The complex system of gene expression is regulated by the cell type-specific binding of transcription factors (TFs) to regulatory elements. Identifying variants that disrupt TF binding and lead to human disease remains a great challenge. To address this, we implement sequence-based deep learning models that accurately predict TF binding intensities for given DNA sequences. In addition to classifying sequences as bound or unbound by a TF, our models predict real-valued TF binding intensities by leveraging large-scale TF ChIP-seq data. The change in predicted TF binding intensity between the altered sequence and the reference sequence reflects the degree of functional impact of the variant. This enables us to develop the tool DeFine (Deep learning based Functional impact of non-coding variants evaluator, http://define.cbi.pku.edu.cn) with improved performance for assessing the functional impact of non-coding variants, including SNPs and indels. DeFine accurately identifies the causal functional non-coding variants from disease-associated variants in GWAS. DeFine is an effective and easy-to-use tool that facilitates systematic prioritization of functional non-coding variants.
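The scoring idea in the abstract (variant impact as the change in predicted binding intensity between the altered and the reference sequence) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_binding_intensity` is a hypothetical motif-matching stand-in and not DeFine's deep convolutional model, the motif `TGACTCA` is an arbitrary example, and `apply_snp`, `apply_indel`, and `variant_impact` are hypothetical helper names rather than part of any DeFine interface.

```python
from typing import Callable


def apply_snp(ref_seq: str, pos: int, alt: str) -> str:
    """Return ref_seq with the single base at 0-based `pos` replaced by `alt`."""
    return ref_seq[:pos] + alt + ref_seq[pos + 1:]


def apply_indel(ref_seq: str, pos: int, ref_allele: str, alt_allele: str) -> str:
    """Replace `ref_allele` starting at 0-based `pos` with `alt_allele`."""
    if ref_seq[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match the sequence")
    return ref_seq[:pos] + alt_allele + ref_seq[pos + len(ref_allele):]


def toy_binding_intensity(seq: str, motif: str = "TGACTCA") -> float:
    """Hypothetical stand-in for a trained model (NOT the DeFine network).

    Scores a sequence by the best fraction of positions matching `motif`
    over all windows of the motif's length.
    """
    best = 0.0
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        best = max(best, matches / len(motif))
    return best


def variant_impact(ref_seq: str, alt_seq: str,
                   predict: Callable[[str], float]) -> float:
    """Impact score: predicted intensity of altered minus reference sequence."""
    return predict(alt_seq) - predict(ref_seq)


# Usage: a SNP and a deletion that both disrupt the example motif.
ref = "AAATGACTCAAAA"
snp_seq = apply_snp(ref, 5, "G")             # A>G inside the motif
del_seq = apply_indel(ref, 5, "AC", "")      # 2-bp deletion inside the motif

snp_impact = variant_impact(ref, snp_seq, toy_binding_intensity)
del_impact = variant_impact(ref, del_seq, toy_binding_intensity)
print(f"SNP impact: {snp_impact:.3f}, deletion impact: {del_impact:.3f}")
```

In this sketch a negative score means the variant lowers the predicted binding intensity. Any predictor that maps a DNA sequence to a real-valued intensity could be passed as `predict` in place of the toy function.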

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Binding Sites / genetics
  • Computational Biology / methods*
  • DNA / genetics
  • DNA / metabolism*
  • DNA-Binding Proteins / genetics
  • Gene Expression Regulation / genetics*
  • Humans
  • Neural Networks, Computer*
  • Regulatory Elements, Transcriptional / genetics*
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Transcription Factors / metabolism*

Substances

  • DNA-Binding Proteins
  • Transcription Factors
  • DNA