Nucleic Acids Res. 2016 Jan 4;44(D1):D116-25.
doi: 10.1093/nar/gkv1249. Epub 2015 Nov 19.

HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models


Ivan V Kulakovskiy et al. Nucleic Acids Res.

Abstract

Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns under different conditions, it is pragmatic to have a single generic model for each TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), a collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM models, covering 86 human and 52 mouse TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with validation of the resulting models on in vivo data. To facilitate practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
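To make the idea of a PWM with a score threshold concrete, the sketch below scans a sequence with a mononucleotide PWM and reports windows whose summed score meets a cutoff. It is only an illustration of the general technique: the matrix values, the threshold and the function names (score_window, scan) are hypothetical and are not taken from HOCOMOCO or its command-line tools, and only the forward strand is scanned.

```python
from typing import Dict, List, Tuple

# Hypothetical 4-position PWM (one dict of per-base scores per position).
# The values are made up for illustration; they are NOT a HOCOMOCO model.
PWM: List[Dict[str, float]] = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]


def score_window(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum the per-position scores of the bases in one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm: List[Dict[str, float]], seq: str,
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (offset, window, score) for forward-strand windows with score >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        # Skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        s = score_window(pwm, window)
        if s >= threshold:
            hits.append((i, window, s))
    return hits


if __name__ == "__main__":
    # The threshold 3.0 is an arbitrary illustrative cutoff.
    for offset, window, s in scan(PWM, "AATGACGG", 3.0):
        print(offset, window, s)
```

In practice the threshold would come from a precomputed value for the specific model rather than a hand-picked number, and a full scan would also score the reverse complement.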


Figures

Figure 1. Overview of the HOCOMOCO update workflow.
Figure 2. Results of the benchmark assessing performance of different model collections on human and mouse ChIP-Seq data. The full height of each bar depicts the total number of assessed TFs for a particular case. The green fraction of a bar depicts the number of TFs for which a model from the given collection was the best (had the highest wAUC). The white fraction of the mono-HOCOMOCO v10 bar consists of TFs with the best models found in the databases that did not participate in the collection assembly (HOMER, SWISSREGULON, JASPAR and the published version of HT-SELEX).
Figure 3. Coverage of TF structural families by TFBS models of HOCOMOCO v10. The area of each blue circle is proportional to the total number of members of a particular family; the smaller orange circle depicts the fraction of TFs for which TFBS models are available. The TF classification is given according to TFClass.
Figure 4. Binding models of human and mouse STAT1 TFs. LOGO representations of selected models learned from different ChIP-Seq datasets are shown. wAUC values of different models within species are extremely close (about 0.89 for human and 0.78 for mouse). The HOCOMOCO v9 human model is shown as the reference. One of the human ChIP-Seq datasets yielded a mouse-like model subtype.


