Evidence classification of high-throughput protocols and confidence integration in RegulonDB

Database (Oxford). 2013 Jan 17:2013:bas059. doi: 10.1093/database/bas059. Print 2013.

Abstract

RegulonDB provides curated information on the transcriptional regulatory network of Escherichia coli and contains both experimental data and computationally predicted objects. To account for the heterogeneity of these data, we introduced in version 6.0 a two-tier rating system for the strength of evidence, classifying evidence as either 'weak' or 'strong' (Gama-Castro,S., Jimenez-Jacinto,V., Peralta-Gil,M. et al. RegulonDB (Version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and textpresso navigation. Nucleic Acids Res., 2008;36:D120-D124.). We now extend this classification scheme to high-throughput evidence, including chromatin immunoprecipitation (ChIP) and RNA-seq technologies. To integrate these data into RegulonDB, we present two strategies for evaluating confidence: statistical validation and independent cross-validation. Statistical validation verifies ChIP data for transcription factor-binding sites using tools for motif discovery and quality assessment of the discovered matrices. Independent cross-validation combines independent lines of evidence so that each helps exclude the false positives of the others. Both statistical validation and cross-validation allow subsets of data supported by weak evidence to be upgraded to a higher confidence level. Likewise, cross-validation of strong-confidence data extends our two-tier rating system to a three-tier system by introducing a third confidence score, 'confirmed'. Database URL: http://regulondb.ccg.unam.mx/
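The confidence integration described above can be sketched in Python. This is a minimal illustration under stated assumptions, not RegulonDB's implementation. The `Evidence` fields, the `method_class` label used to decide whether two pieces of evidence are independent, the `motif_validated` flag standing in for statistical validation of ChIP data, and the exact combination rules are all hypothetical. The assumed rules: one independent class with strong evidence gives 'strong'; weak evidence from two independent classes cross-validates to 'strong'; strong evidence from two independent classes gives 'confirmed'.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidence(IntEnum):
    """Confidence levels, ordered so that max() picks the most confident one."""
    WEAK = 1
    STRONG = 2
    CONFIRMED = 3


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence for a regulatory object (hypothetical schema)."""
    code: str                      # hypothetical label, e.g. "ChIP-chip" or "footprinting"
    strength: Confidence           # WEAK or STRONG, from the evidence classification
    method_class: str              # assumption: different classes count as independent
    motif_validated: bool = False  # assumption: ChIP data passed motif-based validation


def integrate(evidence: list[Evidence]) -> Confidence:
    """Combine all evidence for one object into a single confidence score."""
    if not evidence:
        raise ValueError("at least one piece of evidence is required")

    # Keep the best level per independent class; statistical validation
    # upgrades a weak item to strong before classes are compared.
    best_per_class: dict[str, Confidence] = {}
    for e in evidence:
        level = e.strength
        if level == Confidence.WEAK and e.motif_validated:
            level = Confidence.STRONG
        previous = best_per_class.get(e.method_class, Confidence.WEAK)
        best_per_class[e.method_class] = max(previous, level)

    strong_classes = sum(1 for lvl in best_per_class.values() if lvl >= Confidence.STRONG)
    if strong_classes >= 2:
        # Cross-validation of strong evidence from independent classes.
        return Confidence.CONFIRMED
    if strong_classes == 1:
        return Confidence.STRONG
    if len(best_per_class) >= 2:
        # Weak evidence from two independent classes cross-validates to strong.
        return Confidence.STRONG
    return Confidence.WEAK
```

Grouping evidence by class before counting is the central design choice in this sketch: two experiments of the same kind may share systematic biases, so here they do not cross-validate each other, whereas evidence from different classes does.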

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biosynthetic Pathways / genetics
  • Chromatin Immunoprecipitation
  • Computational Biology / methods*
  • Databases, Genetic*
  • Escherichia coli / genetics*
  • Gene Expression Regulation, Bacterial
  • Gene Regulatory Networks
  • Position-Specific Scoring Matrices
  • Regulon / genetics*
  • Reproducibility of Results
  • Statistics as Topic*
  • Transcription Initiation Site