RegulonDB 11.0: Comprehensive high-throughput datasets on transcriptional regulation in Escherichia coli K-12

Víctor H Tierrafría; Claire Rioualen; Heladia Salgado; Paloma Lara; Socorro Gama-Castro; Patrick Lally; Laura Gómez-Romero; Pablo Peña-Loredo; Andrés G López-Almazo; Gabriel Alarcón-Carranza; Felipe Betancourt-Figueroa; Shirley Alquicira-Hernández; J Enrique Polanco-Morelos; Jair García-Sotelo; Estefani Gaytan-Nuñez; Carlos-Francisco Méndez-Cruz; Luis J Muñiz; César Bonavides-Martínez; Gabriel Moreno-Hagelsieb; James E Galagan; Joseph T Wade; Julio Collado-Vides

doi:10.1099/mgen.0.000833

Microb Genom. 2022 May;8(5):mgen000833. doi: 10.1099/mgen.0.000833.

Authors

Víctor H Tierrafría¹,², Claire Rioualen¹, Heladia Salgado¹, Paloma Lara¹, Socorro Gama-Castro¹, Patrick Lally², Laura Gómez-Romero³, Pablo Peña-Loredo¹, Andrés G López-Almazo¹, Gabriel Alarcón-Carranza¹, Felipe Betancourt-Figueroa¹, Shirley Alquicira-Hernández¹, J Enrique Polanco-Morelos¹, Jair García-Sotelo⁴, Estefani Gaytan-Nuñez¹, Carlos-Francisco Méndez-Cruz¹, Luis J Muñiz¹, César Bonavides-Martínez¹, Gabriel Moreno-Hagelsieb⁵, James E Galagan², Joseph T Wade⁶,⁷, Julio Collado-Vides¹,²,⁸

Affiliations

¹ Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico.
² Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215, USA.
³ Instituto Nacional de Medicina Genómica, INMEGEN, Periférico Sur 4809, Arenal Tepepan, Tlalpan 14610, CDMX, Mexico.
⁴ Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Querétaro 76230, Querétaro, Mexico.
⁵ Department of Biology, Wilfrid Laurier University, 75 University Ave W, Waterloo, ON N2L 3C5, Canada.
⁶ Wadsworth Center, New York State Department of Health, Albany, NY, USA.
⁷ Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY, USA.
⁸ Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Universitat Pompeu Fabra (UPF), Barcelona, Spain.

Abstract

Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories, such as the Gene Expression Omnibus or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high- and low-throughput data on transcriptional regulation in Escherichia coli K-12 in a form that is easy to use either as whole datasets or as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison. The upgrade introduces up-to-date collections of transcription termination sites and transcription units, transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, and expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors and data uniformly processed in-house, enhancing their comparability as well as the traceability of the methods and the reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata. A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy that applies natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This is a long-awaited resource to make use of such a wealth of knowledge and advance our understanding of the biology of the model bacterium E. coli K-12.

Keywords: ChIP-exo; ChIP-seq; DAP-seq; Escherichia coli K-12; High-Throughput Nucleotide Sequencing; RNA-seq; Transcriptional Regulatory Network; gSELEX.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Escherichia coli K12* / genetics
Escherichia coli K12* / metabolism
Escherichia coli* / genetics
Gene Expression Regulation, Bacterial
Operon / genetics
Reproducibility of Results

Grants and funding

R01 GM131643/GM/NIGMS NIH HHS/United States