Nucleic Acids Res. 2008 Jan;36(Database issue):D107-13.
doi: 10.1093/nar/gkm967. Epub 2007 Nov 15.

ORegAnno: An Open-Access Community-Driven Resource for Regulatory Annotation

Obi L Griffith et al., Nucleic Acids Res. 2008.

Abstract

ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the 'publication queue' allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54 351 identified by text-mining methods. Users can enter or 'check out' papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org.
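To make the record structure and the publication queue more concrete, here is a minimal Python sketch. It is illustrative only: the class names (`RegulatoryRecord`, `Evidence`, `PublicationQueue`), all field names, and the rule that a paper can be checked out by only one curator at a time are assumptions made for this example, not ORegAnno's actual schema or code. The only constraint taken directly from the abstract is that a record carries one or more lines of experimental evidence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Evidence:
    """One line of experimental evidence, labeled with an evidence-ontology term."""
    evidence_class: str  # hypothetical label, e.g. "transcription factor binding"
    evidence_type: str   # hypothetical label, e.g. "EMSA"


@dataclass
class RegulatoryRecord:
    """Illustrative record; field names are hypothetical, not ORegAnno's schema."""
    species: str
    sequence_type: str            # e.g. "transcription factor binding site"
    sequence: str
    target_gene: str              # Ensembl or Entrez gene identifier
    binding_factor: Optional[str]
    outcome: str                  # experimental outcome, e.g. "positive"
    pubmed_id: str
    evidence: List[Evidence] = field(default_factory=list)
    dbsnp_id: Optional[str] = None  # set only for regulatory variants

    def is_complete(self) -> bool:
        # A record needs a sequence and one or more lines of evidence.
        return bool(self.sequence) and len(self.evidence) >= 1


class PublicationQueue:
    """Hypothetical in-memory queue of papers awaiting curation."""

    def __init__(self) -> None:
        # Maps a PubMed ID to the curator who checked it out, or None if free.
        self._checked_out: Dict[str, Optional[str]] = {}

    def add(self, pmid: str) -> None:
        # Papers entered by experts or found by text mining both land here;
        # adding a paper twice leaves its check-out state unchanged.
        self._checked_out.setdefault(pmid, None)

    def check_out(self, pmid: str, curator: str) -> bool:
        # Assumption: a paper can be checked out by only one curator at a time.
        if pmid not in self._checked_out or self._checked_out[pmid] is not None:
            return False
        self._checked_out[pmid] = curator
        return True
```

With this sketch, a second curator's attempt to check out a paper that is already checked out returns `False`, and `is_complete()` stays `False` until at least one `Evidence` entry has been attached to a record.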

Figures

Figure 1.
Information flow for ORegAnno annotation process. (A) Data input. A publication queue allows papers from scientific literature to be added to the system for future curation. Users in the gene regulation community can enter or ‘check out’ papers from the queue for detailed manual curation using a series of user-friendly annotation pages. It is also possible to ‘batch upload’ complete datasets (e.g. external databases) using the ORegAnno XML data exchange format. (B) Data storage and processing. All functionality of the ORegAnno web application depends on storage and retrieval of data from an underlying MySQL relational database. Records are cross-referenced to PubMed, Entrez, Ensembl, dbSNP and eVOC where appropriate. A BLAST-based mapping agent assigns genome coordinates to each sequence. (C) Visualization. All mapped ORegAnno records can be viewed as custom tracks in the Ensembl or UCSC genome browsers. Most records are also available as official tracks in UCSC. (D) Data access. The web application provides an advanced search page for the entire record set. Each record page represents a complete summary of the data for a verified regulatory sequence. Nightly data dumps are posted in XML format. Programmatic interaction with ORegAnno is available through web services using the Perl SOAP modules.
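Panel (C) notes that mapped records can be loaded as custom tracks in the UCSC browser. One common way to do that is a BED file with a `track` header line. The sketch below assumes each mapped record has already been reduced to a hypothetical tuple `(record_id, chrom, start, end, strand)` with 1-based, inclusive coordinates. That coordinate convention is an assumption about the mapping agent's output, not something the caption states. BED itself uses 0-based, half-open intervals, so the start is shifted by one.

```python
from typing import Iterable, Tuple

# Hypothetical mapped record: (record_id, chrom, start, end, strand),
# assumed to use 1-based, inclusive coordinates.
MappedRecord = Tuple[str, str, int, int, str]


def to_bed_custom_track(records: Iterable[MappedRecord],
                        track_name: str = "ORegAnno_sketch") -> str:
    """Render mapped records as a minimal UCSC-style BED custom track."""
    lines = [f'track name="{track_name}" description="Mapped regulatory records (sketch)"']
    for rec_id, chrom, start, end, strand in records:
        if start < 1 or start > end:
            raise ValueError(f"{rec_id}: invalid interval {start}-{end}")
        # BED columns: chrom, chromStart (0-based), chromEnd (exclusive), name, score, strand.
        lines.append(f"{chrom}\t{start - 1}\t{end}\t{rec_id}\t0\t{strand}")
    return "\n".join(lines) + "\n"
```

For example, a record spanning 1-based positions 101 to 120 on `chr1` becomes the BED line `chr1  100  120  ...`, covering the same 20 bases.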
