Genome information management and integrated data analysis with HaloLex

Arch Microbiol. 2008 Sep;190(3):281-99. doi: 10.1007/s00203-008-0389-z. Epub 2008 Jul 1.

Abstract

HaloLex is a software system for the central management, integration, curation, and web-based visualization of genomic and other -omics data for any given microorganism. The system has been employed for the manual curation of three haloarchaeal genomes, namely Halobacterium salinarum (strain R1), Natronomonas pharaonis, and Haloquadratum walsbyi. In particular, HaloLex enables the integrated analysis of genome-wide proteomic results with the underlying genomic data. This has proven indispensable for generating reliable gene predictions for GC-rich genomes, which, because of their characteristically low abundance of stop codons, are known to be hard targets for standard gene finders, especially with respect to start codon assignment. The proteomic identification of more than 600 N-terminal peptides has greatly increased the reliability of the start codon assignment for Halobacterium salinarum. Applying homology-based methods to the published genome of Haloarcula marismortui made it possible to detect 47 previously unidentified genes (missed genes being a problem that is particularly serious for short protein sequences) and to correct more than 300 start codon misassignments.
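
The link between GC content and stop codon abundance can be illustrated with a small calculation. The sketch below is not part of HaloLex; it assumes a deliberately simple model in which bases are drawn independently, with G and C equally likely and A and T equally likely. The function names and the GC values in the usage loop are hypothetical and chosen only for illustration. Because the three standard stop codons (TAA, TAG, TGA) are AT-rich, their expected frequency falls as GC content rises, so random open reading frames become longer and harder to tell apart from real genes.

```python
# Illustrative model only: independent bases, P(G) = P(C), P(A) = P(T).
# Real genomes deviate from this (codon bias, strand asymmetry, etc.).

STOP_CODONS = ("TAA", "TAG", "TGA")  # standard stop codons


def base_probabilities(gc_fraction):
    """Return per-base probabilities for a given GC fraction (0..1)."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie between 0 and 1")
    at = (1.0 - gc_fraction) / 2.0  # probability of A, and of T
    gc = gc_fraction / 2.0          # probability of G, and of C
    return {"A": at, "T": at, "G": gc, "C": gc}


def stop_codon_probability(gc_fraction):
    """Probability that a random codon is a stop codon under the model."""
    p = base_probabilities(gc_fraction)
    return sum(p[c[0]] * p[c[1]] * p[c[2]] for c in STOP_CODONS)


def mean_codons_until_stop(gc_fraction):
    """Mean number of codons read up to and including the first stop.

    This is the mean of a geometric distribution, 1 / p.
    """
    prob = stop_codon_probability(gc_fraction)
    return float("inf") if prob == 0.0 else 1.0 / prob


if __name__ == "__main__":
    # Hypothetical GC fractions, chosen only to show the trend.
    for gc in (0.40, 0.50, 0.65):
        print(gc, stop_codon_probability(gc), mean_codons_until_stop(gc))
```

Under this model a genome with 50% GC has a per-codon stop probability of 3/64, and the probability decreases steadily as the GC fraction increases. The longer spurious open reading frames that result also offer more candidate start codons upstream of the true one, which is consistent with the start codon assignment problem described above.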

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Archaeal Proteins / genetics
  • Codon, Initiator / genetics
  • Computational Biology / methods
  • Genes, Archaeal
  • Genome, Archaeal*
  • Genomics
  • Halobacteriaceae / genetics*
  • Information Management
  • Molecular Sequence Data
  • Open Reading Frames
  • Proteomics
  • Sequence Alignment
  • Sequence Homology, Amino Acid
  • Software*

Substances

  • Archaeal Proteins
  • Codon, Initiator