Genome information management and integrated data analysis with HaloLex

Friedhelm Pfeiffer; Alexander Broicher; Thomas Gillich; Kathrin Klee; José Mejía; Markus Rampp; Dieter Oesterhelt

doi:10.1007/s00203-008-0389-z

Arch Microbiol. 2008 Sep;190(3):281-99. doi: 10.1007/s00203-008-0389-z. Epub 2008 Jul 1.

Authors

Friedhelm Pfeiffer¹, Alexander Broicher, Thomas Gillich, Kathrin Klee, José Mejía, Markus Rampp, Dieter Oesterhelt

Affiliation

¹ Department of Membrane Biochemistry, Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany.

Abstract

HaloLex is a software system for the central management, integration, curation, and web-based visualization of genomic and other -omics data for any given microorganism. The system has been employed for the manual curation of three haloarchaeal genomes, namely Halobacterium salinarum (strain R1), Natronomonas pharaonis, and Haloquadratum walsbyi. In particular, HaloLex enables the integrated analysis of genome-wide proteomic results with the underlying genomic data. This has proven indispensable for generating reliable gene predictions for GC-rich genomes, which, because of their characteristically low abundance of stop codons, are known to be hard targets for standard gene finders, especially with respect to start codon assignment. The proteomic identification of more than 600 N-terminal peptides has greatly increased the reliability of the start codon assignment for Halobacterium salinarum. Application of homology-based methods to the published genome of Haloarcula marismortui allowed the detection of 47 previously unidentified genes (missed genes being a particularly serious problem for short protein sequences) and the correction of more than 300 start codon misassignments.
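
The link between GC content and stop codon abundance mentioned in the abstract can be illustrated with a small calculation. The following is only a minimal sketch and is not part of HaloLex: it assumes a simplified model in which bases are drawn independently, with A and T equally frequent and G and C equally frequent, and it assumes the standard stop codons TAA, TAG, and TGA. The function names and the GC value of 0.65 are hypothetical and chosen for illustration.

```python
# Illustrative model only (an assumption, not the method used by HaloLex):
# bases are independent, P(A) = P(T) and P(G) = P(C).

STOP_CODONS = ("TAA", "TAG", "TGA")  # standard stop codons


def base_frequencies(gc_fraction):
    """Return per-base probabilities for a given GC fraction (0..1)."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    at_each = (1.0 - gc_fraction) / 2.0
    gc_each = gc_fraction / 2.0
    return {"A": at_each, "T": at_each, "G": gc_each, "C": gc_each}


def stop_codon_probability(gc_fraction):
    """Probability that a random codon is a stop codon under the model."""
    freqs = base_frequencies(gc_fraction)
    return sum(
        freqs[codon[0]] * freqs[codon[1]] * freqs[codon[2]]
        for codon in STOP_CODONS
    )


def expected_codons_until_stop(gc_fraction):
    """Mean number of codons read, up to and including the first stop.

    This is the mean of a geometric distribution, 1 / p.
    """
    p = stop_codon_probability(gc_fraction)
    return float("inf") if p == 0.0 else 1.0 / p


if __name__ == "__main__":
    # 0.65 is a hypothetical GC fraction standing in for a GC-rich genome.
    for gc in (0.35, 0.50, 0.65):
        print(gc, stop_codon_probability(gc), expected_codons_until_stop(gc))
```

Under this model, a higher GC fraction lowers the stop codon probability, so random open reading frames become longer and more candidate start codons lie upstream of the true one. This is consistent with the difficulty in start codon assignment described above, although real genomes deviate from the independence assumption.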

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Archaeal Proteins / genetics
Codon, Initiator / genetics
Computational Biology / methods
Genes, Archaeal
Genome, Archaeal*
Genomics
Halobacteriaceae / genetics*
Information Management
Molecular Sequence Data
Open Reading Frames
Proteomics
Sequence Alignment
Sequence Homology, Amino Acid
Software*

Substances

Archaeal Proteins
Codon, Initiator