Relative codon adaptation: a generic codon bias index for prediction of gene expression

Jesse M Fox; Ivan Erill

doi:10.1093/dnares/dsq012

DNA Res. 2010 Jun;17(3):185-96. doi: 10.1093/dnares/dsq012. Epub 2010 May 7.

Authors

Jesse M Fox¹, Ivan Erill

Affiliation

¹ Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Road, Baltimore, MD 21228, USA.

Abstract

The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed and based on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and of biases in the small data sets used for validation in E. coli. Furthermore, we show how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set and that this improvement is significantly larger when analysing genomes with high mutational bias.
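
The abstract does not state the RCA formula, so the following is only an illustrative sketch of the principle it describes. It assumes that each codon is weighted by its observed frequency in a reference set divided by the frequency expected from the positional base composition of that same set, and that a gene is scored by the geometric mean of its codon weights. The function names (split_codons, rca_weights, rca) and the pseudocount are hypothetical choices made for this example, and stop codons are counted among the 64 triplets for simplicity.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

def split_codons(seq):
    """Split an in-frame sequence into codons, skipping partial or ambiguous ones."""
    seq = seq.upper()
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in triplets if set(c) <= set(BASES)]

def rca_weights(reference_seqs, pseudocount=1.0):
    """Per-codon weights: observed frequency over positional-base expectation.

    Assumption: the pseudocount (which must be > 0) is added here only to
    keep every weight strictly positive; it is not taken from the abstract.
    """
    counts = Counter(c for seq in reference_seqs for c in split_codons(seq))
    total = sum(counts.values()) + 64 * pseudocount
    # Smoothed frequency of each of the 64 codons in the reference set.
    freq = {"".join(c): (counts["".join(c)] + pseudocount) / total
            for c in product(BASES, repeat=3)}
    # Frequency of each base at codon positions 1, 2 and 3, derived from
    # the smoothed codon frequencies so that the two stay consistent.
    pos = [Counter(), Counter(), Counter()]
    for codon, f in freq.items():
        for i, base in enumerate(codon):
            pos[i][base] += f
    return {c: f / (pos[0][c[0]] * pos[1][c[1]] * pos[2][c[2]])
            for c, f in freq.items()}

def rca(gene_seq, weights):
    """Geometric mean of the codon weights along one gene."""
    codons = split_codons(gene_seq)
    if not codons:
        raise ValueError("no complete unambiguous codons in sequence")
    return math.exp(sum(math.log(weights[c]) for c in codons) / len(codons))
```

Under these assumptions, a reference set in which codon frequencies equal the product of their positional base frequencies gives every codon a weight of 1, so a score above 1 indicates codon usage enriched beyond what base composition alone would predict.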

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bacteria / genetics*
Base Composition
Biomarkers / metabolism
Codon*
Computational Biology
Gene Expression Profiling
Gene Expression Regulation, Bacterial
Gene Expression*
Genes, Bacterial*
Genome, Bacterial
Oligonucleotide Array Sequence Analysis

Substances

Biomarkers
Codon