Gene prediction and gene classes in Arabidopsis thaliana

J Biotechnol. 2000 Mar 31;78(3):293-9. doi: 10.1016/s0168-1656(00)00196-6.

Abstract

Gene prediction methods for eukaryotic genomes are still not fully satisfactory. One way to improve gene prediction accuracy, already shown to be effective for prokaryotes, is to consider more than one gene model. We therefore used our classification of Arabidopsis thaliana genes into two classes (CU(1) and CU(2)), previously delineated according to statistical features, in the GeneMark gene identification program. A Markov model was developed for each gene class and for the two classes combined (GM-CU(1), GM-CU(2) and GM-all, respectively), and the three models were then applied to a test set of 168 genes to compare their efficiency. We concluded from this analysis that GM-CU(1) is more sensitive than GM-CU(2), which appears to be more specific to one gene type. Moreover, GM-all does not give better results than GM-CU(1), whereas combining the results from GM-CU(1) and GM-CU(2) greatly improves prediction efficiency compared with predictions made with GM-all alone. This work thus confirms the need to consider more than one gene model for gene prediction in eukaryotic genomes, and to look for gene classes in order to build these models.
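The idea of training one Markov model per gene class and then combining their calls can be illustrated with a minimal sketch. This is not GeneMark and not the authors' procedure: it assumes a plain fixed-order Markov chain (GeneMark's models are more elaborate), a uniform background of 0.25 per base, toy training sequences, and a "best-scoring class model wins" combination rule, since the abstract does not say how the results were combined. The names train_markov, avg_log_likelihood and combined_call are hypothetical.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, pseudocount=1.0):
    """Return a function giving log P(base | previous `order` bases).

    Assumption: a homogeneous fixed-order chain with additive smoothing,
    used here only as a stand-in for a real gene model.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def logp(context, base):
        seen = counts.get(context, {})
        total = sum(seen.values()) + 4 * pseudocount  # 4 possible bases
        return math.log((seen.get(base, 0.0) + pseudocount) / total)

    return logp


def avg_log_likelihood(seq, logp, order=2):
    """Average per-base log-likelihood of `seq` under one model."""
    n = len(seq) - order
    if n <= 0:
        raise ValueError("sequence shorter than model order")
    return sum(logp(seq[i - order:i], seq[i]) for i in range(order, len(seq))) / n


def combined_call(seq, models, order=2, background=math.log(0.25)):
    """Score `seq` with every class model and keep the best one.

    Assumption: a sequence is called 'coding' if its best class model
    beats the uniform background; the source does not specify this rule.
    """
    scores = {name: avg_log_likelihood(seq, m, order) - background
              for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] > 0.0, scores


# Toy training data standing in for the two gene classes (illustrative only).
models = {
    "GM-CU1": train_markov(["GCGCGCGCGCGCGCGC"]),
    "GM-CU2": train_markov(["ATATATATATATATAT"]),
}

best, is_coding, scores = combined_call("GCGCGCGC", models)
print(best, is_coding)
```

With real data, each model would be trained on the coding sequences of its own class, and a third model trained on the pooled set would play the role of GM-all for comparison.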

MeSH terms

  • Arabidopsis / genetics*
  • Biotechnology
  • Codon / genetics
  • DNA, Plant / genetics
  • Databases, Factual
  • Exons
  • Genes, Plant*
  • Models, Genetic
  • Software

Substances

  • Codon
  • DNA, Plant