Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat

Plant Mol Biol. 2002 Mar-Apr;48(5-6):501-10. doi: 10.1023/a:1014875206165.

Abstract

Plant genomics projects involving model species and many agriculturally important crops are producing a rapidly growing database of genomic and expressed DNA sequences. The publicly available collection of expressed sequence tags (ESTs) from several grass species can be used to analyze both structural and functional relationships in these genomes. We analyzed over 260 000 EST sequences from five different cereals for their potential use in developing simple sequence repeat (SSR) markers. The frequency of SSR-containing ESTs (SSR-ESTs) in this collection varied from 1.5% for maize to 4.7% for rice. In addition, BLAST analysis identified several ESTs that are related to the SSR-ESTs. The SSR-ESTs and the related sequences were clustered within each species to reduce redundancy and to produce longer consensus sequences. The consensus and singleton sequences from each species were then pooled and clustered to identify cross-species matches. Overall, redundancy was reduced by 85% when the resulting consensus and singleton sequences (3569) were compared with the total number of SSR-EST and related sequences analyzed (24 606). This information can be useful for developing SSR markers that amplify across the grass genera for comparative mapping and genetics. Functional analysis may reveal their role in plant metabolism and gene evolution.
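
The screening step and the redundancy figure reported above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the SSR search criteria, so the motif lengths and minimum repeat counts in `MIN_REPEATS`, as well as the function names `find_ssrs`, `ssr_est_fraction` and `redundancy_reduction`, are assumptions made for illustration only. The sketch detects perfect repeats only and does not cover the BLAST or clustering steps.

```python
import re

# Hypothetical thresholds (NOT taken from the abstract):
# motif length in bp -> minimum number of tandem copies to call an SSR.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_compound_of_shorter_unit(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AA, ATAT)."""
    k = len(motif)
    return any(
        k % d == 0 and motif == motif[:d] * (k // d)
        for d in range(1, k)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, n in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n-1,} requires n-1 or more extra copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_of_shorter_unit(motif):
                continue  # skip homopolymers and motifs already counted at a shorter length
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


def ssr_est_fraction(ests):
    """Fraction of sequences that contain at least one SSR."""
    if not ests:
        return 0.0
    return sum(1 for s in ests if find_ssrs(s)) / len(ests)


def redundancy_reduction(n_input, n_clustered):
    """Fractional reduction in sequence count after clustering."""
    return 1 - n_clustered / n_input


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    ests = [
        "TTC" + "AG" * 6 + "TTC",   # dinucleotide repeat
        "ACGTACGGTCA",              # no repeat
        "CCA" + "CTT" * 5 + "GG",   # trinucleotide repeat
        "A" * 16,                   # homopolymer, deliberately ignored
    ]
    print(ssr_est_fraction(ests))
    # Counts reported in the abstract: 24 606 input sequences, 3569 after clustering.
    print(round(100 * redundancy_reduction(24606, 3569)))
```

With the counts given in the abstract, `redundancy_reduction(24606, 3569)` evaluates to about 0.855, which agrees with the reported 85% reduction.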

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology
  • Databases, Factual
  • Expressed Sequence Tags*
  • Genetic Markers
  • Hordeum / genetics
  • Microsatellite Repeats / genetics*
  • Oryza / genetics
  • Poaceae / genetics*
  • Sequence Homology, Nucleic Acid
  • Triticum / genetics
  • Zea mays / genetics

Substances

  • Genetic Markers