Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat

Ramesh V Kantety; Mauricio La Rota; David E Matthews; Mark E Sorrells

doi:10.1023/a:1014875206165

Plant Mol Biol. 2002 Mar-Apr;48(5-6):501-10. doi: 10.1023/a:1014875206165.

Authors

Ramesh V Kantety¹, Mauricio La Rota, David E Matthews, Mark E Sorrells

Affiliation

¹ Department of Plant Breeding, Cornell University, Ithaca, NY 14853, USA.

PMID: 11999831

Abstract

Plant genomics projects involving model species and many agriculturally important crops are resulting in a rapidly increasing database of genomic and expressed DNA sequences. The publicly available collection of expressed sequence tags (ESTs) from several grass species can be used in the analysis of both structural and functional relationships in these genomes. We analyzed over 260,000 EST sequences from five different cereals for their potential use in developing simple sequence repeat (SSR) markers. The frequency of SSR-containing ESTs (SSR-ESTs) in this collection varied from 1.5% for maize to 4.7% for rice. In addition, we identified several ESTs that are related to the SSR-ESTs by BLAST analysis. The SSR-ESTs and the related sequences were clustered within each species in order to reduce the redundancy and to produce a longer consensus sequence. The consensus and singleton sequences from each species were pooled and clustered to identify cross-species matches. Overall, a reduction in the redundancy by 85% was observed when the resulting consensus and singleton sequences (3,569) were compared to the total number of SSR-EST and related sequences analyzed (24,606). This information can be useful for the development of SSR markers that can amplify across the grass genera for comparative mapping and genetics. Functional analysis may reveal their role in plant metabolism and gene evolution.
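
As an illustration of the kind of screen the abstract describes, the following Python sketch scans a nucleotide sequence for perfect di-, tri- and tetranucleotide tandem repeats and checks the reported redundancy figure. It is not the authors' pipeline: the abstract does not state the SSR criteria or the software used, so the thresholds in `MIN_REPEATS` and the function names `find_ssrs`, `is_primitive` and `redundancy_reduction` are assumptions made for this example only.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies for each motif length.
# The abstract does not give the criteria used, so these values are assumptions.
MIN_REPEATS = {2: 6, 3: 4, 4: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. AGAG, AAA)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect tandem repeat found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip homopolymers and motifs already covered by a shorter unit.
            if not is_primitive(motif):
                continue
            copies = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, copies))
    return hits


def redundancy_reduction(n_clustered, n_total):
    """Fraction of sequences removed by clustering."""
    return 1 - n_clustered / n_total


if __name__ == "__main__":
    # Toy sequence, not real EST data: seven AG copies with short flanks.
    print(find_ssrs("CCGT" + "AG" * 7 + "TCCA"))
    # Counts taken from the abstract: 3569 clustered sequences out of 24606.
    print(round(redundancy_reduction(3569, 24606) * 100))
```

With the counts given in the abstract, the reduction comes to about 85%, matching the reported figure. A production screen would also need to handle imperfect or compound repeats and repeats on the reverse strand, which this sketch ignores.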

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Computational Biology
Databases, Factual
Expressed Sequence Tags*
Genetic Markers
Hordeum / genetics
Microsatellite Repeats / genetics*
Oryza / genetics
Poaceae / genetics*
Sequence Homology, Nucleic Acid
Triticum / genetics
Zea mays / genetics

Substances

Genetic Markers