Identification of members of gene families in Arabidopsis thaliana by contig construction from partial cDNA sequences: 106 genes encoding 50 cytoplasmic ribosomal proteins

Plant J. 1997 May;11(5):1127-40. doi: 10.1046/j.1365-313x.1997.11051127.x.


Partial cDNA sequencing to obtain expressed sequence tags (ESTs) has led to the identification of tags to about 8,000 of the estimated 20,000 genes in Arabidopsis thaliana. This figure represents four to five times the number of complete coding sequences from this organism available in international databases. In contrast to mammals, many proteins in A. thaliana are encoded by multigene families. Using ribosomal protein gene families as an example, it is possible to construct relatively long sequences from overlapping ESTs that are of sufficiently high quality to unambiguously identify tags to individual members of multigene families, even when the sequences are highly conserved. A total of 106 genes encoding 50 different cytoplasmic ribosomal protein types have been identified, with most proteins encoded by at least two and up to four genes. Coding sequences of members of individual gene families are almost always very highly conserved, and the derived amino acid sequences are nearly or completely identical in the vast majority of cases. Sequence divergence is observed in the untranslated regions, which allows the definition of gene-specific probes. The method can be used to construct high-quality tags to any protein.
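The idea of building longer sequences (contigs) from overlapping ESTs can be illustrated with a minimal greedy overlap-merge sketch. This is not the authors' software, which the abstract does not describe: `overlap`, `greedy_assemble` and the `min_len` threshold are hypothetical names, and the sketch assumes error-free reads that all come from the same strand. Real EST assembly must also tolerate sequencing errors and reverse-complemented reads.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`,
    provided it is at least `min_len`; otherwise return 0 (exact matches only)."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 20) -> list[str]:
    """Greedily merge the pair of reads with the longest exact overlap until no
    overlap of at least `min_len` remains. Returns the resulting contigs."""
    # Remove exact duplicates while preserving order.
    reads = list(dict.fromkeys(reads))
    # Drop reads entirely contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        # Find the ordered pair (a, b) with the longest suffix-prefix overlap.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads
        # Join the two reads, writing the shared overlap only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    ests = ["ATGGCTAAGGTTCAG", "GGTTCAGCTTGAACG", "CTTGAACGTAAAGCC"]
    print(greedy_assemble(ests, min_len=5))
    # prints ['ATGGCTAAGGTTCAGCTTGAACGTAAAGCC']
```

Because members of a gene family can share long, nearly identical coding stretches, a naive exact-overlap merge like this one could join ESTs from different genes. The sketch's `min_len` threshold and exact-match rule are simplifications and are not the criteria used in the study.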

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Arabidopsis / genetics*
  • Base Sequence
  • Conserved Sequence
  • Cytoplasm
  • DNA, Complementary / genetics*
  • Databases, Factual
  • Gene Expression
  • Genes, Plant*
  • Molecular Sequence Data
  • Multigene Family*
  • Oligonucleotide Probes
  • Ribosomal Protein S6
  • Ribosomal Proteins / genetics*
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Amino Acid
  • Sequence Homology, Nucleic Acid
  • Software

Substances

  • DNA, Complementary
  • Oligonucleotide Probes
  • Ribosomal Protein S6
  • Ribosomal Proteins
  • ribosomal protein L1
  • ribosomal protein S18

Associated data

  • GENBANK/A34571
  • GENBANK/A36571
  • GENBANK/B24028
  • GENBANK/C36571
  • GENBANK/D38010
  • GENBANK/L27461
  • GENBANK/L31645
  • GENBANK/M62396
  • GENBANK/S11393
  • GENBANK/S19164
  • GENBANK/S22789
  • GENBANK/S32578
  • GENBANK/S39486
  • GENBANK/S42260
  • GENBANK/S51347
  • GENBANK/U10046
  • GENBANK/U30454
  • GENBANK/U30495
  • GENBANK/X77456
  • GENBANK/X91958
  • GENBANK/X91959
  • SWISSPROT/P17094
  • SWISSPROT/P23358
  • SWISSPROT/P29766
  • SWISSPROT/P35685
  • SWISSPROT/P38666
  • SWISSPROT/P41099
  • SWISSPROT/P41127
  • SWISSPROT/P46286
  • SWISSPROT/Q07760