Computational finishing of large sequence contigs reveals interspersed nested repeats and gene islands in the rf1-associated region of maize

Plant Physiol. 2009 Oct;151(2):483-95. doi: 10.1104/pp.109.143370. Epub 2009 Aug 12.

Abstract

The architecture of grass genomes varies on multiple levels. Large long terminal repeat retrotransposon clusters occupy significant portions of the intergenic regions, and islands of protein-encoding genes are interspersed among the repeat clusters. Hence, advanced assembly techniques are required both to obtain completely finished genomes and to investigate gene and transposable element distributions. To characterize the organization and distribution of repeat clusters and gene islands across large grass genomes, we present 961- and 594-kb sequence contigs associated with the rf1 (for restorer of fertility1) locus in the near-centromeric region of maize (Zea mays) chromosome 3. We present two methods for computational finishing of highly repetitive bacterial artificial chromosome clones that proved successful in closing all sequence gaps caused by transposable element insertions. Sixteen repeat clusters were observed, ranging in length from 23 to 155 kb. These repeat clusters consist almost exclusively of long terminal repeat retrotransposons, and the paleontology of insertion (the relative order and age of the insertions) varies throughout each cluster. Gene islands contain from one to four predicted genes, resulting in a gene density of one gene per 16 kb in gene islands and one gene per 111 kb over the entire sequenced region. When compared with the rice (Oryza sativa) and sorghum (Sorghum bicolor) genomes, the two sequence contigs retain 50% and 71% gene collinearity, respectively, rising to 70% and 100%, respectively, for high-confidence gene models. Collinear genes on single gene islands show that while most expansion of the maize genome has occurred in the repeat clusters, gene islands are not immune and have experienced growth at both intragenic and intergenic locations.
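The two density figures in the abstract are simple ratios of sequence length to gene count. The sketch below makes that arithmetic explicit. It assumes, without confirmation from the record, that "the entire sequenced region" means the two contigs combined (961 + 594 = 1,555 kb); because the contig lengths are rounded and the abstract does not state the gene count, the back-calculated figure is only approximate. The 48-kb, three-gene island and the function names are hypothetical, used only to illustrate the forward calculation.

```python
def kb_per_gene(region_kb: float, n_genes: int) -> float:
    """Average spacing: kilobases of sequence per predicted gene."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return region_kb / n_genes


def implied_gene_count(region_kb: float, kb_per_gene_value: float) -> float:
    """Back-calculate an approximate gene count from a reported density."""
    if kb_per_gene_value <= 0:
        raise ValueError("kb_per_gene_value must be positive")
    return region_kb / kb_per_gene_value


# Hypothetical gene island (not from the paper): 48 kb carrying 3 genes.
island_density = kb_per_gene(48, 3)

# Contig lengths as reported in the abstract (rounded to the nearest kb).
# Assumption: "entire sequenced region" = both contigs combined.
CONTIGS_KB = [961, 594]
total_kb = sum(CONTIGS_KB)

# Reported overall density: one gene per 111 kb.
approx_genes = implied_gene_count(total_kb, 111)

print(f"hypothetical island: {island_density:.0f} kb per gene")
print(f"{total_kb} kb / 111 kb per gene = about {approx_genes:.1f} genes")
```

The roughly sevenfold gap between the island density (16 kb per gene) and the overall density (111 kb per gene) is what the long repeat clusters between islands account for.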

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Chromosomes, Artificial, Bacterial / genetics
  • Chromosomes, Plant / genetics
  • Computational Biology / methods*
  • Contig Mapping / methods*
  • DNA Transposable Elements / genetics
  • Genes, Plant*
  • Interspersed Repetitive Sequences / genetics*
  • Molecular Sequence Data
  • Multigene Family / genetics*
  • Oryza / genetics
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Sorghum / genetics
  • Terminal Repeat Sequences / genetics
  • Zea mays / genetics*

Substances

  • DNA Transposable Elements

Associated data

  • GENBANK/EF517600
  • GENBANK/EF517601