The I locus in soybean (Glycine max) corresponds to a region of chalcone synthase (CHS) gene duplications affecting seed pigmentation. We sequenced and annotated BAC clone 104J7, which harbors a dominant i(i) allele from Glycine max 'Williams 82', to gain insight into the genetic structure of this multigenic region and to examine its flanking regions. The 103-kb BAC encompasses a gene-rich region with 11 putatively expressed genes. In addition to six copies of CHS, these genes include a geranylgeranyltransferase type II beta subunit (E.C. 2.5.1.60), a beta-galactosidase, a putative spermine and (or) spermidine synthase (E.C. 2.5.1.16), and an unknown expressed gene. Strikingly, sequencing data revealed that the 10.91-kb CHS1, CHS3, CHS4 cluster is present as a perfect inverted repeat separated by 5.87 kb. This contiguous arrangement of CHS paralogs could lead to folding into multiple secondary structures, which are hypothesized to induce deletions previously shown to affect CHS expression. BAC 104J7 also contains several gene fragments representing a cation/hydrogen exchanger, a 40S ribosomal protein, a CBL-interacting protein kinase, and the amino terminus of a subtilisin. Chimeric ESTs were identified that may represent read-through transcription from a flanking truncated gene into a CHS cluster, generating aberrant CHS RNA molecules that could play a role in CHS gene silencing.