Heterogeneous nuclear ribonucleoproteins (hnRNPs) are thought to influence the structure of hnRNA and participate in the processing of hnRNA to mRNA. The hnRNP U protein is an abundant nucleoplasmic phosphoprotein that is the largest of the major hnRNP proteins (120 kDa by SDS-PAGE). HnRNP U binds pre-mRNA in vivo and binds both RNA and ssDNA in vitro. Here we describe the cloning and sequencing of a cDNA encoding the hnRNP U protein, the determination of its amino acid sequence, and the delineation of a region in this protein that confers RNA binding. The predicted amino acid sequence of hnRNP U contains 806 amino acids (88,939 Daltons) and shows no extensive homology to any known protein. The N-terminus is rich in acidic residues and the C-terminus is glycine-rich. In addition, a glutamine-rich stretch, a putative NTP binding site and a putative nuclear localization signal are present. The sequence alone did not reveal which segment of the protein confers its RNA binding activity. We identified an RNA binding activity within the C-terminal glycine-rich 112 amino acids. This region, designated the U protein glycine-rich RNA binding region (U-gly), binds RNA on its own. Furthermore, fusing U-gly to a heterologous bacterial protein (maltose binding protein) converts the resulting fusion into an RNA binding protein. A 26 amino acid peptide within U-gly is necessary for the RNA binding activity of the U protein. Interestingly, this peptide contains a cluster of RGG repeats with characteristic spacing, and this motif is also found in several other RNA binding proteins. We have termed this region the RGG box and propose that it is an RNA binding motif and a predictor of RNA binding activity.
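The idea of using closely spaced RGG repeats as a predictor of RNA binding can be illustrated with a minimal sequence scan. The sketch below is not the procedure used in this work: the function name `find_rgg_clusters`, the thresholds `min_repeats` and `max_gap`, and the sequence `TOY_SEQ` are all hypothetical, chosen only for illustration. In particular, `TOY_SEQ` is an invented string and not the hnRNP U sequence, and the spacing cutoff is an assumption, since the characteristic spacing is not quantified here.

```python
import re

def find_rgg_clusters(seq, min_repeats=3, max_gap=4):
    """Return (start, end, count) for each run of RGG tripeptides in which
    consecutive repeats are separated by at most max_gap residues.

    min_repeats and max_gap are illustrative assumptions, not published values.
    Coordinates are 0-based and end-exclusive, so seq[start:end] is the cluster.
    """
    # Start index of every RGG tripeptide in the one-letter protein sequence.
    positions = [m.start() for m in re.finditer(r"(?=RGG)", seq)]

    clusters = []
    run = []  # start indices of the repeats in the cluster being built
    for pos in positions:
        # Residues between the end of the previous repeat and this repeat.
        if run and pos - (run[-1] + 3) > max_gap:
            if len(run) >= min_repeats:
                clusters.append((run[0], run[-1] + 3, len(run)))
            run = []
        run.append(pos)
    if len(run) >= min_repeats:
        clusters.append((run[0], run[-1] + 3, len(run)))
    return clusters

# Hypothetical toy sequence (NOT hnRNP U): four closely spaced RGG repeats
# followed by one isolated repeat that is too far away to join the cluster.
TOY_SEQ = "MSSAAARGGNFRGGAPGNRGGYNRRGGMPQLLLLLRGGKK"

for start, end, count in find_rgg_clusters(TOY_SEQ):
    print(start, end, count, TOY_SEQ[start:end])
```

A scan of this kind only flags candidate regions. Whether a flagged region actually binds RNA would still have to be tested experimentally, as was done here for U-gly.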