A clone was isolated by screening a cosmid library of Mycobacterium tuberculosis with an oligonucleotide designed from the N-terminal sequence of a previously reported proline-rich protein. Characterization of the 4481 bp insert revealed polymorphic GC-rich repetitive sequences (PGRSs) and a 2.7 kb ORF encoding an 81.3 kDa protein (PE-PGRS81). Southern blot analysis and BLASTP searches revealed several homologous sequences in the M. tuberculosis genome. The deduced amino acid sequence was highly similar to a stretch of about 98 N-terminal residues present in several members of the PE-PGRS family available in the GenBank database. This included 100% identity with the partial amino acid sequence of the putative protein encoded by orf3' and with the Rv0278c sequence. A neighbour-joining analysis of the 99 PE-PGRS sequences available in the database placed PE-PGRS81 in a group in which its closest relatives are orf3', Rv0278c, Rv0279c, Rv1759c, Rv3652 and Rv0747. In Southern blot assays of genomic DNA from M. tuberculosis H37Rv, Mycobacterium bovis BCG and M. tuberculosis clinical isolates, probes spanning the complete coding regions of PE-PGRS81 and Rv1759c produced a complex hybridization pattern for all strains. This indicates intrastrain PGRS variability, as reported for other PGRS members. In contrast, a probe covering only the short conserved N-terminal region of Rv1759c hybridized to a single band. This marker allowed the identification of M. tuberculosis clinical strains that lack Rv1759c. A recombinant C-terminal fragment of Rv1759c bound fibronectin and was recognized by sera from patients infected with M. tuberculosis, suggesting that at least this member of the PE-PGRS family is expressed during tuberculosis infection.