Protein-coding genes in the ancient eukaryote Giardia lamblia lack typical promoter consensus elements. We have analysed the immediate 5' flanking sequences of seven genes of related function (structural cytoskeleton proteins) to identify shared DNA motifs that might have a role in transcription initiation. Transcription start sites for five of these genes have been determined previously. Genomic mapping and mRNA primer extension experiments additionally demonstrate that the genes for beta-giardin and median body protein are (i) present as single copies in the genome and (ii) transcribed with very short 5' leader sequences. Two search algorithms designed to extract conserved motifs from either aligned or non-aligned sequences independently discovered three sites that constitute a common pattern in all seven promoters. The sites were aligned optimally by weight matrix building trials that maximise the 'information content'. Profiling the information content of the best alignments defines the extent of the homologies as a 9 bp box (the initiator) at the start site and two upstream boxes of 18 and 6 bp. The initiator is the most highly conserved element and contains a universal Py-A-Pu motif at which transcription starts. We show that the best matrices can be combined into a search pattern that correctly locates transcription start sites in genomic DNA sequences.
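The two computational ideas mentioned above, profiling the information content of an alignment and searching a sequence with a weight matrix, can be illustrated with a minimal sketch. It is not the procedure used in this work: it assumes a uniform background composition (0.25 per base), which may not hold for Giardia genomic DNA, and it scores windows with a simple log-odds sum. The function names and the toy sites are hypothetical and are not the published boxes.

```python
import math
from collections import Counter

BASES = "ACGT"

def column_frequencies(sites, pseudocount=0.0):
    """Per-column base frequencies for equal-length, upper-case aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        freqs.append({b: (counts.get(b, 0) + pseudocount) / total for b in BASES})
    return freqs

def information_content(freqs, background=0.25):
    """Information per column in bits, relative to a uniform background (assumption)."""
    profile = []
    for col in freqs:
        # Zero-frequency bases contribute nothing (limit of p*log p as p -> 0).
        bits = sum(p * math.log2(p / background) for p in col.values() if p > 0)
        profile.append(bits)
    return profile

def best_match(freqs, sequence, background=0.25):
    """Slide the log-odds matrix along the sequence; return (offset, score) of the best window.

    Requires frequencies built with pseudocount > 0 so that no log2(0) occurs,
    and a sequence containing only A, C, G and T.
    """
    width = len(freqs)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(math.log2(freqs[i][b] / background) for i, b in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Invented toy sites for illustration only (not sequences from the paper).
toy_sites = ["TCAGA", "CCATA", "TTAGA", "TCAAA"]

profile = information_content(column_frequencies(toy_sites))
matrix = column_frequencies(toy_sites, pseudocount=0.5)
hit = best_match(matrix, "GGGGTCAGAGGGG")
```

In this sketch a fully conserved column reaches the 2-bit maximum and a column with equal base frequencies scores zero, so plotting `profile` along an alignment shows where conservation starts and ends. Combining several matrices into one search pattern, as described above, would additionally require spacing constraints between the boxes, which this sketch does not model.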