Most mammalian genes will soon be characterized as cDNA sequences with little information about their function. To utilize this sequence information for large-scale functional studies, a gene trap retrovirus shuttle vector has been developed to disrupt genes expressed in murine embryonic stem (ES) cells. A library of mutant clones was isolated, and regions of genomic DNA adjacent to 400 independent provirus inserts were cloned and sequenced. The flanking sequences, designated 'promoter-proximal sequence tags', or PSTs, identified 63 specific genes and anonymous cDNAs disrupted as a result of virus integration. The efficiency of tagged sequence mutagenesis suggests that many of the 10,000-20,000 genes expressed in ES cells can be targeted, providing defined mutations for the analysis of gene functions in vivo. In addition, PSTs provide the first expressed sequence tags derived from genomic DNA, and define gene features such as exon boundaries and promoters that are missing from cDNA sequences.