Background: A central issue in genome analysis is the identification and characterization of coding regions. Estimates of the coding complexity of vertebrate genomes, whether based on the kinetic complexity of mRNA populations or on sequence analysis of cDNAs, are limited because any given source of mRNA represents a highly biased sample of all genes. Exon trapping is a method that enables the identification of genes irrespective of their transcriptional status.
Results: Exons were trapped from the entire mouse genome, and the resulting fragments were cloned. About 7% of a random sample of exons taken from this library have significant structural homology or sequence similarity to previously sequenced genes. Using cDNAs derived from several stages of mouse development, we found evidence of expression for about 62% of the exons in this sample. These data suggest that the great majority of 'exons' in the library are derived from genes. We estimate that trapped exons account for 2.4% of the genome, which corresponds to a sequence complexity of about 72 megabases.
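As a check on the arithmetic behind the last estimate, the two reported figures (2.4% and 72 megabases) together imply a total genome size of roughly 3,000 megabases. That value is inferred here from the stated numbers and is not given in the text; the symbol G is introduced only for illustration.

```latex
\[
G \approx \frac{72\ \text{Mb}}{0.024} = 3000\ \text{Mb},
\qquad
0.024 \times 3000\ \text{Mb} = 72\ \text{Mb}
\]
```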
Conclusions: The library of exons trapped from the entire mouse genome probably represents one of the least biased and most comprehensive libraries of mouse coding regions, and should therefore prove very useful for finding genes during genome mapping and sequencing.