Shotgun Sequencing of the Human Transcriptome With ORF Expressed Sequence Tags

E Dias Neto et al. Proc Natl Acad Sci U S A, 97 (7), 3491-6.


Theoretical considerations predict that amplification of expressed gene transcripts by reverse transcription-PCR using arbitrarily chosen primers will result in the preferential amplification of the central portion of the transcript. Systematic, high-throughput sequencing of such products would result in an expressed sequence tag (EST) database consisting of central, generally coding regions of expressed genes. Such a database would add significant value to existing public EST databases, which consist mostly of sequences derived from the extremities of cDNAs, and facilitate the construction of contigs of transcript sequences. We tested our predictions, creating a database of 10,000 sequences from human breast tumors. The data confirmed the central distribution of the sequences, the significant normalization of the sequence population, the frequent extension of contigs composed of existing human ESTs, and the identification of a series of potentially important homologues of known genes. This approach should make a significant contribution to the early identification of important human genes, the deciphering of the draft human genome sequence currently being compiled, and the shotgun sequencing of the human transcriptome.
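The central-bias prediction can be illustrated with a small simulation. This is a minimal sketch under a toy assumption that is not taken from the paper: the two arbitrary priming sites are treated as independent and uniformly distributed along the transcript, and the amplified product is the stretch between them. The function names `simulate_coverage` and `predicted_coverage` are hypothetical, and the authors' own model may differ (for example, by constraining product length).

```python
import random


def simulate_coverage(n_fragments=20000, n_bins=10, seed=0):
    """Estimate how often a simulated fragment covers each relative position.

    Toy assumption: both priming sites fall uniformly at random on a
    transcript scaled to the interval [0, 1].
    """
    rng = random.Random(seed)
    # Evaluate coverage at the midpoint of each bin along the transcript.
    centers = [(i + 0.5) / n_bins for i in range(n_bins)]
    counts = [0] * n_bins
    for _ in range(n_fragments):
        start, end = sorted((rng.random(), rng.random()))
        for i, x in enumerate(centers):
            if start <= x <= end:
                counts[i] += 1
    return centers, [c / n_fragments for c in counts]


def predicted_coverage(x):
    """Coverage probability at relative position x under the same toy model."""
    return 2 * x * (1 - x)


if __name__ == "__main__":
    centers, observed = simulate_coverage()
    for x, obs in zip(centers, observed):
        print(f"{x:.2f}  simulated={obs:.3f}  toy-model={predicted_coverage(x):.3f}")
```

Under this assumption the simulated coverage peaks near the middle of the transcript and falls toward both ends, which is the qualitative pattern the abstract describes.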


Figure 1
The predicted, simulated, and experimentally determined positions of ORESTES (ORF expressed sequence tags). The smooth, solid curve shows the predicted percentage of ORESTES that should contain a given point, plotted against its relative position within a hypothetical transcript. The curves marked by symbols indicate the coverage of known, full-length genes by ORESTES of different lengths generated by computational simulation. The irregular, bold solid line shows the actual percentage of ORESTES that passed through each relative position of full-length cDNA sequences.
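One simple way to see why a predicted curve of this kind peaks at the center is the following worked derivation. It rests on an illustrative assumption that is not stated in the source: the two ends of each product are independent and uniform on a transcript scaled to $[0,1]$. The symbols $U_1$, $U_2$, and $x$ are introduced here for illustration only, and the curve actually plotted by the authors may come from a different model.

```latex
% Toy model: fragment ends U_1, U_2 ~ Uniform(0,1), independent.
% A point at relative position x is missed only if both ends fall
% on the same side of x.
\begin{align*}
P(\text{both ends left of } x)  &= x^{2}, \\
P(\text{both ends right of } x) &= (1-x)^{2}, \\
P(\text{fragment covers } x)    &= 1 - x^{2} - (1-x)^{2} = 2x(1-x).
\end{align*}
% The maximum is at x = 1/2, where the coverage probability is 1/2;
% it falls to 0 at both ends (x = 0 and x = 1).
```

In this sketch the coverage is symmetric about the midpoint and vanishes at the transcript ends, consistent with the qualitative shape the caption describes.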
Figure 2
A comparison of the actual percentage of ORESTES and 5′ and 3′ ESTs that pass through the relative position of full-length cDNA sequences. The figure was constructed by using all human full-length cDNAs of more than 1 kb currently in GenBank, the ORESTES corresponding to these cDNAs, as well as the 3′ and 5′ ESTs available in GenBank corresponding to these genes. With cDNAs of less than 1 kb, the 3′, 5′, and ORESTES reads are highly superimposed, making their relative contributions difficult to distinguish. Small cDNAs thus were not included in the figure.
Figure 3
Comparison of abundance of ORESTES and ESTs. (A) Nonnormalized breast tumor cDNA library NCI CGAP Br1.1. (B) Normalized breast tumor cDNA library NCI CGAP Br2. (C) ORESTES. The bars show the percentage of nonredundant sequences with similarity to full-length human cDNAs that matched UniGene clusters containing the number of ESTs shown.
