Genome Res. 2011 May;21(5):775-89.
doi: 10.1101/gr.110254.110. Epub 2011 Mar 3.

Genome-wide Characterization of Transcriptional Start Sites in Humans by Integrative Transcriptome Analysis

Free PMC article

Riu Yamashita et al. Genome Res. 2011.

Abstract

We performed a genome-wide analysis of transcriptional start sites (TSSs) in human genes by multifaceted use of a massively parallel sequencer. By analyzing 800 million sequences that were obtained from various types of transcriptome analyses, we characterized 140 million TSS tags in 12 human cell types. Despite the large number of TSS clusters (TSCs), the number of TSCs was observed to decrease sharply with increasing expression levels. Highly expressed TSCs exhibited several characteristic features: Nucleosome-seq analysis revealed highly ordered nucleosome structures, ChIP-seq analysis detected clear RNA polymerase II binding signals in their surrounding regions, evaluations of previously sequenced and newly shotgun-sequenced complete cDNA sequences showed that they encode preferable transcripts for protein translation, and RNA-seq analysis of polysome-incorporated RNAs yielded direct evidence that those transcripts are actually translated into proteins. We also demonstrate that integrative interpretation of transcriptome data is essential for the selection of putative alternative promoter TSCs, two of which also have protein consequences. Furthermore, discriminative chromatin features that separate TSCs at different expression levels were found for both genic TSCs and intergenic TSCs. The collected integrative information should provide a useful basis for future biological characterization of TSCs.
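
The abstract's observation that the number of TSCs falls sharply as expression level rises can be illustrated with a small sketch. It assumes that expression is reported in ppm (tags per million mapped TSS tags, the unit used in the figure captions) and that each TSC has a raw tag count. The function names `to_ppm` and `count_above` and the toy counts are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def to_ppm(tag_counts):
    """Convert raw TSS tag counts per TSC to parts per million (assumed unit)."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("no tags to normalize")
    return {tsc: count * 1_000_000 / total for tsc, count in tag_counts.items()}


def count_above(ppm_values, thresholds):
    """For each threshold, count the TSCs whose expression strictly exceeds it."""
    ordered = sorted(ppm_values)
    return {t: len(ordered) - bisect_right(ordered, t) for t in thresholds}


# Toy counts for five hypothetical TSCs (illustrative only, not real data).
toy_counts = {"tsc_a": 1, "tsc_b": 1, "tsc_c": 3, "tsc_d": 45, "tsc_e": 950}
toy_ppm = to_ppm(toy_counts)
toy_cumulative = count_above(toy_ppm.values(), [500, 1000, 5000, 100000])
print(toy_cumulative)
```

Sorting once and using a binary search per threshold keeps the cumulative count cheap even for many thresholds. Whether a threshold is treated as inclusive or exclusive is a choice made here, not something the abstract specifies.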

Figures

Figure 1.
Expression pattern distributions of the TSCs with the indicated expression levels (x-axis) in the cell lines (A) and tissues (C). The cell and tissue origins of the TSCs are shown in the inset. The cumulative populations of the TSCs with expression levels in excess of the values shown on the x-axis are shown in B (cell lines) and D (tissues). (E) Distribution of the TSCs with maximum expression levels in each of the 12 cell types (red line) and the cumulative population of the TSCs (blue line). (F) Cell type distribution of the TSCs. The number of cell types (x-axis) in which the TSCs with the indicated maximum expression levels (inset) were observed is shown.
Figure 2.
Expression patterns of the TSCs overlapping the pol II binding sites. (A) Frequencies of the TSCs that overlapped the pol II binding sites in cell lines at the indicated expression levels (x-axis). Cell origins are as indicated in the inset. (B) Frequencies and cumulative populations of the TSCs that overlap pol II binding sites in the respective cell lines are shown.
Figure 3.
Nucleosome structure around the TSCs with different expression patterns. (A) The nucleosome occupancy scores (y-axis) around the TSCs (x-axis) of different expression levels in DLD-1 cells. Expression levels of the TSCs are as indicated in the inset. The results of a similar analysis of different cell types are shown in Supplemental Figure S3. (B) Nucleosome structures in the regions that surround TSCs with expression levels <5 ppm. The scores for TSCs that did and did not overlap the pol II binding sites in DLD-1 cells are indicated by red and blue lines, respectively. (C) Nucleosome structures in the regions that surround the TSCs that were expressed in two or fewer cell types (blue and green lines) or in at least eight cell types (red and yellow lines). TSCs that did and did not overlap the pol II binding sites in any of the four cell lines (DLD-1, HEK293, MCF-7, or TIG-3) are indicated by blue and green lines, respectively.
Figure 4.
Translational consequences of the TSCs. (A) Subcellular fractionation of the nuclear, cytoplasmic, and polysomal components of DLD-1 cells. (Left) RT–PCR results of the indicated nuclear RNAs. (N) Nuclear fraction, (C) cytoplasmic fraction. (Right) Sucrose density gradient (SDG) purification of polysomes. Separation of the cytoplasmic fraction from the nuclear fraction was confirmed by real time RT–PCR using nuclear scaRNAs and snoRNAs (also see Supplemental Fig. S7A) and by Western blot analysis using nuclear lamin A/C proteins and cytoplasmic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) protein (bottom left). The cytoplasmic fraction was further separated to isolate the polysomal fraction by SDG centrifugation. The fraction from which the RNAs were extracted is indicated by the arrow (right). (B) Number of TSCs supported by three or more RNA-seq tags in the polysomal fraction of DLD-1 cells. The statistical significances of differences in the distribution of the numbers of the supporting RNA-seq tags are also shown for the indicated populations. TSCs that did and did not overlap pol II binding sites in DLD-1 cells are indicated by red and blue boxes, respectively. (C) Number of TSCs that exhibited statistical enrichment (P < 0.01) of the RNA-seq tags in the polysomal fraction in comparison to the nuclear and cytoplasmic fractions. The statistical significances of differences in the distribution of the P-values are also shown for the indicated populations. Details of the RNA tag counts in each population of TSCs are shown in Supplemental Figure S6. The computational procedures used for these analyses are presented in the Methods.
Figure 5.
Characterization of the alternative promoters (APs) in DLD-1 cells via transcriptome data integration. (A) An example of APs for which both the TSS-seq and the ChIP-seq of pol II analyses supported simultaneous expression in a single gene in DLD-1 cells. (B) Nucleosome structures in the regions that surround the TSCs for second or later APs (indicated as “AP2”), which were expressed at <5 ppm (blue line), overlapped pol II binding sites (red line), or did not overlap pol II binding sites (yellow line). The nucleosome structures at the randomly selected intronic regions according to RefSeq information are also shown (green line). (C,D) Integration of transcriptome data and Western blotting for the HOXB6 (NM_018952; C) and CDX2 (NM_001265; D) genes. Bands of the expected molecular weights are indicated by arrows. Blue and yellow boxes represent predicted untranslated regions and CDSs, respectively. The peptides that were used to raise the antibodies are shown in the margin. (*1) The presence of multiple proteins was also suggested by UniProt (P17509 and P17509-2). (*2) The amino acid sequence had to be deduced from the cDNA sequence that overlapped with AP1, although this sequence lacked the canonical ATG initiator codon.
Figure 6.
Differential usage of the APs. Examples of the APs in genes that belong to the GO categories of “ribosome” (A), “serine/threonine kinase” (B), “cell adhesion” (C), and “transcription factor” (D). Each number above the horizontal line shows the genomic coordinate. The red and blue arrows represent AP1 and AP2, respectively. For the RefSeq genes, coding and noncoding regions are represented by yellow and blue boxes, respectively.
Figure 7.
Characterization of intergenic TSCs (iTSCs). (A) The numbers of iTSCs at the indicated expression level (x-axis) are shown. (B) Numbers and cumulative populations of the iTSCs with maximum expression levels, as indicated on the x-axis, in 12 cell types. (C) Frequencies of the iTSCs that overlapped the pol II binding sites. Origins of the cell lines are as indicated in the inset. (D) The frequencies of the iTSCs for which the RNA-seq tags in the polysomal fractions of DLD-1 cells were enriched (P < 0.01) in comparison to the nuclear and cytoplasmic fractions.
Figure 8.
Histone modifications in regions surrounding the TSCs. (A) Average tag concentrations (y-axis) obtained in ChIP-seq analyses of H3K4me3 (left) and H3Ac (right) in the surrounding regions of genic TSCs (top) and iTSCs (bottom) for DLD-1 cells. Expression levels of the TSCs are indicated in the insets. (B) Number of TSCs overlapping the indicated signals. For the extensive analysis using the nucleosome-seq data, see Supplemental Figure S9.
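
Several captions above report how many TSCs overlap pol II binding sites or histone modification signals. The sketch below shows one way such an overlap frequency could be computed. It assumes that both TSCs and binding sites are given as half-open genomic intervals `(chrom, start, end)` and that any shared base counts as an overlap; the paper's Methods may define overlap differently (for example, with a flanking window). All function names and coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(peaks):
    """Merge peaks per chromosome into sorted, disjoint half-open intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:  # overlapping or touching: extend
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([m[0] for m in merged], [m[1] for m in merged])
    return index


def overlaps(index, chrom, start, end):
    """Return True if [start, end) shares at least one base with any peak."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged peak that begins before the query ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def overlap_fraction(index, tscs):
    """Fraction of TSCs that overlap at least one peak."""
    hits = sum(overlaps(index, chrom, start, end) for chrom, start, end in tscs)
    return hits / len(tscs)


# Toy intervals (illustrative only, not real coordinates).
toy_peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
toy_tscs = [("chr1", 290, 310), ("chr1", 300, 320), ("chr2", 10, 50), ("chr3", 1, 5)]
toy_index = build_index(toy_peaks)
print(overlap_fraction(toy_index, toy_tscs))
```

Merging the peaks first means that a single binary search per TSC is enough, because in a sorted set of disjoint intervals only the last peak starting before the query end can overlap it.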
