Brief Bioinform. 2013 Mar;14(2):144-61.
doi: 10.1093/bib/bbs038. Epub 2012 Aug 20.

The UCSC Genome Browser and Associated Tools

Free PMC article

Robert M Kuhn et al. Brief Bioinform. 2013.

Abstract

The UCSC Genome Browser (http://genome.ucsc.edu) is a graphical viewer for genomic data now in its 13th year. Since the early days of the Human Genome Project, it has presented an integrated view of genomic data of many kinds. Now home to assemblies for 58 organisms, the Browser displays annotations mapped to genomic coordinates. The ability to juxtapose annotations of many types facilitates inquiry-driven data mining. Gene predictions, mRNA alignments, epigenomic data from the ENCODE project, conservation scores from vertebrate whole-genome alignments and variation data may be viewed at any scale from a single base to an entire chromosome. The Browser also includes many other widely used tools, including BLAT, which is useful for alignments from high-throughput sequencing experiments. Private data uploaded as Custom Tracks and Data Hubs in many formats may be displayed alongside the rich compendium of precomputed data in the UCSC database. The Table Browser is a full-featured graphical interface that allows querying, filtering and intersection of data tables. The Saved Session feature allows users to store and share customized views, enhancing the utility of the system for organizing multiple trains of thought. Binary Alignment/Map (BAM), Variant Call Format (VCF) and the Personal Genome Single Nucleotide Polymorphism (SNP) data formats are useful for visualizing a large sequencing experiment (whole-genome or whole-exome), where the differences between the data set and the reference assembly may be displayed graphically. Support for high-throughput sequencing extends to compact, indexed data formats, such as BAM, bigBed and bigWig, allowing rapid visualization of large datasets from RNA-seq and ChIP-seq experiments via local hosting.
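The abstract names intersection of data tables as a Table Browser feature without saying what it computes. As a rough illustration of what intersecting two tracks means, the sketch below reports the overlapping regions between two sets of BED-style intervals (0-based, half-open). It is not UCSC's implementation; the function `intersect` and the example coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record type: (chrom, start, end) in 0-based, half-open
# coordinates, the convention used by BED files.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping part of every (a, b) pair on the same chromosome.

    A simple nested-loop sketch; real tools index intervals so that large
    tracks do not require comparing every pair.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))

    overlaps: List[Interval] = []
    for chrom, a_start, a_end in a:
        for b_start, b_end in by_chrom.get(chrom, []):
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:  # half-open: intervals that merely touch do not overlap
                overlaps.append((chrom, lo, hi))
    return sorted(overlaps)


# Hypothetical example data: two "gene" intervals and two "peak" intervals.
genes = [("chr17", 100, 500), ("chr17", 900, 1200)]
peaks = [("chr17", 450, 950), ("chr2", 0, 50)]
print(intersect(genes, peaks))  # [('chr17', 450, 500), ('chr17', 900, 950)]
```

The nested loop keeps the logic easy to check; for genome-scale tracks a sorted sweep or an interval index would be the natural replacement.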

Figures

Figure 1
Screenshot of the UCSC Genome Browser displaying the human PICK1 gene region on chr22 in the hg19 assembly. Different gene prediction algorithms produce different annotations in this region. Presenting multiple data sets of a similar type makes it easier for the user to evaluate hypotheses. The tracks often disagree on 3′- and 5′-untranslated regions (half-height boxes at the ends of annotations), coding regions (full-height boxes), introns (thin lines with arrows showing transcription direction) or start and end coordinates. These differences can be used to establish a level of confidence in an annotation that no single method provides.
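The caption distinguishes UTRs (half-height boxes), coding regions (full-height boxes) and introns (thin lines). A minimal sketch of how one transcript could be split into those parts, assuming a genePred-like layout (strand, coding start and end, and a sorted list of exons) in 0-based, half-open coordinates. The function names and the example transcript are hypothetical, and treating `cds_start >= cds_end` as non-coding is an assumption of this sketch.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]


def exon_segments(strand: str, cds_start: int, cds_end: int,
                  exons: List[Tuple[int, int]]) -> List[Segment]:
    """Split exons (0-based, half-open, sorted left to right) into UTR and CDS parts.

    Coordinates follow the genome's forward strand, so on the minus strand the
    5'UTR lies to the right of the coding region.
    """
    if cds_start >= cds_end:  # assumed convention: no coding region
        return [(s, e, "noncoding") for s, e in exons]
    left_utr, right_utr = ("5'UTR", "3'UTR") if strand == "+" else ("3'UTR", "5'UTR")
    segments: List[Segment] = []
    for start, end in exons:
        if start < min(end, cds_start):  # part of the exon left of the CDS
            segments.append((start, min(end, cds_start), left_utr))
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:  # coding part: drawn as a full-height box
            segments.append((lo, hi, "CDS"))
        if max(start, cds_end) < end:  # part of the exon right of the CDS
            segments.append((max(start, cds_end), end, right_utr))
    return segments


def introns(exons: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Gaps between consecutive exons, drawn as thin lines."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


# Hypothetical three-exon transcript on the plus strand.
exons = [(100, 200), (300, 400), (500, 600)]
print(exon_segments("+", 150, 550, exons))
print(introns(exons))  # [(200, 300), (400, 500)]
```

Comparing the segment lists produced for several tracks at the same locus is one way to see exactly where predictions disagree.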
Figure 2
Cumulative number of genome assemblies released in the UCSC Genome Browser, showing release dates for key assemblies. The steady rate of increase reflects fewer updates to mature assemblies, such as human and mouse, balanced by a growing number of newly completed species. In the total count, all releases of human (hg*), mouse (mm*) and rat (rn*) assemblies are marked. Invertebrate assemblies are marked with the first and latest release dates for key model organisms: yeast (sacCer*), Drosophila melanogaster (dm*) and Caenorhabditis elegans (ce*).
Figure 3
Summary database schema for UCSC Genome Browser. The hgcentral database contains tables with metadata about entire genome assemblies. For example, the dbDb table has one row for each assembly, specifying assembly name, source, display options and other parameters. Each genome assembly has its own database (lower level), with assembly-specific metadata tables (trackDb, hgFindSpec) and one or more data tables per displayed Browser track.
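To make the two-level layout concrete, the sketch below models it with SQLite: a central database holding one `dbDb` row per assembly, and a per-assembly database whose `trackDb` lists that assembly's tracks. The real databases are MySQL and have many more columns; the simplified columns and placeholder rows here are hypothetical, and only the database and table names come from the caption (plus `knownGene`, the UCSC Genes table, and `snp135` from Figure 7).

```python
import sqlite3
from typing import List

# In-memory stand-ins for the databases in the figure: one central database
# plus one database per assembly. Columns are a simplified, hypothetical subset.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hgcentral")
conn.execute("ATTACH DATABASE ':memory:' AS hg19")

conn.execute("CREATE TABLE hgcentral.dbDb (name TEXT PRIMARY KEY, organism TEXT)")
conn.execute("INSERT INTO hgcentral.dbDb VALUES ('hg19', 'Human')")

# Assembly-level metadata: one trackDb row per track, naming its data table.
conn.execute("CREATE TABLE hg19.trackDb (tableName TEXT PRIMARY KEY, shortLabel TEXT)")
conn.executemany("INSERT INTO hg19.trackDb VALUES (?, ?)",
                 [("knownGene", "UCSC Genes"), ("snp135", "SNPs (135)")])


def tracks_for(assembly: str) -> List[str]:
    """Look up an assembly in hgcentral, then list the tracks in its own database."""
    row = conn.execute("SELECT name FROM hgcentral.dbDb WHERE name = ?",
                       (assembly,)).fetchone()
    if row is None:
        raise KeyError(f"unknown assembly: {assembly}")
    # Schema names cannot be bound as SQL parameters; this one comes from dbDb.
    return [r[0] for r in conn.execute(
        f"SELECT tableName FROM {row[0]}.trackDb ORDER BY tableName")]


print(tracks_for("hg19"))  # ['knownGene', 'snp135']
```

The lookup mirrors the caption's hierarchy: assembly-wide facts live in the central database, while everything about an assembly's tracks lives in that assembly's own database.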
Figure 4
Genome Browser display of the CAT gene region on chr2 in the mouse mm9 assembly, showing gene structure (UCSC Genes track), PhyloP and PhastCons scores, and multiz alignments. Note the high conservation of exons far back into evolutionary time (opossum, chicken, stickleback), whereas conservation in introns has been lost.
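The exon/intron contrast described in the caption can be made quantitative by averaging per-base conservation scores over exons and over introns. A minimal sketch, assuming the scores for a region have already been extracted into a list (for example from a PhyloP or PhastCons track); the scores, coordinates and function names below are invented for illustration.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def mean_over(scores: Sequence[float], region_start: int,
              intervals: List[Tuple[int, int]]) -> float:
    """Mean per-base score over 0-based, half-open intervals.

    scores[i] holds the score for genomic position region_start + i.
    """
    values = [scores[p - region_start] for s, e in intervals for p in range(s, e)]
    return mean(values) if values else math.nan


def exon_vs_intron(scores: Sequence[float], region_start: int,
                   exons: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (mean exon score, mean intron score); introns are gaps between exons."""
    exons = sorted(exons)
    gaps = [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])
            if a_end < b_start]
    return mean_over(scores, region_start, exons), mean_over(scores, region_start, gaps)


# Hypothetical scores for a 30-base region: two exons flanking one intron.
scores = [0.9] * 10 + [0.1] * 10 + [0.8] * 10
exon_mean, intron_mean = exon_vs_intron(scores, 1000, [(1000, 1010), (1020, 1030)])
print(round(exon_mean, 2), round(intron_mean, 2))  # 0.85 0.1
```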
Figure 5
(A) Screenshot of the Genome Browser ‘details page’ for mRNA EF101869 from human assembly hg19. Summary information for the mRNA is shown with links to the GenBank record and other resources. At the bottom is the link to details of the mRNA alignment to the genome assembly (Figure 5B). (B) Side-by-side alignment of the first three exons in the ‘together’ format. For each alignment block, the mRNA is above (numbered in RNA-centric coordinates) and the genomic DNA below (using genomic coordinates). Because the mRNA sequence matches the reverse strand, the two sets of coordinates run in opposite directions.
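Why the coordinates run in opposite directions can be shown with a small mapping function. This sketch assumes alignment blocks laid out as in UCSC's PSL format, where for a '-' strand alignment the query starts refer to the reverse complement of the mRNA while target starts are on the forward genome strand. The function name and the example alignment are hypothetical.

```python
from typing import List, Optional


def mrna_to_genome(q_pos: int, q_size: int, strand: str, q_starts: List[int],
                   t_starts: List[int], block_sizes: List[int]) -> Optional[int]:
    """Map a 0-based mRNA position to a 0-based genomic position.

    Assumes PSL-style blocks: for strand '-', q_starts are offsets into the
    reverse complement of the mRNA, while t_starts are on the forward genome.
    Returns None if the position falls outside every aligned block.
    """
    if strand == "-":
        q_pos = q_size - 1 - q_pos  # convert to reverse-complement coordinates
    for q0, t0, size in zip(q_starts, t_starts, block_sizes):
        if q0 <= q_pos < q0 + size:
            return t0 + (q_pos - q0)
    return None


# Hypothetical 100-base mRNA aligned to the minus strand in two blocks.
blocks = dict(q_size=100, strand="-", q_starts=[0, 50],
              t_starts=[1000, 2000], block_sizes=[40, 50])
print([mrna_to_genome(i, **blocks) for i in (0, 1, 2)])  # [2049, 2048, 2047]
```

As the mRNA position increases, the genomic position decreases, which is what the side-by-side display shows.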
Figure 6
Top of the Genome Browser UCSC Genes ‘details page’ for the human TP53 gene (hg19 assembly). This page is the gateway to deep information about the gene. The top section reproduces the RefSeq summary from NCBI, including information at the biochemical, genetic, cellular, physiological and clinical/medical levels, where available. This is followed by an index of this very large page. Links in the boxes lead to individual sections of the page giving details, with links to original data contributors for protein information, microarray gene expression data, pathways and other information. The second section, Sequence and Links to Tools and Databases, features links to content at UCSC (light, greenish background) and external sites (darker, bluish background), including the HUGO Gene Nomenclature Committee (HGNC), OMIM and others. The third section, Comments and Description Text from UniProtKB, is an example of page content from an outside source (each source is linked from the Page Source section above).
Figure 7
(A) Close-up view (58 bp window) of part of the fourth exon with several tracks displayed. At this resolution, the UCSC Genes track (top track) shows amino acid number and identity for each isoform. The second track, Human mRNAs, has the ‘show nonsynonymous mRNA codons’ option turned on and shows that most mRNAs differ from the reference assembly at amino acid 72 of the major isoform (arginine in the mRNAs, proline in the reference). Does the reference assembly carry a minor allele at this location? At the bottom are three tracks with information about SNPs in codon 72 of the major isoforms (two isoforms have different numbering). The James Watson track uses the Personal Genomics SNPs track display format, indicating that 6 C and 2 G nucleotides were found at this location. The coloring shows the relative amounts of the two nucleotides, and the mouseover shows the actual read depth. This format may also be used to construct Custom Tracks. The OMIM Allelic Variants SNPs track shows that this SNP has a documented phenotypic association; a link from the details page directly to the relevant record at omim.org quickly shows that this SNP is a polymorphism, not a mutation. The snp135 Polymorphism track at the bottom leads in one click to dbSNP information indicating that this SNP is present in the population (of more than 7500 individuals sampled) at nearly 50% frequency. (B) Midrange view (930 bp) of (left to right) the fourth, third and part of the second exon with several tracks displayed. The region shown in Figure 7A is in the center (fourth exon). The OMIM AV track has been dragged to the top of the image. At this resolution, the nucleotides and amino acids in the UCSC Genes and mRNA tracks are not labeled; instead, codons appear as dark and light stripes.
The mouseover pictured for one isoform of the gene (left side of the image in the UCSC Genes track) indicates that clicking the double-headed arrow will shift the display to the next (fifth) exon of this isoform (different isoforms have different numbering). The Conservation tracks show that the 3′-end of Exon 4 (conservation graph in the Placental Mammal and Vertebrate Conservation tracks, slightly left of center in the image) is more highly conserved than the 5′-end, and that conservation is generally higher in the exons than in the introns. The SNP track now shows several more polymorphisms, including two non-synonymous amino acid changes (red, or intermediate gray in grayscale) and one synonymous change (green, or light gray in grayscale). (C) Wide view of the 930 kb region around the TP53 gene (highlighted as an 80.8 kb region in the center). At this scale, it is advantageous to turn off the isoforms on the configuration page shown here, which is opened by clicking the minibutton to the left of the track (arrow). The chromosome ideogram above the main Browser graphic shows the location as a red box superimposed on the chromosome bands of the short arm of chr17. The highlighted region was selected with drag-and-zoom; releasing the mouse button displays it.
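The caption notes that the Personal Genomics SNPs format behind the James Watson track can also be used for Custom Tracks. A minimal sketch of writing one such row, assuming the column order chrom, chromStart, chromEnd, alleles joined by '/', allele count, comma-separated per-allele frequencies and comma-separated per-allele scores, with a `track type=pgSnp` header line. That layout reflects our reading of the UCSC format description and should be checked against it before uploading; the track name, position and zero scores are hypothetical, and only the 6 C / 2 G counts come from the caption.

```python
from typing import List, Tuple

# Track line for a custom track; name and description are placeholders.
TRACK_LINE = 'track type=pgSnp name="examplePgSnp" description="Hypothetical pgSnp example"'


def pgsnp_line(chrom: str, pos0: int, allele_counts: List[Tuple[str, int]]) -> str:
    """Format one single-base variant as a tab-separated pgSnp-style row.

    pos0 is 0-based, so a single-base variant spans [pos0, pos0 + 1).
    Frequencies are given as read counts; scores are set to 0 in this sketch.
    """
    alleles = "/".join(allele for allele, _ in allele_counts)
    freqs = ",".join(str(count) for _, count in allele_counts)
    scores = ",".join("0" for _ in allele_counts)
    return "\t".join([chrom, str(pos0), str(pos0 + 1), alleles,
                      str(len(allele_counts)), freqs, scores])


# Hypothetical position; the read counts mirror the 6 C / 2 G example in the caption.
print(TRACK_LINE)
print(pgsnp_line("chr17", 1000, [("C", 6), ("G", 2)]))
```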
Figure 8
Overview map of the UCSC Genome Browser and associated tools. The user interacts with the system via the CGIs (upper shaded area) by clicking in a web browser, such as Firefox, Chrome, Safari or Internet Explorer. The user may upload Custom Tracks and receives screen images, saved sessions, data files and PDF image files. The user may also make data available to the Browser via a remote data hub and view data from other hubs. The CGIs interact with each other as indicated: position information may be sent from the Genome Browser to the Table Browser, and Custom Tracks or tabular data returned. Details about an alignment are available by clicking on an item in the Browser view, which in turn may link to external websites (dotted arrow, upper right). BLAT and isPCR alignments are displayed as tracks in the Browser image. All of the CGIs obtain information from the filesystem or database (lower shaded area), much of which originates with third-party data contributors (upper left). All data are also available to the user directly via FTP download and via direct access to the genome-mysql server (lower left).
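Since the CGIs exchange position information, a region can also be opened from outside the Browser by building a link. The sketch below converts a BED-style 0-based, half-open interval into the 1-based, fully closed position string the Browser displays and builds a link to the Genome Browser CGI. It assumes the `hgTracks` CGI accepts `db` and `position` query parameters, as commonly written Browser links do; the helper name and the example region are hypothetical.

```python
from urllib.parse import urlencode

# Genome Browser CGI on the host given in the abstract (https form).
BASE_URL = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db: str, chrom: str, start0: int, end: int) -> str:
    """Build an hgTracks link for a 0-based, half-open interval.

    The position string is 1-based and fully closed, so the start is shifted
    by one while the end stays the same.
    """
    position = f"{chrom}:{start0 + 1}-{end}"
    return f"{BASE_URL}?{urlencode({'db': db, 'position': position})}"


print(browser_url("hg19", "chr17", 7500000, 7600000))
# https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr17%3A7500001-7600000
```

`urlencode` percent-encodes the colon in the position; the server decodes it back to `chr17:7500001-7600000`.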

Similar articles

  • The UCSC Genome Browser Database: update 2009.
    Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M, Meyer L, Hsu F, Hinrichs AS, Harte RA, Giardine B, Fujita P, Diekhans M, Dreszer T, Clawson H, Barber GP, Haussler D, Kent WJ. Nucleic Acids Res. 2009 Jan;37(Database issue):D755-61. doi: 10.1093/nar/gkn875. Epub 2008 Nov 7. PMID: 18996895. Free PMC article.
  • The UCSC Genome Browser Database: update 2006.
    Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, Hillman-Jackson J, Kuhn RM, Pedersen JS, Pohl A, Raney BJ, Rosenbloom KR, Siepel A, Smith KE, Sugnet CW, Sultan-Qurraie A, Thomas DJ, Trumbower H, Weber RJ, Weirauch M, Zweig AS, Haussler D, Kent WJ. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D590-8. doi: 10.1093/nar/gkj144. PMID: 16381938. Free PMC article.
  • Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser.
    Raney BJ, Dreszer TR, Barber GP, Clawson H, Fujita PA, Wang T, Nguyen N, Paten B, Zweig AS, Karolchik D, Kent WJ. Bioinformatics. 2014 Apr 1;30(7):1003-5. doi: 10.1093/bioinformatics/btt637. Epub 2013 Nov 13. PMID: 24227676. Free PMC article.
  • A brief introduction to web-based genome browsers.
    Wang J, Kong L, Gao G, Luo J. Brief Bioinform. 2013 Mar;14(2):131-43. doi: 10.1093/bib/bbs029. Epub 2012 Jul 3. PMID: 22764121. Review.
  • UCSC genome browser tutorial.
    Zweig AS, Karolchik D, Kuhn RM, Haussler D, Kent WJ. Genomics. 2008 Aug;92(2):75-84. doi: 10.1016/j.ygeno.2008.02.003. Epub 2008 Jun 2. PMID: 18514479. Review.

Cited by 193 articles
