Genome Biol. 2012 Aug 31;13(8):R77.
doi: 10.1186/gb-2012-13-8-r77.

ggbio: an R package for extending the grammar of graphics for genomic data

Tengfei Yin et al. Genome Biol. 2012.

Abstract

We introduce ggbio, a new methodology to visualize and explore genomics annotations and high-throughput data. The plots provide detailed views of genomic regions, summary views of sequence alignments and splicing patterns, and genome-wide overviews with karyogram, circular and grand linear layouts. The methods leverage the statistical functionality available in R, the grammar of graphics and the data handling capabilities of the Bioconductor project. The plots are specified within a modular framework that enables users to construct plots in a systematic way, and are generated directly from Bioconductor data structures. The ggbio R package is available at http://www.bioconductor.org/packages/2.11/bioc/html/ggbio.html.

Figures

Figure 1
Gene structure. Exons of the SSX4 and SSX4B gene isoforms, annotated to illustrate the grammar of graphics extensions used. Filled rectangles represent exons and chevrons represent introns. They are grouped by transcript ID, and the y axis shows the stepping levels, which stack transcripts to avoid overplotting. Color is mapped from strand direction.
Figure 2
Splicing summary with coordinate truncate gaps. An example of a plot made from multiple tracks. At the top, the relevant chromosome is drawn with the subregion of interest marked in red. The middle track shows the splicing summary plots for the gene ALDOA for normal and tumor samples. Splicing events are shown as arches: size represents junction counts and color represents novelty, with blue indicating known splicing events that match the model and red indicating novel splicing events. The height of each arch is proportional to the distance between its two ends, that is, the distance spanned by the junction reads. Coverage is shown by position to expose the supporting evidence in the raw data. The splicing summary plots are aligned with a view of the gene structure in the bottom track. The thicker rectangles represent the Consensus Coding DNA Sequence (CCDS) transcripts. The plots in both tracks are made with the truncate gaps coordinate transformation. The space dedicated to introns has been significantly reduced, and the exonic regions are shown in detail, even though the entire gene region is in view.
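
The idea behind a truncate gaps transformation can be illustrated with a small sketch. This is not ggbio code and does not use its API; it is a minimal Python illustration, assuming sorted, non-overlapping, half-open exon intervals. The function name truncate_gaps and the max_gap parameter are hypothetical.

```python
def truncate_gaps(exons, max_gap=10):
    """Shrink every gap wider than max_gap down to max_gap.

    exons: list of (start, end) tuples, half-open and non-overlapping
    (an assumption of this sketch). Exon widths are preserved; only the
    space between consecutive exons is reduced.
    """
    compressed = []
    shift = 0        # total number of gap positions removed so far
    prev_end = None
    for start, end in sorted(exons):
        if prev_end is not None:
            gap = start - prev_end
            if gap > max_gap:
                shift += gap - max_gap
        compressed.append((start - shift, end - shift))
        prev_end = end
    return compressed


# Made-up exon coordinates with one long and one short intron.
exons = [(100, 200), (1200, 1300), (1305, 1400)]
compressed = truncate_gaps(exons, max_gap=10)
print(compressed)
```

In this toy input the 1000-position intron is reduced to 10 positions, while the 5-position gap is left alone because it is already narrower than max_gap.
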
Figure 3
Manhattan plot. Grand linear view applied to a Manhattan plot as part of a genome-wide association study in Angus cattle. The y axis shows genetic variance, calculated over sliding windows of five consecutive SNPs for the infectious bovine keratoconjunctivitis (IBK; a type of pinkeye) score. The x axis shows the genomic coordinates with all the chromosomes side by side. The alternating color bands help to indicate the end of one chromosome and the beginning of another. The plot is faceted by three different analysis methods. There is one extreme variance value in the middle facet, in the region of chromosome 23. There are also a few large values in other regions. According to the results from the paper, three of these regions (chromosomes 2, 13 and 23) are potentially indicative of a quantitative trait locus associated with IBK.
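
A grand linear layout places chromosomes side by side on one axis. The sketch below shows one simple way to compute such coordinates by adding a cumulative offset per chromosome. It is an illustration only, not ggbio's implementation; the function name grand_linear and the chromosome lengths are made up for the example.

```python
def grand_linear(positions, chrom_lengths, order):
    """Convert (chromosome, position) pairs to one genome-wide x coordinate.

    Each chromosome is shifted by the summed length of all chromosomes
    that precede it in `order`.
    """
    offsets = {}
    total = 0
    for chrom in order:
        offsets[chrom] = total
        total += chrom_lengths[chrom]
    xs = [offsets[chrom] + pos for chrom, pos in positions]
    return xs, offsets


# Hypothetical chromosome lengths and SNP positions.
chrom_lengths = {"chr1": 1000, "chr2": 800, "chr3": 500}
order = ["chr1", "chr2", "chr3"]
snps = [("chr1", 10), ("chr2", 5), ("chr3", 499)]

xs, offsets = grand_linear(snps, chrom_lengths, order)
print(xs)
```

The offsets also mark the chromosome boundaries, which is where a plot would switch between the alternating colors described in the caption.
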
Figure 4
Stacked karyogram overview. The karyogram plot shows a subset of human RNA-editing sites, color coded by region as follows: red indicates exons, green indicates introns and blue indicates that the exon/intron status is unknown.
Figure 5
Single sample circular view. DNA structural rearrangements and somatic mutation in a single colorectal tumor sample (CRC-1). The outer ring shows the ideogram of the human autosomes, labeled with chromosome numbers and scales. The segments represent the missense somatic mutations. The point tracks show score and support for rearrangement. The size of the points indicates the number of supporting read pairs in the tumor and the y value indicates the score for each rearrangement. The links represent the rearrangements, where intrachromosomal events are colored green and interchromosomal events are colored orange.
Figure 6
Mismatch summary. An example of a mismatch summary plot, with associated variant calls. The top track shows a bar chart of reference counts in gray and mismatched counts colored by nucleotide. The middle track shows SNPs as letters, also color coded by nucleotide. At one position, all of the reads show a 'T' mismatch against the 'A' in the reference genome (bottom letter plot).
Figure 7
Edge-linked interval to data view. Edge-linked interval to data view for the expression of the exons of gene PDIA6. The top track shows the expression level for each of the exons, and the color indicates the sample (GM12878 or K562). The second track shows the links between the evenly spaced expression track and the exons track below. The package DEXSeq, which produces a similar graphic, computes differential expression and significance; significance is indicated by coloring the connecting lines red. The track at the bottom shows the annotated transcripts.
Figure 8
MA-plot. MA-plot for differential expression analysis in four RNA-seq samples from two cell lines, GM12878 and K562, annotated to illustrate the use of the grammar of graphics. Points are the geometric object; the x axis indicates the normalized mean and the y axis indicates the log2 fold change. An aesthetic mapping from group to color uses red to mark the most significantly differentially expressed observations (genes). This plot uses Cartesian coordinates.
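
The two axes of such a plot can be sketched for a single gene as follows. This is a simplified illustration, not the analysis behind the figure: it assumes the counts are already normalized, and the pseudocount (used only to avoid taking the log of zero) and the function name ma_values are assumptions of this sketch. Significance testing is left to a dedicated package.

```python
import math


def ma_values(count_a, count_b, pseudocount=1.0):
    """Return (mean, log2 fold change) for one gene in two conditions.

    count_a, count_b: normalized expression values (assumed input).
    pseudocount: hypothetical offset that keeps the ratio finite.
    """
    a = count_a + pseudocount
    b = count_b + pseudocount
    mean = (a + b) / 2.0                 # x axis of the plot
    log2_fold_change = math.log2(b / a)  # y axis of the plot
    return mean, log2_fold_change


# Made-up counts: after the pseudocount, 16 versus 4 is a four-fold change.
point = ma_values(3, 15)
print(point)
```
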
Figure 9
Diagram of the ggbio framework for processing sequence data. It starts with a mapping from different file types to different objects or data structures in R, using Bioconductor tools, followed by general and extended grammar of graphics mapping of data elements to graphical components. The final stage arranges the graphics in a designed layout to show annotation tracks or multiple data sets. Orange boxes and dark brown arrows indicate the extensions provided by ggbio.
Figure 10
Coverage transformation. Two statistical transformations, coverage and stepping, are used to summarize short-read data. Top: a set of (simulated) short reads, stacked vertically by the stepping transformation and drawn with the default geom 'rectangle'. Bottom: coverage is shown on the vertical axis, using the geom 'area'. This example uses a GRanges object as the data model.
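
The two transformations named in this caption can be sketched in a few lines. The code below is an illustrative Python version, not ggbio or Bioconductor code: it assumes half-open read intervals, computes coverage with a difference array, and assigns stepping levels with a simple greedy rule. Both function names and the toy reads are hypothetical.

```python
def coverage(reads, length):
    """Per-position read depth over [0, length) using a difference array."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1   # depth rises where a read begins
        diff[end] -= 1     # and falls just past its last position
    depth, running = [], 0
    for d in diff[:length]:
        running += d
        depth.append(running)
    return depth


def stepping(reads):
    """Greedily place each read on the lowest level where it overlaps nothing."""
    level_ends = []   # end coordinate of the last read placed on each level
    placed = []
    for start, end in sorted(reads):
        for level, last_end in enumerate(level_ends):
            if start >= last_end:
                level_ends[level] = end
                placed.append((start, end, level))
                break
        else:
            level_ends.append(end)
            placed.append((start, end, len(level_ends) - 1))
    return placed


# Four made-up reads on a 10-position region.
reads = [(0, 4), (2, 6), (5, 9), (3, 7)]
depth = coverage(reads, 10)
levels = stepping(reads)
print(depth)
print(levels)
```

For this input the number of stepping levels equals the maximum coverage, which is what the greedy rule gives when reads are processed in order of start position.
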
