Bioinformatics. 2021 Oct 25;37(20):3650-3651.
doi: 10.1093/bioinformatics/btab342.

TieBrush: an efficient method for aggregating and summarizing mapped reads across large datasets

Ales Varabyou et al. Bioinformatics.

Abstract

Summary: Although the ability to programmatically summarize and visually inspect sequencing data is an integral part of genome analysis, currently available methods are not capable of handling large numbers of samples. In particular, making a visual comparison of transcriptional landscapes between two sets of thousands of RNA-seq samples is limited by available computational resources, which can be overwhelmed due to the sheer size of the data. In this work, we present TieBrush, a software package designed to process very large sequencing datasets (RNA, whole-genome, exome, etc.) into a form that enables quick visual and computational inspection. TieBrush can also be used as a method for aggregating data for downstream computational analysis, and is compatible with most software tools that take aligned reads as input.

Availability and implementation: TieBrush is provided as a C++ package under the MIT License. Precompiled binaries, source code and example data are available on GitHub (https://github.com/alevar/tiebrush).

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Overview of the workflow of TieBrush and illustration of the summarized data produced by TieCov. (a) Redundant reads from two samples (left and right diagonals) are identified by TieBrush to produce a collapsed representation of 10 reads in 5 records. (b) Comparison of transcription of the NEFL gene in heart (top) and brain (bottom) tissues from GTEx. Each tissue is represented by three tracks produced by TieCov: read coverage (top), percent of samples containing the reads (middle) and splice junctions (bottom). The plot illustrates the higher prevalence and expression of the gene in brain tissue. (c) Comparison of transcription of the SLC25A3 gene in heart (top) and brain (bottom) tissues from GTEx. While the gene is expressed in both tissues, coverage data clearly indicate an exon switch where the third and fourth exons are expressed at dramatically different levels in the two tissues.
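
The collapsing step described in panel (a) can be illustrated with a minimal Python sketch. This is not TieBrush's implementation (which is a C++ package operating on alignment files); it only assumes that two reads count as redundant when they share the same reference, start position, strand and CIGAR string. The `Alignment` type, the `collapse` function and the toy sample data are all hypothetical names introduced for illustration.

```python
from collections.abc import Iterable
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (assumption: these four
    fields are enough to decide that two reads are redundant)."""
    chrom: str
    pos: int
    strand: str
    cigar: str


def collapse(samples: dict[str, Iterable[Alignment]]) -> dict[Alignment, dict]:
    """Merge identical alignments across samples into one record each,
    keeping the total read count and the set of contributing samples."""
    merged: dict[Alignment, dict] = {}
    for sample_name, reads in samples.items():
        for read in reads:
            rec = merged.setdefault(read, {"reads": 0, "samples": set()})
            rec["reads"] += 1
            rec["samples"].add(sample_name)
    return merged


# Toy data (invented): 10 reads across two samples that reduce to 5 records.
r1 = Alignment("chr8", 100, "+", "50M")
r2 = Alignment("chr8", 120, "+", "50M")
r3 = Alignment("chr8", 140, "+", "20M300N30M")
r4 = Alignment("chr8", 500, "-", "50M")
r5 = Alignment("chr8", 520, "-", "50M")

samples = {
    "sample_A": [r1, r1, r2, r3, r3],
    "sample_B": [r1, r3, r4, r5, r5],
}

collapsed = collapse(samples)

# Percent of samples containing each collapsed record, analogous in spirit
# to the middle track described in panel (b).
percent_samples = {
    read: 100.0 * len(rec["samples"]) / len(samples)
    for read, rec in collapsed.items()
}
```

In this toy input, `collapsed` holds 5 records that together account for all 10 reads, and each record retains how many reads and how many samples it represents, so per-position coverage and sample prevalence can still be derived from the reduced data.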

References

    1. Li H. et al.; 1000 Genome Project Data Processing Subgroup. (2009) The sequence alignment/map format and SAMtools. Bioinformatics, 25, 2078–2079.
    2. Lonsdale J. et al. (2013) The genotype-tissue expression (GTEx) project. Nat. Genet., 45, 580–585.
    3. Pedersen B.S. et al. (2018) Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics, 34, 867–868.
    4. Quinlan A.R. et al. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–842.
    5. Rosenbloom K.R. et al. (2015) The UCSC genome browser database: 2015 update. Nucleic Acids Res., 43, D670–D681.