Skip to main page content
Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
, 12 (4), 357-60

HISAT: A Fast Spliced Aligner With Low Memory Requirements

Affiliations

HISAT: A Fast Spliced Aligner With Low Memory Requirements

Daehwan Kim et al. Nat Methods.

Abstract

HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ∼64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.

Figures

Figure 1
Figure 1
RNA-seq read types and their relative proportions from 20 million simulated 100-bp reads. (a) Five types of RNA-seq reads: (i) M, exonic read; (ii) 2M_gt_15, junction reads with long, >15-bp anchors in both exons; (iii) 2M_8_15, junction reads with intermediate, 8- to 15-bp anchors; (iv) 2M_1_7, junction reads with short, 1- to 7-bp, anchors; and (v) gt_2M, junction reads spanning more than two exons. (b) Relative proportions of different types of reads in the 20 million 100-bp simulated read data.
Figure 2
Figure 2
Alignment speed of spliced alignment software for 20 million simulated 100-bp reads. Alignment speed for all read types (defined in Fig. 1) combined, measured as the number of reads processed per second by the indicated tools. supplementary Figure 2 provides the alignment speed for each type of read separately.
Figure 3
Figure 3
Alignment accuracy of spliced alignment software for 20 million simulated 100-bp reads. Alignment results for all read types (defined in Fig. 1) on simulated data containing errors. Reads are categorized as indicated by the colors. For multimapped reads, an aligner was credited with a correct alignment if it mapped a read to multiple locations and one of those locations was correct. Note that the set of multimapped reads reported by the various aligners may be different, depending on each program’s alignment policy and default behavior. The upper numbers are the percentages corresponding to correctly and uniquely mapped reads. The numbers inside parentheses show percentages for cases correctly and uniquely mapped and correctly multimapped combined. In Supplementary Table 2, we provide detailed percentages on all four categories for each aligner.
Figure 4
Figure 4
Alignment accuracy of spliced-alignment software for reads with small anchors from 20 million simulated reads. This figure shows the alignment sensitivity for reads with small anchors (2M_8_15 and 2M_1_7). Reads are categorized as in Figure 3. The upper numbers on each bar show the percentages corresponding to correctly and uniquely mapped reads. The numbers inside parentheses represent the percentages for cases correctly and uniquely mapped and correctly multimapped combined. There were 1,022,348 and 843,420 reads in 2M_8_15 and 2M_1_7, respectively.

Similar articles

See all similar articles

Cited by 1,858 PubMed Central articles

See all "Cited by" articles

Publication types

LinkOut - more resources

Feedback