IGD: high-performance search for large-scale genomic interval datasets

Bioinformatics. 2021 Apr 9;37(1):118-120. doi: 10.1093/bioinformatics/btaa1062.

Abstract

Summary: Databases of large-scale genome projects now contain thousands of genomic interval datasets. These data are a critical resource for understanding the function of DNA. However, our ability to examine and integrate interval data of this scale is limited. Here, we introduce the integrated genome database (IGD), a method and tool for searching genome interval datasets more than three orders of magnitude faster than existing approaches, while using only one hundredth of the memory. IGD uses a novel linear binning method that allows us to scale analysis to billions of genomic regions.
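
The summary names linear binning but does not describe it. As a rough illustration of the general idea of fixed-width binning for interval search, a minimal Python sketch follows. The bin width, the function names `build_index` and `search`, and the in-memory dictionary layout are assumptions made for illustration; they are not taken from the IGD source code and do not reproduce its on-disk format or performance.

```python
from collections import defaultdict

BIN_SIZE = 16_384  # assumed bin width in base pairs; illustrative only


def build_index(intervals, bin_size=BIN_SIZE):
    """Map (chrom, bin number) -> list of (start, end, dataset_id) records.

    Intervals are half-open [start, end). An interval spanning several
    bins is stored once in each bin it overlaps.
    """
    index = defaultdict(list)
    for chrom, start, end, dataset_id in intervals:
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size  # last covered base is end - 1
        for b in range(first_bin, last_bin + 1):
            index[(chrom, b)].append((start, end, dataset_id))
    return index


def search(index, chrom, q_start, q_end, bin_size=BIN_SIZE):
    """Return the sorted, de-duplicated records overlapping [q_start, q_end)."""
    hits = set()  # a set removes duplicates of intervals stored in several bins
    for b in range(q_start // bin_size, (q_end - 1) // bin_size + 1):
        for start, end, dataset_id in index.get((chrom, b), ()):
            if start < q_end and q_start < end:  # half-open overlap test
                hits.add((start, end, dataset_id))
    return sorted(hits)


# Toy data: (chromosome, start, end, dataset identifier)
intervals = [
    ("chr1", 100, 200, "a"),
    ("chr1", 16_000, 17_000, "b"),  # crosses the boundary between bins 0 and 1
    ("chr2", 100, 200, "c"),
]
index = build_index(intervals)
result = search(index, "chr1", 150, 16_100)
```

Because bin numbers come from integer division of the coordinates, a query only inspects the bins it touches rather than scanning every interval. The cost is that long intervals are duplicated across bins.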

Availability and implementation: https://github.com/databio/IGD.

Supplementary information: Supplementary data are available at Bioinformatics online.