BMC Bioinformatics. 2020 Jul 16;21(1):314.
doi: 10.1186/s12859-020-03649-5.

PyIR: a scalable wrapper for processing billions of immunoglobulin and T cell receptor sequences using IgBLAST

Cinque Soto et al. BMC Bioinformatics. 2020.

Abstract

Background: Recent advances in DNA sequencing technologies have greatly increased the capacity to generate large volumes of DNA sequence data, which has spurred rapid growth in the use of bioinformatics to interrogate antibody variable gene repertoires. Common tools for annotating antibody sequences are often limited in functionality, modularity and usability.

Results: We have developed PyIR, a Python wrapper and library for IgBLAST, which offers a minimal-setup CLI and API, FASTQ support, file chunking for large sequence files, JSON and Python dictionary output, and built-in sequence filtering.

Conclusions: PyIR offers improved processing speed over multithreaded IgBLAST (version 1.14) when spawning more than 16 processes on a single computer system. Its customizable filtering and data encapsulation allow it to be adapted to a wide range of computing environments. The API allows IgBLAST to be used in customized bioinformatics workflows.

Keywords: Antibody; CDR3; IgBLAST; Illumina; Immune repertoires.
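
The chunking, filtering and JSON output listed in the Results can be pictured with a small, self-contained sketch. This is an illustration of the general pattern only and is not PyIR's implementation or API: the names `read_fasta`, `chunked`, `annotate`, `keep` and `process` are hypothetical, `annotate` is a placeholder that merely records sequence length rather than calling IgBLAST, and the length threshold is an arbitrary assumption.

```python
import io
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (sequence identifier, nucleotide sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    name, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            header = line[1:].split()
            name, parts = (header[0] if header else ""), []
        elif name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def chunked(records: Iterable, size: int) -> Iterator[List]:
    """Group an iterable into lists of at most `size` items without loading it all."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def annotate(record: Record) -> Dict[str, object]:
    """Hypothetical stand-in for an annotation step; it does NOT run IgBLAST."""
    name, sequence = record
    return {"sequence_id": name, "sequence": sequence, "length": len(sequence)}


def keep(annotation: Dict[str, object], min_length: int = 12) -> bool:
    """Example filter (assumed threshold): drop very short sequences."""
    return annotation["length"] >= min_length


def process(handle: TextIO, chunk_size: int = 2) -> List[str]:
    """Annotate a FASTA stream chunk by chunk and return one JSON string per kept record."""
    lines = []
    for chunk in chunked(read_fasta(handle), chunk_size):
        # Each chunk is an independent unit of work; a real pipeline could
        # hand chunks to separate worker processes instead of looping here.
        for annotation in map(annotate, chunk):
            if keep(annotation):
                lines.append(json.dumps(annotation, sort_keys=True))
    return lines


example = io.StringIO(
    ">seq1\nCAGGTGCAGCTGGTG\n>seq2\nGAGGTG\n>seq3 extra\nCAGGTCCAGC\nTTGTACAG\n"
)
json_lines = process(example, chunk_size=2)
```

Because the chunks share no state, the chunk size can be tuned to the memory available on each worker without changing the output.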

Conflict of interest statement

The authors declare they have no competing interests.

Figures

Fig. 1
Multiprocessing performance for PyIR and multithreaded IgBLAST (version 1.14). (a) One million synthetic immunoglobulin sequences were used to time PyIR (dark grey, ♦) against multithreaded IgBLAST (version 1.14) (grey, ■) as a function of the number of processes. Idealized timings are shown as a black dashed line. Average timings were computed separately for IgBLAST and PyIR over three trial runs of 1 million sequences. Standard deviations appear as error bars for both methods. The X and Y axes are in log2 space. (b) The speedup of PyIR relative to multithreaded IgBLAST (version 1.14) as a function of the number of simultaneous processes, using the average timings from (a). Timings were done on a workstation equipped with 4 Opteron 6278 hyper-threaded 8-core processors for a total of 64 CPU threads. The X and Y axes are in log2 space. (c) One billion synthetic immunoglobulin sequences were used to determine the speedup PyIR achieved over multithreaded IgBLAST (version 1.14) as a function of the number of sequences. Idealized speedups are shown as a black dashed line. Timings were done on a workstation equipped with 4 Xeon Platinum 8280 hyper-threaded 28-core processors for a total of 224 CPU threads. The X and Y axes are in log10 space.
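
The quantities plotted in Fig. 1 can be written out as simple ratios. The sketch below assumes the conventional definitions (speedup as baseline time divided by candidate time, idealized timing as single-process time divided by the number of processes); the paper's exact calculation is not given here, the function names are hypothetical, and all timings are invented for illustration and are not results from the study.

```python
from statistics import mean


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Ratio of baseline wall-clock time to candidate wall-clock time."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / candidate_seconds


def ideal_time(single_process_seconds: float, processes: int) -> float:
    """Idealized timing under perfect scaling: T(1) / n."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return single_process_seconds / processes


def parallel_efficiency(single_process_seconds: float,
                        parallel_seconds: float,
                        processes: int) -> float:
    """Fraction of ideal scaling achieved: (T(1) / n) / T(n)."""
    if parallel_seconds <= 0:
        raise ValueError("timings must be positive")
    return ideal_time(single_process_seconds, processes) / parallel_seconds


# Invented timings (seconds) for illustration only, averaged over three trials.
candidate_trials = [49.0, 50.0, 51.0]
candidate_mean = mean(candidate_trials)      # average of the three trial runs
baseline_mean = mean([99.0, 100.0, 101.0])   # hypothetical baseline tool

relative_speedup = speedup(baseline_mean, candidate_mean)
efficiency = parallel_efficiency(640.0, candidate_mean, processes=16)
```

A speedup above 1 means the candidate finished sooner than the baseline; an efficiency below 1 measures how far the measured timing falls short of the idealized dashed line.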
