cyvcf2: fast, flexible variant analysis with Python
- PMID: 28165109
- PMCID: PMC5870853
- DOI: 10.1093/bioinformatics/btx057
Abstract
Motivation: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods enabling intuitive analysis of the variant data within VCF and BCF files.
Results: We introduce cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files, and illustrate its speed, simplicity and utility.
Contact: bpederse@gmail.com or aaronquinlan@gmail.com.
Availability and implementation: cyvcf2 is available from https://github.com/brentp/cyvcf2 under the MIT license and from common Python package managers. Detailed documentation is available at http://brentp.github.io/cyvcf2/.
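Example (illustrative, not taken from the article): the kind of parsing and querying described under Results might look like the minimal sketch below. It assumes cyvcf2 is installed and relies only on the `VCF` reader and the `CHROM` and `QUAL` attributes of the records it yields. The helper names `tally_variants` and `tally_file`, and the quality threshold of 20, are hypothetical choices made for this sketch.

```python
from collections import Counter


def tally_variants(variants, min_qual=20.0):
    """Count records per chromosome, keeping those with QUAL >= min_qual.

    `variants` is any iterable of objects exposing `.CHROM` and `.QUAL`,
    such as the records yielded when iterating over a cyvcf2 `VCF` reader.
    """
    counts = Counter()
    for v in variants:
        # QUAL is None when the VCF column holds the missing value ".".
        if v.QUAL is not None and v.QUAL >= min_qual:
            counts[v.CHROM] += 1
    return counts


def tally_file(path, min_qual=20.0):
    """Hypothetical wrapper: open a VCF or BCF file with cyvcf2 and tally it."""
    # Imported here so tally_variants stays usable without cyvcf2 installed.
    from cyvcf2 import VCF

    return tally_variants(VCF(path), min_qual)
```

Calling `tally_file("cohort.vcf.gz")` (a placeholder path) would stream the file once and return a `Counter` keyed by chromosome name. Keeping the counting logic separate from file opening is a design choice of this sketch, so that the filter can be exercised on any record-like objects.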
© The Author 2017. Published by Oxford University Press.
