vcf2gwas: Python API for comprehensive GWAS analysis using GEMMA

Bioinformatics. 2022 Jan 12;38(3):839-840. doi: 10.1093/bioinformatics/btab710.


Motivation: Genome-wide association study (GWAS) requires a researcher to perform a multitude of different actions during analysis. From editing and formatting genotype and phenotype information to running the analysis software to summarizing and visualizing the results. A typical GWAS workflow poses a significant challenge of utilizing the command-line, manual text-editing and requiring knowledge of one or more programming/scripting languages, especially for newcomers.

Results: vcf2gwas is a package that provides a convenient pipeline to perform all of the steps of a traditional GWAS workflow by reducing it to a single command-line input of a Variant Call Format file and a phenotype data file. In addition, all the required software is installed with the package. vcf2gwas also implements several useful features enhancing the reproducibility of GWAS analysis.

Availability and implementation: The source code of vcf2gwas is available under the GNU General Public License. The package can be easily installed using conda. Installation instructions and a manual including tutorials can be accessed on the package website at

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome-Wide Association Study*
  • Genotype
  • Programming Languages
  • Reproducibility of Results
  • Software*

Grant support