ARGDIT: a validation and integration toolkit for Antimicrobial Resistance Gene Databases

Bioinformatics. 2019 Jul 15;35(14):2466-2474. doi: 10.1093/bioinformatics/bty987.

Abstract

Motivation: Antimicrobial resistance is currently one of the main challenges in public health, driven by the excessive use of antimicrobials in medical treatment and agriculture. Advances in high-throughput next-generation sequencing and in bioinformatics tools allow simultaneous detection and identification of antimicrobial resistance genes (ARGs) in clinical, food and environmental samples, making it possible to monitor the prevalence and track the dissemination of these ARGs. Such analyses, however, rely on a comprehensive database of ARGs with accurate sequence content and annotation. Most current ARG databases are therefore manually curated, but curation is time-consuming and curation errors can be hard to detect. Several secondary ARG databases consolidate content from different primary (source) ARG databases, so modifications in the primary databases might not be propagated promptly to the secondary databases.

Results: To address these problems, a validation and integration toolkit called ARGDIT was developed to validate ARG database fidelity and to merge multiple primary ARG databases into a single consolidated secondary ARG database, with optional automated sequence re-annotation. Experimental results demonstrated the effectiveness of this toolkit in identifying errors, such as sequence annotation typos, in current ARG databases and in generating an integrated, non-redundant ARG database with structured annotation. A toolkit-oriented workflow is also proposed to minimize the effort of validating, curating and merging multiple ARG protein or coding sequence databases. Database developers therefore benefit from faster update cycles and lower database maintenance costs, while ARG pipeline users can easily evaluate the quality of their reference ARG database.
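To make the two tasks in the Results concrete (validating coding-sequence records and merging several databases into a non-redundant one), the following is a minimal sketch under stated assumptions, not ARGDIT's actual implementation. The function names `check_cds` and `merge_databases`, the `MergedEntry` type, the start-codon set (ATG plus the bacterial alternatives GTG and TTG), and the rule that "non-redundant" means "exact, case-insensitive sequence identity" are all hypothetical choices made for illustration; ARGDIT's real checks, merging criteria and re-annotation step may differ.

```python
from dataclasses import dataclass, field

# Assumed codon sets for this sketch (bacterial alternative starts included).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
VALID_BASES = set("ACGT")


def check_cds(seq: str) -> list[str]:
    """Return integrity problems found in a coding sequence (hypothetical checks)."""
    seq = seq.upper()
    problems = []
    if set(seq) - VALID_BASES:
        problems.append("non-ACGT characters")
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
        return problems  # codon-level checks are meaningless once the frame is broken
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if not codons or codons[0] not in START_CODONS:
        problems.append("missing start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon")
    if any(c in STOP_CODONS for c in codons[1:-1]):
        problems.append("internal stop codon")
    return problems


@dataclass
class MergedEntry:
    sequence: str
    # (source database, header) pairs that share this exact sequence
    sources: list[tuple[str, str]] = field(default_factory=list)


def merge_databases(dbs: dict[str, dict[str, str]]) -> list[MergedEntry]:
    """Merge {db_name: {header: sequence}} into non-redundant entries.

    Assumption: two records are redundant when their sequences are identical
    (ignoring case); their annotations are kept side by side as sources.
    """
    merged: dict[str, MergedEntry] = {}
    for db_name, records in dbs.items():
        for header, seq in records.items():
            key = seq.upper()
            entry = merged.setdefault(key, MergedEntry(key))
            entry.sources.append((db_name, header))
    return list(merged.values())


if __name__ == "__main__":
    # Toy records with made-up headers and sequences, for illustration only.
    print(check_cds("ATGTAAAAATAA"))  # ['internal stop codon']
    dbs = {
        "db_a": {"gene_1": "ATGAAATAA", "gene_2": "ATGCCCTGA"},
        "db_b": {"gene_3": "atgaaataa"},
    }
    entries = merge_databases(dbs)
    print(len(entries))  # 2
```

Keying the merge on the normalized sequence makes deduplication a single dictionary lookup per record, while keeping every (database, header) pair preserves the provenance needed to reconcile conflicting annotations in a later curation step.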

Availability and implementation: ARGDIT is available at https://github.com/phglab/ARGDIT.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Anti-Bacterial Agents
  • Databases, Nucleic Acid
  • Drug Resistance, Bacterial
  • High-Throughput Nucleotide Sequencing
  • Software*

Substances

  • Anti-Bacterial Agents