Nat Commun. 2019 Jul 25;10(1):3313.
doi: 10.1038/s41467-019-11306-6.

FDA-ARGOS Is a Database With Public Quality-Controlled Reference Genomes for Diagnostic Use and Regulatory Science

Free PMC article

Heike Sichtig et al. Nat Commun. 2019.

Abstract

The FDA proactively invests in tools to support innovation in emerging technologies, such as infectious disease next-generation sequencing (ID-NGS). Here, we introduce FDA-ARGOS quality-controlled reference genomes as a public database for diagnostic purposes and demonstrate its utility through two use cases. We provide quality control metrics for the FDA-ARGOS genomic database resource and outline the need to fill genome quality gaps in the public domain. In the first use case, we show more accurate microbial identification of Enterococcus avium from metagenomic samples with FDA-ARGOS reference genomes than with non-curated GenBank genomes. In the second use case, we demonstrate the utility of FDA-ARGOS reference genomes for Ebola virus target sequence comparison as part of a composite validation strategy for ID-NGS diagnostic tests. The use of FDA-ARGOS as an in silico target sequence comparator tool, combined with representative clinical testing, could reduce the burden of completing ID-NGS clinical trials.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Proposed composite reference method (C-RM) for ID-NGS diagnostics. Panel a illustrates a walkthrough of the C-RM. Here, we show in silico target sequence comparison with FDA-ARGOS reference genomes in combination with representative clinical testing to understand the performance of ID-NGS diagnostic tests. Using raw sequence data from the ID-NGS diagnostic test device, in silico comparison of results obtained with the assay in-house database to results when using FDA-ARGOS will evaluate device bioinformatic analysis pipelines and report generation while eliminating the need for additional sample testing with a gold standard comparator (current FDA benchmarks). Overall, we anticipate the use of the C-RM based on assay-specific subsets of clinical samples and/or microbial reference materials (MRMs) for clinical validation in combination with FDA-ARGOS in silico target sequence comparison to generate scientifically valid evidence for understanding the performance of ID-NGS diagnostic tests. Panel b lists the required quality control metrics for passing the regulatory-grade reference genome criteria. At a minimum, an FDA-ARGOS regulatory-grade reference genome adheres to six metrics (a–f). Specifically, category f details the minimum data requirements that are further described in (c). In addition, panel d lists the 10 critical metadata that need to be ascribed to a genome to meet the regulatory-grade criteria.
Fig. 2
FDA-ARGOS quality-controlled reference genomes for diagnostic use. Summary statistics of the current 487 microbial genomes show that FDA-ARGOS coverage consists primarily of bacterial isolates, followed by viruses and then eukaryotic parasites (a). Supplementary Data 1 provides accessions for all 487 genomes currently publicly available. A majority of FDA-ARGOS constituents (b) originate from North America and are from human clinical isolation.
Fig. 3
FDA-ARGOS reference genome assembly quality metrics. Comparative microbial genome assembly quality metrics contrast current FDA-ARGOS assemblies with the 2013 and 2018 NCBI GenBank assemblies submitted for each species captured within the FDA-ARGOS database. Assembly quality metrics measured included: (a) median coverage, (b) median N50, (c) median L50, and (d) number of 2018 NCBI genomes that exhibited all, one or a specific quality control metric used to vet FDA-ARGOS genomes for inclusion. The NCBI assemblies were downloaded on August 6, 2018. For each box plot, the center line represents the median value and is bounded by the 25th and 75th percentiles. The whiskers represent the min and max values.
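
For readers unfamiliar with the N50 and L50 metrics named in the Fig. 3 caption, the following is a minimal sketch of their conventional definitions: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly, and L50 is the number of contigs in that set. The function name `n50_l50` and the example contig lengths are illustrative assumptions; this is not the pipeline the authors used to produce the figure.

```python
from typing import List, Tuple


def n50_l50(contig_lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    Hypothetical helper for illustration only; uses the conventional
    definitions, not any FDA-ARGOS or NCBI implementation.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")

    # Walk contigs from longest to shortest, accumulating length.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # The first contig that pushes the running sum to at least half
        # of the assembly defines N50 (its length) and L50 (its rank).
        if running >= half_total:
            return length, count

    # Not reachable for non-negative lengths; kept to satisfy type checkers.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Made-up contig lengths, not data from the paper.
    example = [80, 70, 50, 40, 30, 20]
    n50, l50 = n50_l50(example)
    print(f"N50={n50}, L50={l50}")
```

A higher N50 together with a lower L50 generally indicates a more contiguous assembly, which is why the two metrics are reported side by side.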
Fig. 4
Comparison of NCBI Nt and FDA-ARGOS read classification results. Visualization of MegaBLAST analysis of shotgun metagenomic data from a mock clinical human blood sample spiked with 10^5 E. avium. The heatmap shows read classification results for triplicate samples run against 200 database instances. Dark blue indicates read numbers below 10. A gradient from white to red indicates read numbers ranging from above 10 to 100,000. Here we show read classification results for all simulated species. E. avium classification results were consistent across all database instances. In addition, several other species were classified at >1000 reads with the normalized NCBI Nt database instances (Supplementary Data 3 and 4).
