SRA Down Under: Cache and Analysis Platform for Infectious Disease

Stud Health Technol Inform. 2019 Aug 8:266:76-82. doi: 10.3233/SHTI190776.

Abstract

The Sequence Read Archive (SRA), maintained by NCBI, is a valuable resource holding a near-definitive collection of the world's sequenced reads available for academic purposes. Increasingly, these reads are being used for both basic research and clinical investigations. When time is a critical factor in analysis, such as during bacterial outbreaks, the geographical separation between Australia and the offshore NCBI SRA servers can cause significant delays that may lead to adverse clinical outcomes. To address this, Queensland Genomics commissioned a pilot program to establish a local Australian SRA Cache. A local cache of seventeen bacterial species was established using the NeCTAR Research Cloud for hosting, QRIScloud's HTC infrastructure and the MeDiCI data fabric for storage, and a software stack comprising Cromwell for workflow management, a PostgreSQL database for sample and job metadata, and a coordinating Python Flask application. The workflow capabilities of Cromwell were also leveraged to provide analyses of cached sample data, including quality control and taxonomic profiling, for both individual and multiple samples. In planning a broader rollout covering more bacterial species, it was found that the initial storage estimate did not keep pace with the exponential increase in sequencing reads uploaded to NCBI SRA. This growth highlights the increasing availability and importance of sequencing reads in modern research, but the resulting storage shortfall will need to be addressed.

Keywords: Cromwell; MeDiCI; QRIScloud; Queensland Genomics; SRA.

MeSH terms

  • Australia
  • Databases, Genetic*
  • Genomics
  • Queensland
  • Software*