Biomarker identification from next-generation sequencing data for pathogen bacteria characterization and surveillance

Biomark Med. 2015;9(11):1253-64. doi: 10.2217/bmm.15.88. Epub 2015 Oct 26.

Abstract

Aim: The purpose was to develop an analytical pipeline for specific gene analysis and biomarker discovery from next-generation sequencing (NGS) data.

Materials & methods: As a test case, the fliC gene reference sequences of 24 Salmonella enterica strains of 13 serotypes and NGS reads of 32 serovar Newport, 48 Montevideo and 115 Enteritidis outbreak isolates were retrieved from the National Center for Biotechnology Information database.

Results: An analytical pipeline was established, consisting of four steps: reference sequence retrieval and template sequence determination; NGS sequence read retrieval; multiple sequence alignment and phylogenetic analysis; and data mining and biomarker discovery.
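
The abstract does not describe how the data mining step is implemented. As a rough illustration only, the sketch below shows one simple way candidate marker positions could be pulled from a multiple sequence alignment: it reports columns where every sequence in a group shares a base that no sequence outside the group carries. The function name `group_specific_columns`, the isolate names, the group labels and the toy sequences are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def group_specific_columns(aligned, groups):
    """Find alignment columns that separate one group from all others.

    aligned: dict of sequence name -> aligned sequence (equal lengths).
    groups:  dict of sequence name -> group label (e.g. a serotype).
    Returns {group: [(column_index, base), ...]} using 0-based columns.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n_cols = lengths.pop()

    # Collect the member sequences of each group.
    members = defaultdict(list)
    for name, group in groups.items():
        members[group].append(name)

    markers = defaultdict(list)
    for col in range(n_cols):
        for group, names in members.items():
            inside = {aligned[n][col] for n in names}
            if len(inside) != 1:
                continue  # the group itself is not uniform at this column
            base = next(iter(inside))
            if base == "-":
                continue  # skip gap-only columns
            outside = {aligned[n][col] for n in aligned if groups[n] != group}
            if base not in outside:
                markers[group].append((col, base))
    return dict(markers)


# Hypothetical toy alignment, invented purely for illustration.
toy_alignment = {
    "iso1": "ATGGCACAAG",
    "iso2": "ATGGCACAAG",
    "iso3": "ATGGCTCAAG",
    "iso4": "ATGGCTCAAG",
    "iso5": "ATGACACAAG",
}
toy_groups = {
    "iso1": "group_A",
    "iso2": "group_A",
    "iso3": "group_B",
    "iso4": "group_B",
    "iso5": "group_C",
}

candidates = group_specific_columns(toy_alignment, toy_groups)
for group, hits in sorted(candidates.items()):
    print(group, hits)
```

In this toy case only `group_B` and `group_C` yield a candidate column; `group_A` has none, because its bases also occur in other groups. A real analysis would start from the aligned fliC sequences and would also need to handle ambiguous bases and sequencing errors, which this sketch ignores.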

Conclusion: The pipeline provides an effective bioinformatics tool for clarifying genetic diversity and discovering marker sequences for pathogen characterization and surveillance.

Keywords: Salmonella serotypes; bioinformatics pipeline; biomarker; fliC gene; gene diversity; next-generation sequencing analysis.

MeSH terms

  • Bacterial Proteins / genetics
  • Biomarkers / metabolism*
  • Genomics
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Phylogeny
  • Salmonella enterica / genetics
  • Salmonella enterica / isolation & purification*
  • Salmonella enterica / metabolism

Substances

  • Bacterial Proteins
  • Biomarkers