The Cano-eMLST Program: An Approach for the Calculation of Canonical Extended Multi-Locus Sequence Typing, Making Comparison of Genetic Differences Among Bunches of Bacterial Strains

Yen-Yi Liu; Ji-Wei Lin; Chih-Chieh Chen

doi:10.3390/microorganisms7040098

Microorganisms. 2019 Apr 3;7(4):98. doi: 10.3390/microorganisms7040098.

Authors

Yen-Yi Liu¹, Ji-Wei Lin², Chih-Chieh Chen³ ⁴ ⁵

Affiliations

¹ Central Regional Laboratory, Center for Diagnostics and Vaccine Development, Centers for Disease Control, Taichung 40855, Taiwan. current788@gmail.com.
² Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung 80424, Taiwan. jwlin@imst.nsysu.edu.tw.
³ Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung 80424, Taiwan. chieh@imst.nsysu.edu.tw.
⁴ Rapid Screening Research Center for Toxicology and Biomedicine, National Sun Yat-sen University, Kaohsiung 80424, Taiwan. chieh@imst.nsysu.edu.tw.
⁵ General Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung 80708, Taiwan. chieh@imst.nsysu.edu.tw.

Abstract

Extended multi-locus sequence typing (eMLST) methods have become popular in the field of genomic epidemiology. Before eMLST methods can be applied in epidemiological investigations, the selection of a suitable scheme is critical. The core genome scheme (cgMLST) has become the most popular eMLST approach for strain typing in the epidemiological domain. In addition to strain typing, many public health researchers and clinical microbiologists wish to investigate which genes cause genetic differences between compared strains. Therefore, a tool that can be used to extract canonical genes with an eMLST scheme would be particularly useful. In this study, we present cano-eMLST, a well-designed program that applies a feature-selection methodology to create a canonical locus combination with discriminatory power by traversing a genetic relatedness tree based on a user-selected scheme. The cano-eMLST program is provided mainly to help infectious disease laboratory researchers identify potential factors related to bacterial pathogenesis. The core program (tree-traversing approach) of cano-eMLST is implemented in Perl and Python. All the necessary dependencies and environmental settings are provided in the encapsulated version (VirtualBox or VMware) and self-installation version (all use source code and libraries).
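The abstract does not spell out the selection rule, so the following is only a minimal sketch of the general idea of picking discriminatory loci while traversing a relatedness tree, not the cano-eMLST implementation. It assumes a binary tree given as nested tuples, allele profiles given as tuples of allele numbers, and a hypothetical rule that keeps a locus when the two child clades of a node share no allele at that locus. The function names, the toy strains, and the locus names are all invented for illustration.

```python
from typing import Dict, List, Set, Tuple, Union

# Assumed representation: a leaf is a strain name, an internal node is (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Return the strain names under a (sub)tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def discriminating_loci(
    tree: Tree, profiles: Dict[str, Tuple[int, ...]], loci: List[str]
) -> Set[str]:
    """Collect loci that separate the two child clades at any internal node.

    Hypothetical rule: a locus is kept at a node when the allele sets of the
    left and right clades are disjoint at that locus.
    """
    if isinstance(tree, str):
        return set()  # a leaf has no split to explain
    left, right = tree
    selected: Set[str] = set()
    for i, locus in enumerate(loci):
        left_alleles = {profiles[s][i] for s in leaves(left)}
        right_alleles = {profiles[s][i] for s in leaves(right)}
        if left_alleles.isdisjoint(right_alleles):
            selected.add(locus)
    # Recurse so that deeper splits contribute their own loci.
    return (
        selected
        | discriminating_loci(left, profiles, loci)
        | discriminating_loci(right, profiles, loci)
    )


# Toy data (invented): four strains typed at four loci.
loci = ["locA", "locB", "locC", "locD"]
profiles = {
    "s1": (1, 1, 1, 1),
    "s2": (1, 2, 1, 1),
    "s3": (2, 3, 1, 1),
    "s4": (2, 3, 1, 2),
}
tree = (("s1", "s2"), ("s3", "s4"))

result = sorted(discriminating_loci(tree, profiles, loci))
print(result)  # ['locA', 'locB', 'locD']
```

In this toy case locC is dropped because every strain carries the same allele, while locA, locB, and locD each explain at least one split in the tree.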

Keywords: core-genome multi-locus sequence typing (cgMLST); feature-selection; molecular typing; next-generation sequencing (NGS).

Grants and funding

MOST 107-2311-B-110-001/Ministry of Science and Technology, Taiwan