Morbidity and mortality caused by infectious diseases rank first among all human illnesses. Many pathogenic mechanisms remain unclear, while misuse of antibiotics has led to the emergence of drug-resistant strains. Infectious diseases spread rapidly and pathogens mutate quickly, posing new threats to human health. However, with the increasing use of high-throughput screening of pathogen genomes, research based on big data mining and visualization analysis has gradually become a hot topic for studies of infectious disease prevention and control. In this paper, the framework was performed on four infectious pathogens (Fusobacterium, Streptococcus, Neisseria, and Streptococcus salivarius) through five functions: 1) genome annotation, 2) phylogeny analysis based on core genome, 3) analysis of structure differences between genomes, 4) prediction of virulence genes/factors with their pathogenic mechanisms, and 5) prediction of resistance genes/factors with their signaling pathways. The experiments were carried out from three angles: phylogeny (macro perspective), structure differences of genomes (micro perspective), and virulence and drug-resistance characteristics (prediction perspective). Therefore, the framework can not only provide evidence to support the rapid identification of new or unknown pathogens and thus plays a role in the prevention and control of infectious diseases, but also help to recommend the most appropriate strains for clinical and scientific research. This paper presented a new genome information visualization analysis process framework based on big data mining technology with the accommodation of the depth and breadth of pathogens in molecular level research.
Keywords: big data mining; drug-resistance; genome analysis; pathogen identification; virulence; visualization.
Copyright © 2021 Chen, Sun, Hou, Du, Wang, Du, Yu, Cai, Yu, Li, Luo, Shen, Lu, Li, Zhang, Gao, Ma, Lin and Cao.