Prevention and Control of Pathogens Based on Big-Data Mining and Visualization Analysis

Cui-Xia Chen; Li-Na Sun; Xue-Xin Hou; Peng-Cheng Du; Xiao-Long Wang; Xiao-Chen Du; Yu-Fei Yu; Rui-Kun Cai; Lei Yu; Tian-Jun Li; Min-Na Luo; Yue Shen; Chao Lu; Qian Li; Chuan Zhang; Hua-Fang Gao; Xu Ma; Hao Lin; Zong-Fu Cao

doi:10.3389/fmolb.2020.626595

Front Mol Biosci. 2021 Feb 25:7:626595. doi: 10.3389/fmolb.2020.626595. eCollection 2020.

Authors

Cui-Xia Chen¹,², Li-Na Sun³, Xue-Xin Hou³, Peng-Cheng Du⁴, Xiao-Long Wang⁵, Xiao-Chen Du⁶, Yu-Fei Yu¹,², Rui-Kun Cai¹,², Lei Yu¹,², Tian-Jun Li¹,², Min-Na Luo¹,², Yue Shen¹,², Chao Lu¹,², Qian Li¹,², Chuan Zhang¹,², Hua-Fang Gao¹,², Xu Ma¹,², Hao Lin⁷, Zong-Fu Cao¹,²

Affiliations

¹ National Research Institute for Family Planning, Beijing, China.
² National Center of Human Genetic Resources, Beijing, China.
³ National Institute for Communicable Disease Control and Prevention, Beijing, China.
⁴ Beijing Ditan Hospital, Beijing, China.
⁵ Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China.
⁶ Shanghai Jiaotong University School of Medicine, Shanghai, China.
⁷ Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.

Abstract

Infectious diseases rank first among all human illnesses in morbidity and mortality. Many pathogenic mechanisms remain unclear, and the misuse of antibiotics has led to the emergence of drug-resistant strains. Infectious diseases spread rapidly and pathogens mutate quickly, posing new threats to human health. With the increasing use of high-throughput screening of pathogen genomes, research based on big data mining and visualization analysis has gradually become a hot topic in the prevention and control of infectious diseases. This paper presents a genome information visualization analysis framework based on big data mining technology and applies it to four infectious pathogens (Fusobacterium, Streptococcus, Neisseria, and Streptococcus salivarius) through five functions: 1) genome annotation, 2) phylogeny analysis based on the core genome, 3) analysis of structural differences between genomes, 4) prediction of virulence genes/factors and their pathogenic mechanisms, and 5) prediction of resistance genes/factors and their signaling pathways. The experiments were carried out from three angles: phylogeny (macro perspective), structural differences between genomes (micro perspective), and virulence and drug-resistance characteristics (prediction perspective). The framework can therefore provide evidence to support the rapid identification of new or unknown pathogens, contributing to the prevention and control of infectious diseases, and can also help to recommend the most appropriate strains for clinical and scientific research. It accommodates both the depth and the breadth of molecular-level pathogen research.

Keywords: big data mining; drug-resistance; genome analysis; pathogen identification; virulence; visualization.
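
As an illustration of the second function listed in the abstract (phylogeny analysis based on the core genome), the following is a minimal sketch of how a core genome might be derived once genes have been grouped into families. The abstract does not say which tools or algorithms the framework uses, so the function names (`core_genome`, `accessory_genome`) and the toy strain and gene data are hypothetical, and the set-intersection definition is only the common textbook one, not necessarily the authors' method.

```python
from typing import Dict, Set


def core_genome(gene_families: Dict[str, Set[str]]) -> Set[str]:
    """Return the gene families present in every genome (set intersection)."""
    if not gene_families:
        return set()
    return set.intersection(*gene_families.values())


def accessory_genome(gene_families: Dict[str, Set[str]]) -> Set[str]:
    """Return the gene families found in some, but not all, genomes."""
    pan_genome = set().union(*gene_families.values())
    return pan_genome - core_genome(gene_families)


# Hypothetical toy input: strain name -> gene families annotated in it.
toy = {
    "strain_A": {"gyrB", "rpoB", "recA", "tetM"},
    "strain_B": {"gyrB", "rpoB", "recA"},
    "strain_C": {"gyrB", "rpoB", "ermB"},
}

core = core_genome(toy)
accessory = accessory_genome(toy)
print(sorted(core))
print(sorted(accessory))
```

In a real analysis, the shared core genes would typically be aligned and passed to a tree-building method, while the accessory genes are where strain-specific features such as virulence or resistance determinants would be looked for.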