Predictive analysis for pathogenicity classification of H5Nx avian influenza strains using machine learning techniques

Prev Vet Med. 2023 Jul:216:105924. doi: 10.1016/j.prevetmed.2023.105924. Epub 2023 Apr 23.

Abstract

Over the past decades, avian influenza (AI) outbreaks have been reported across different parts of the globe, resulting in large-scale economic and livestock losses and, in some cases, raising concerns about their zoonotic potential. The virulence and pathogenicity of H5Nx (e.g., H5N1, H5N2) AI strains for poultry can be inferred through various approaches, most frequently by detecting certain pathogenicity markers in their haemagglutinin (HA) gene. Predictive modeling is one possible approach to exploring this genotype-phenotype relationship and assisting experts in determining the pathogenicity of circulating AI viruses. Therefore, the main objective of this study was to evaluate the predictive performance of different machine learning (ML) techniques for in silico prediction of the pathogenicity of H5Nx viruses in poultry, using complete genetic sequences of the HA gene. We annotated 2137 H5Nx HA gene sequences based on the presence of the polybasic HA cleavage site (HACS), with 46.33% and 53.67% of sequences previously identified as highly pathogenic (HP) and low pathogenic (LP), respectively. We compared the performance of different ML classifiers (e.g., logistic regression (LR) with lasso and ridge regularization, random forest (RF), K-nearest neighbor (KNN), Naïve Bayes (NB), support vector machine (SVM), and convolutional neural network (CNN)) for pathogenicity classification of raw H5Nx nucleotide and protein sequences using 10-fold cross-validation. We found that different ML techniques can be successfully used for the pathogenicity classification of H5 sequences with ∼99% classification accuracy.
Our results indicate that, for pathogenicity classification of (1) aligned deoxyribonucleic acid (DNA) and protein sequences, the NB classifier had the lowest accuracies, 98.41% (+/-0.89) and 98.31% (+/-1.06), respectively; (2) aligned DNA and protein sequences, the LR (L1/L2), KNN, SVM (radial basis function (RBF)) and CNN classifiers had the highest accuracies, 99.20% (+/-0.54) and 99.20% (+/-0.38), respectively; and (3) unaligned DNA and protein sequences, CNNs achieved accuracies of 98.54% (+/-0.68) and 99.20% (+/-0.50), respectively. ML methods show potential for regular classification of H5Nx virus pathogenicity for poultry species, particularly when sequences containing regular markers were frequently present in the training dataset.
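
As an illustration of the evaluation scheme described above, the following is a minimal sketch of 10-fold cross-validation with a K-nearest neighbor classifier on aligned protein fragments. It is not the authors' pipeline. The sequences are synthetic toy data rather than real HA sequences, the Hamming distance and k = 3 are assumptions, the helper names (`toy_sequence`, `knn_predict`, `k_fold_accuracies`) are hypothetical, and reporting the spread as a standard deviation across folds is also an assumption, since the abstract does not state what the +/- values represent.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_sequence(rng, highly_pathogenic):
    """Build a synthetic aligned fragment (NOT real data).

    Random flanks surround a cleavage-site-like motif; the polybasic
    variant stands in for HP, the shorter gap-padded variant for LP.
    """
    left = "".join(rng.choice(AMINO_ACIDS) for _ in range(10))
    right = "".join(rng.choice(AMINO_ACIDS) for _ in range(10))
    motif = "RERRRKKR" if highly_pathogenic else "RETR----"
    return left + motif + right


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority vote (1 = HP, 0 = LP) among the k nearest training sequences."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def k_fold_accuracies(data, n_folds=10, k=3, seed=0):
    """Shuffle once, split into n_folds, and return one accuracy per fold."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct = sum(knn_predict(train, seq, k) == label for seq, label in test_fold)
        accuracies.append(correct / len(test_fold))
    return accuracies


rng = random.Random(42)
data = [(toy_sequence(rng, label == 1), label) for label in [0, 1] * 50]
accuracies = k_fold_accuracies(data)
mean_acc = statistics.mean(accuracies)
std_acc = statistics.pstdev(accuracies)
print(f"accuracy: {100 * mean_acc:.2f}% (+/-{100 * std_acc:.2f})")
```

Because the toy classes differ only in the motif region, the sketch mainly shows the mechanics of fold splitting and per-fold scoring; accuracies obtained on it say nothing about performance on real H5Nx sequences.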

Keywords: Avian influenza; Classification; Genotype; H5; Machine learning; Markers; Pathogenicity; Tracking; Virulence.

MeSH terms

  • Animals
  • Bayes Theorem
  • Chickens / metabolism
  • DNA
  • Hemagglutinin Glycoproteins, Influenza Virus / genetics
  • Hemagglutinin Glycoproteins, Influenza Virus / metabolism
  • Influenza A Virus, H5N1 Subtype* / genetics
  • Influenza A Virus, H5N2 Subtype*
  • Influenza in Birds* / epidemiology
  • Poultry
  • Virulence

Substances

  • Hemagglutinin Glycoproteins, Influenza Virus
  • DNA