Pathogenicity classification of missense mutations based on deep generative model

Comput Biol Med. 2024 Mar:170:107980. doi: 10.1016/j.compbiomed.2024.107980. Epub 2024 Jan 13.

Abstract

Missense mutations affect the function of human proteins and are closely associated with multiple acute and chronic diseases. Identifying disease-associated missense mutations and classifying them by pathogenicity can provide insights into the genetic basis of disease and into protein function. This paper proposes MLAE (Method based on LSTM-Ladder AutoEncoder), a deep learning classification model, built on the Variational AutoEncoder (VAE) framework, for identifying disease-associated missense mutations and classifying their pathogenicity. MLAE addresses limitations of the VAE framework by introducing a Ladder structure combined with LSTM networks. This reduces the loss of original information during transmission through the model and so makes learning more effective. In the experiment, MLAE classified all 27,572 possible missense variants of the three input proteins with an average classification AUC of 0.941, which provides evidence that MLAE is effective in predicting pathogenicity. MLAE also produces multi-label classification results, with an average Hamming loss of 0.196, supporting the classification of complex variants. The proposed method offers an effective way to capture amino acid sequence information and accurately predict the pathogenicity of mutations, thereby providing an analytical basis for the study and prevention of related diseases.
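
For readers unfamiliar with the multi-label metric quoted above, the following is a minimal sketch of how a Hamming loss is computed: the fraction of individual label assignments that are predicted incorrectly, so lower is better and 0 is a perfect score. It is not the authors' evaluation code. The function name `hamming_loss`, the three-label layout, and the toy label matrices are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (sample, label) entries where prediction and truth differ."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    wrong = 0   # number of mismatched label entries
    total = 0   # number of label entries compared
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each sample must have the same number of labels")
        wrong += sum(t != p for t, p in zip(true_row, pred_row))
        total += len(true_row)
    return wrong / total


# Hypothetical toy data: 3 variants, 3 binary labels each (not from the paper).
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [0, 1, 0]]

# Two of the nine label entries differ, so the loss is 2/9.
loss = hamming_loss(y_true, y_pred)
print(f"Hamming loss: {loss:.3f}")
```

Under this definition, the reported average of 0.196 would mean that roughly one in five individual label assignments was wrong, assuming the paper uses the standard form of the metric.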

Keywords: Deep generation model; Multiple label classification; Pathogenicity classification; Single amino acid variation; Variational autoencoder.

MeSH terms

  • Humans
  • Mutation
  • Mutation, Missense*
  • Virulence