Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data

Nat Methods. 2024 Feb;21(2):279-289. doi: 10.1038/s41592-023-02130-4. Epub 2024 Jan 2.

Abstract

Leveraging iterative alignment search through genomic and metagenomic sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Deep Learning*
  • Genomics
  • Proteins / chemistry
  • Proteins / genetics
  • Sequence Alignment

Substances

  • Proteins