NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning

Magnus Haraldson Høie; Erik Nicolas Kiehl; Bent Petersen; Morten Nielsen; Ole Winther; Henrik Nielsen; Jeppe Hallgren; Paolo Marcatili

doi:10.1093/nar/gkac439

Nucleic Acids Res. 2022 Jul 5;50(W1):W510-W515. doi: 10.1093/nar/gkac439.

Authors

Magnus Haraldson Høie¹, Erik Nicolas Kiehl¹, Bent Petersen^{2,3}, Morten Nielsen¹, Ole Winther^{4,5,6}, Henrik Nielsen¹, Jeppe Hallgren⁷, Paolo Marcatili¹

Affiliations

¹ Department of Health Technology, Technical University of Denmark, DK Lyngby, Denmark.
² Center for Evolutionary Hologenomics, GLOBE Institute, University of Copenhagen, Denmark.
³ Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia.
⁴ Section for Cognitive Systems, DTU Compute, Technical University of Denmark (DTU), Denmark.
⁵ Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen, Denmark.
⁶ Department of Biology, Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark.
⁷ BioLib Technologies, Copenhagen, Denmark.

Abstract

Recent advances in machine learning and natural language processing have made it possible to profoundly advance our ability to accurately predict protein structures and their functions. While such improvements are significantly impacting the fields of biology and biotechnology at large, such methods have the downside of high demands in terms of computing power and runtime, hampering their applicability to large datasets. Here, we present NetSurfP-3.0, a tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence. This NetSurfP update exploits recent advances in pre-trained protein language models to drastically improve the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance. We assessed the accuracy of NetSurfP-3.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features, with a runtime that is up to 600 times faster than the most commonly available methods performing the same tasks. The tool is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.
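
To make the per-residue nature of these outputs concrete, the following Python sketch shows one way the four feature types named in the abstract could be held and summarized for a sequence. It is an illustration only and not the NetSurfP-3.0 interface or output format: the `ResidueFeatures` record, its field names, the `summarize` helper, the three-state secondary structure alphabet (H/E/C), the 0.25 exposure cutoff and the example values are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResidueFeatures:
    """Hypothetical per-residue record; names and encodings are illustrative."""
    residue: str      # one-letter amino acid code
    rsa: float        # relative solvent accessibility, assumed to lie in [0, 1]
    secondary: str    # assumed 3-state label: 'H' helix, 'E' strand, 'C' coil
    disorder: float   # assumed probability of structural disorder in [0, 1]
    phi: float        # backbone dihedral angle phi, in degrees
    psi: float        # backbone dihedral angle psi, in degrees


def summarize(features: List[ResidueFeatures],
              exposed_cutoff: float = 0.25) -> Dict[str, float]:
    """Aggregate per-residue values into sequence-level fractions.

    The exposure cutoff is an assumed convention, not taken from the paper.
    """
    n = len(features)
    if n == 0:
        raise ValueError("at least one residue is required")
    return {
        "length": n,
        "exposed_fraction": sum(f.rsa > exposed_cutoff for f in features) / n,
        "helix_fraction": sum(f.secondary == "H" for f in features) / n,
        "mean_disorder": sum(f.disorder for f in features) / n,
    }


# Made-up values for a four-residue toy sequence, for illustration only.
example = [
    ResidueFeatures("M", 0.80, "C", 0.9, -60.0, 140.0),
    ResidueFeatures("K", 0.10, "H", 0.1, -62.0, -41.0),
    ResidueFeatures("L", 0.30, "H", 0.1, -63.0, -43.0),
    ResidueFeatures("V", 0.05, "E", 0.1, -120.0, 130.0),
]

summary = summarize(example)
print(summary)
```

Keeping one record per residue mirrors the "for each residue of an amino acid sequence" framing above; any real analysis would read these values from the tool's actual output rather than from hand-written records.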

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Computers
Datasets as Topic
Deep Learning*
Internet
Natural Language Processing*
Protein Structure, Secondary*
Proteins* / chemistry
Proteins* / metabolism
Software
Solvents / chemistry
Time Factors

Substances

Proteins
Solvents