Heterogeneity of the GFP fitness landscape and data-driven protein design

Elife. 2022 May 5;11:e75842. doi: 10.7554/eLife.75842.

Abstract

Studies of protein fitness landscapes reveal biophysical constraints guiding protein evolution and empower prediction of functional proteins. However, generalisation of these findings is limited due to scarceness of systematic data on fitness landscapes of proteins with a defined evolutionary relationship. We characterized the fitness peaks of four orthologous fluorescent proteins with a broad range of sequence divergence. While two of the four studied fitness peaks were sharp, the other two were considerably flatter, being almost entirely free of epistatic interactions. Mutationally robust proteins, characterized by a flat fitness peak, were not optimal templates for machine-learning-driven protein design - instead, predictions were more accurate for fragile proteins with epistatic landscapes. Our work paves insights for practical application of fitness landscape heterogeneity in protein engineering.

Keywords: E. coli; GFP; computational biology; evolutionary biology; fitness landscape; machine learning; molecular evolution; protein engineering; systems biology.