

Folding Stability
1 Introduction
This tool predicts the absolute folding stability ΔG of a protein using the inverse folding model ESM-IF.
Traditional physics-based methods for predicting protein stability ΔG (e.g., FoldX, Rosetta) rely on a high-confidence PDB structure; when a protein carries many mutations, structural confidence drops and prediction quality degrades. Benchmark results from ProteinGym show that the generative model ESM-IF achieves best-in-class zero-shot prediction of mutational stability changes (ΔΔG) on deep mutational scanning (DMS) data. This method extends that mutation-effect prediction by using ESM-IF to directly predict the absolute folding stability ΔG of the intact protein.
In testing, the method reached a prediction error (RMSE) of ≈ 1.5 kcal/mol and a correlation coefficient of 0.7, a notable advance in predicting the absolute folding stability ΔG of proteins [1].
Principle

Figure 1. Computational principles.

Figure 2. Performance.
- xk: log-likelihood computed by ESM-IF for amino acid k at a given site of the protein.
- xj: log-likelihood computed by ESM-IF for each of the 20 amino acids j at that same site.
- Lk: softmax of xk over all xj, which measures the contribution to stability when that site is amino acid k.
The log-likelihood of the whole protein is then obtained by summing Lk over all sites of the protein.
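The per-site softmax and the summation over sites can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: it assumes that Lk is the logarithm of the softmax probability (so that summing Lk gives a log-likelihood), that the per-site scores xj are already available as plain lists of 20 numbers, and that the amino acid ordering in AMINO_ACIDS matches the order of those scores. The function names are hypothetical, and no ESM-IF call is made.

```python
import math

# Assumed ordering of the 20 amino acids; a real model defines its own order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_log_likelihood(scores, k):
    """Return Lk = x_k - log(sum_j exp(x_j)) for one site (log-softmax).

    scores: the 20 per-amino-acid scores x_j at this site.
    k: index of the amino acid actually present at this site.
    """
    m = max(scores)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(x - m) for x in scores))
    return scores[k] - log_norm


def sequence_log_likelihood(per_site_scores, sequence):
    """Sum Lk over all sites to get the whole-protein log-likelihood."""
    total = 0.0
    for scores, residue in zip(per_site_scores, sequence):
        total += site_log_likelihood(scores, AMINO_ACIDS.index(residue))
    return total
```

With uniform scores every amino acid has probability 1/20 at each site, so a two-residue sequence would score 2 × log(1/20), which is a convenient sanity check for the sketch.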
Finally, the whole-protein log-likelihood is fitted linearly to experimentally measured stability ΔG values, and the resulting fit parameters a and b are used to convert the log-likelihood of a protein into a predicted stability ΔG.
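The linear conversion can be illustrated with an ordinary least-squares fit. This sketch assumes the fit has the form ΔG ≈ a × L + b, where L is the whole-protein log-likelihood; the source does not state the exact form or the fitted values, so the function names and the numbers in the usage note are purely hypothetical.

```python
def fit_linear(log_likelihoods, delta_g_values):
    """Least-squares fit of delta_g = a * log_likelihood + b.

    Returns the slope a and intercept b.
    """
    n = len(log_likelihoods)
    mean_x = sum(log_likelihoods) / n
    mean_y = sum(delta_g_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log_likelihoods)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(log_likelihoods, delta_g_values)
    )
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_delta_g(log_likelihood, a, b):
    """Convert a whole-protein log-likelihood into a predicted delta G."""
    return a * log_likelihood + b
```

As a made-up example, calling fit_linear([-3.0, -2.0, -1.0], [-1.0, 1.0, 3.0]) recovers a = 2 and b = 5, after which predict_delta_g(-2.5, a, b) gives 0. Real values of a and b would come from fitting against experimental ΔG measurements.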
2 Parameters
- PDB File: PDB file of protein 3D structure.
- Target Chain: The identifier of the chain in the PDB file whose stability is to be predicted.
3 Results Explanation
- Protein Stability: the predicted folding stability ΔG; higher values indicate a more stable protein.
4 Reference
[1] Matteo Cagiada, Sergey Ovchinnikov, Kresten Lindorff-Larsen. Predicting absolute protein folding stability using generative models. bioRxiv 2024.03.14.584940. https://doi.org/10.1101/2024.03.14.584940

