Protein Fitness Landscape: Spectral Graph Theory Perspective

Hao Zhu, Daniel M. Steinberg, Piotr Koniusz
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2827-2835, 2025.

Abstract

In this work, we present a novel theoretical framework for analyzing and modeling protein fitness landscapes using spectral graph theory. By representing the protein sequence space as a generalized Hamming graph and studying its spectral properties, we derive a set of powerful tools for quantifying the ruggedness, epistasis, and other key characteristics of the landscape. We prove strong approximation and sampling results, showing that the landscape can be efficiently learned and optimized from limited and noisy data. Building on this foundation, we introduce Propagational Convolutional Neural Networks (PCNNs), a new class of inductive surrogate oracle. We provide rigorous theoretical guarantees on the generalization and convergence properties of PCNNs, using techniques from the Neural Tangent Kernel framework. Extensive experiments on real-world protein engineering tasks demonstrate the superiority of PCNNs over state-of-the-art methods, achieving higher fitness and better generalization from limited data.
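As a rough illustration of the spectral view described in the abstract, the sketch below builds a toy generalized Hamming graph (sequences are vertices, and two sequences are adjacent when they differ at exactly one position) and evaluates the Rayleigh quotient of its graph Laplacian on two hand-made landscapes. It relies only on the standard fact that Laplacian eigenvalues of a Hamming graph over q letters are q·k, where k is the interaction order. It is not the authors' method or code: the toy alphabet, the sequence length, and all function names are assumptions made for this example.

```python
from itertools import product

ALPHABET = (0, 1, 2)  # toy alphabet with q = 3 letters (assumption; proteins use q = 20)
LENGTH = 3            # toy sequence length (assumption)


def neighbors(seq):
    """Yield every sequence at Hamming distance 1 from seq."""
    for i, a in enumerate(seq):
        for b in ALPHABET:
            if b != a:
                yield seq[:i] + (b,) + seq[i + 1:]


def laplacian_apply(f):
    """Apply L = D - A of the Hamming graph to a landscape f (dict: seq -> value)."""
    out = {}
    for seq in f:
        nbrs = list(neighbors(seq))
        out[seq] = len(nbrs) * f[seq] - sum(f[n] for n in nbrs)
    return out


def rayleigh_quotient(f):
    """Return f^T L f / f^T f; larger values indicate a more rugged landscape."""
    Lf = laplacian_apply(f)
    return sum(f[s] * Lf[s] for s in f) / sum(v * v for v in f.values())


# Zero-sum per-site effect, so the landscapes below are Laplacian eigenfunctions.
g = {0: 1, 1: -1, 2: 0}
sequences = list(product(ALPHABET, repeat=LENGTH))

additive = {s: g[s[0]] for s in sequences}            # depends on one site (order 1)
pairwise = {s: g[s[0]] * g[s[1]] for s in sequences}  # two-site interaction (order 2)

print(rayleigh_quotient(additive))  # 3.0 = q * 1
print(rayleigh_quotient(pairwise))  # 6.0 = q * 2
```

The single-site landscape sits at eigenvalue q and the two-site (epistatic) landscape at 2q, so higher-order interactions correspond to higher graph frequencies. This is one common way to make "ruggedness" quantitative; the paper's own definitions may differ.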

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-zhu25c,
  title = {Protein Fitness Landscape: Spectral Graph Theory Perspective},
  author = {Zhu, Hao and Steinberg, Daniel M. and Koniusz, Piotr},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {2827--2835},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/zhu25c/zhu25c.pdf},
  url = {https://proceedings.mlr.press/v258/zhu25c.html},
  abstract = {In this work, we present a novel theoretical framework for analyzing and modeling protein fitness landscapes using spectral graph theory. By representing the protein sequence space as a generalized Hamming graph and studying its spectral properties, we derive a set of powerful tools for quantifying the ruggedness, epistasis, and other key characteristics of the landscape. We prove strong approximation and sampling results, showing that the landscape can be efficiently learned and optimized from limited and noisy data. Building on this foundation, we introduce Propagational Convolutional Neural Networks (PCNNs), a new class of inductive surrogate oracle. We provide rigorous theoretical guarantees on the generalization and convergence properties of PCNNs, using techniques from the Neural Tangent Kernel framework. Extensive experiments on real-world protein engineering tasks demonstrate the superiority of PCNNs over state-of-the-art methods, achieving higher fitness and better generalization from limited data.}
}
Endnote
%0 Conference Paper
%T Protein Fitness Landscape: Spectral Graph Theory Perspective
%A Hao Zhu
%A Daniel M. Steinberg
%A Piotr Koniusz
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-zhu25c
%I PMLR
%P 2827--2835
%U https://proceedings.mlr.press/v258/zhu25c.html
%V 258
%X In this work, we present a novel theoretical framework for analyzing and modeling protein fitness landscapes using spectral graph theory. By representing the protein sequence space as a generalized Hamming graph and studying its spectral properties, we derive a set of powerful tools for quantifying the ruggedness, epistasis, and other key characteristics of the landscape. We prove strong approximation and sampling results, showing that the landscape can be efficiently learned and optimized from limited and noisy data. Building on this foundation, we introduce Propagational Convolutional Neural Networks (PCNNs), a new class of inductive surrogate oracle. We provide rigorous theoretical guarantees on the generalization and convergence properties of PCNNs, using techniques from the Neural Tangent Kernel framework. Extensive experiments on real-world protein engineering tasks demonstrate the superiority of PCNNs over state-of-the-art methods, achieving higher fitness and better generalization from limited data.
APA
Zhu, H., Steinberg, D. M., & Koniusz, P. (2025). Protein Fitness Landscape: Spectral Graph Theory Perspective. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2827-2835. Available from https://proceedings.mlr.press/v258/zhu25c.html.
