Efficient Generative Modelling of Protein Structure Fragments using a Deep Markov Model

Christian B Thygesen, Christian Skjødt Steenmans, Ahmad Salim Al-Sibahi, Lys Sanz Moreta, Anders Bundgård Sørensen, Thomas Hamelryck
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:10258-10267, 2021.

Abstract

Fragment libraries are often used in protein structure prediction, simulation and design as a means to significantly reduce the vast conformational search space. Current state-of-the-art methods for fragment library generation do not properly account for aleatory and epistemic uncertainty, respectively due to the dynamic nature of proteins and experimental errors in protein structures. Additionally, they typically rely on information that is not generally or readily available, such as homologous sequences, related protein structures and other complementary information. To address these issues, we developed BIFROST, a novel take on the fragment library problem based on a Deep Markov Model architecture combined with directional statistics for angular degrees of freedom, implemented in the deep probabilistic programming language Pyro. BIFROST is a probabilistic, generative model of the protein backbone dihedral angles conditioned solely on the amino acid sequence. BIFROST generates fragment libraries with a quality on par with current state-of-the-art methods at a fraction of the run-time, while requiring considerably less information and allowing efficient evaluation of probabilities.
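
As a rough illustration of what "directional statistics for angular degrees of freedom" means in practice, the sketch below samples backbone dihedral angle pairs (phi, psi) from von Mises distributions, using only the Python standard library. It is not the BIFROST model: a plain two-state discrete Markov chain stands in for the Deep Markov Model, there is no neural network and no conditioning on the amino acid sequence, and every state name, mean angle, concentration and transition probability is a hypothetical value chosen for illustration.

```python
import math
import random

# Hypothetical toy parameters, NOT taken from the paper.
# Each hidden state has mean (phi, psi) angles in radians and a concentration kappa.
STATES = {
    "helix": {"phi": math.radians(-60), "psi": math.radians(-45), "kappa": 20.0},
    "strand": {"phi": math.radians(-120), "psi": math.radians(130), "kappa": 10.0},
}

# Hypothetical transition probabilities between the hidden states.
TRANSITIONS = {
    "helix": {"helix": 0.9, "strand": 0.1},
    "strand": {"helix": 0.2, "strand": 0.8},
}


def wrap(angle):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def sample_fragment(length, rng, start="helix"):
    """Sample a list of (phi, psi) pairs from the toy Markov chain."""
    state = start
    fragment = []
    for _ in range(length):
        params = STATES[state]
        # vonmisesvariate returns an angle in [0, 2*pi); wrap it to [-pi, pi).
        phi = wrap(rng.vonmisesvariate(params["phi"], params["kappa"]))
        psi = wrap(rng.vonmisesvariate(params["psi"], params["kappa"]))
        fragment.append((phi, psi))
        # Draw the next hidden state from the current state's transition row.
        names = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][name] for name in names]
        state = rng.choices(names, weights=weights)[0]
    return fragment


# Sample one nine-residue fragment with a fixed seed for reproducibility.
fragment = sample_fragment(9, random.Random(0))
```

Because the von Mises distribution is defined on the circle, samples near +180 and -180 degrees are treated as neighbours, which a Gaussian on raw angle values would not do.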

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-thygesen21a,
  title = {Efficient Generative Modelling of Protein Structure Fragments using a Deep Markov Model},
  author = {Thygesen, Christian B and Steenmans, Christian Skj{\o}dt and Al-Sibahi, Ahmad Salim and Moreta, Lys Sanz and S{\o}rensen, Anders Bundg{\aa}rd and Hamelryck, Thomas},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {10258--10267},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/thygesen21a/thygesen21a.pdf},
  url = {https://proceedings.mlr.press/v139/thygesen21a.html},
  abstract = {Fragment libraries are often used in protein structure prediction, simulation and design as a means to significantly reduce the vast conformational search space. Current state-of-the-art methods for fragment library generation do not properly account for aleatory and epistemic uncertainty, respectively due to the dynamic nature of proteins and experimental errors in protein structures. Additionally, they typically rely on information that is not generally or readily available, such as homologous sequences, related protein structures and other complementary information. To address these issues, we developed BIFROST, a novel take on the fragment library problem based on a Deep Markov Model architecture combined with directional statistics for angular degrees of freedom, implemented in the deep probabilistic programming language Pyro. BIFROST is a probabilistic, generative model of the protein backbone dihedral angles conditioned solely on the amino acid sequence. BIFROST generates fragment libraries with a quality on par with current state-of-the-art methods at a fraction of the run-time, while requiring considerably less information and allowing efficient evaluation of probabilities.}
}
Endnote
%0 Conference Paper
%T Efficient Generative Modelling of Protein Structure Fragments using a Deep Markov Model
%A Christian B Thygesen
%A Christian Skjødt Steenmans
%A Ahmad Salim Al-Sibahi
%A Lys Sanz Moreta
%A Anders Bundgård Sørensen
%A Thomas Hamelryck
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-thygesen21a
%I PMLR
%P 10258--10267
%U https://proceedings.mlr.press/v139/thygesen21a.html
%V 139
%X Fragment libraries are often used in protein structure prediction, simulation and design as a means to significantly reduce the vast conformational search space. Current state-of-the-art methods for fragment library generation do not properly account for aleatory and epistemic uncertainty, respectively due to the dynamic nature of proteins and experimental errors in protein structures. Additionally, they typically rely on information that is not generally or readily available, such as homologous sequences, related protein structures and other complementary information. To address these issues, we developed BIFROST, a novel take on the fragment library problem based on a Deep Markov Model architecture combined with directional statistics for angular degrees of freedom, implemented in the deep probabilistic programming language Pyro. BIFROST is a probabilistic, generative model of the protein backbone dihedral angles conditioned solely on the amino acid sequence. BIFROST generates fragment libraries with a quality on par with current state-of-the-art methods at a fraction of the run-time, while requiring considerably less information and allowing efficient evaluation of probabilities.
APA
Thygesen, C.B., Steenmans, C.S., Al-Sibahi, A.S., Moreta, L.S., Sørensen, A.B. & Hamelryck, T. (2021). Efficient Generative Modelling of Protein Structure Fragments using a Deep Markov Model. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:10258-10267. Available from https://proceedings.mlr.press/v139/thygesen21a.html.
