Predicting Immune Escape with Pretrained Protein Language Model Embeddings

Kyle Swanson; Howard Chang; James Zou

Proceedings of the 17th Machine Learning in Computational Biology meeting, PMLR 200:110-130, 2022.

Abstract

Assessing the severity of new pathogenic variants requires an understanding of which mutations enable escape of the human immune response. Even single point mutations to an antigen can cause immune escape and infection by disrupting antibody binding. Recent work has modeled the effect of single point mutations on proteins by leveraging the information contained in large-scale, pretrained protein language models (PLMs). PLMs are often applied in a zero-shot setting, where the effect of each mutation is predicted based on the output of the language model with no additional training. However, this approach cannot appropriately model immune escape, which involves the interaction of two proteins—antibody and antigen—instead of one protein and requires making different predictions for the same antigenic mutation in response to different antibodies. Here, we explore several methods for predicting immune escape by building models on top of embeddings from PLMs. We evaluate our methods on a SARS-CoV-2 deep mutational scanning dataset and show that our embedding-based methods significantly outperform zero-shot methods, which have almost no predictive power. We also highlight insights gained into how best to use embeddings from PLMs to predict escape. Despite these promising results, simple statistical and machine learning baseline models that do not use pretraining perform comparably, showing that computationally expensive pretraining approaches may not be beneficial for escape prediction. Furthermore, all models perform relatively poorly, indicating that future work is necessary to improve escape prediction with or without pretrained embeddings.

Cite this Paper

BibTeX


@InProceedings{pmlr-v200-swanson22a,
  title     = {Predicting Immune Escape with Pretrained Protein Language Model Embeddings},
  author    = {Swanson, Kyle and Chang, Howard and Zou, James},
  booktitle = {Proceedings of the 17th Machine Learning in Computational Biology meeting},
  pages     = {110--130},
  year      = {2022},
  editor    = {Knowles, David A and Mostafavi, Sara and Lee, Su-In},
  volume    = {200},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--22 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v200/swanson22a/swanson22a.pdf},
  url       = {https://proceedings.mlr.press/v200/swanson22a.html},
  abstract = 	 {Assessing the severity of new pathogenic variants requires an understanding of which mutations enable escape of the human immune response. Even single point mutations to an antigen can cause immune escape and infection by disrupting antibody binding. Recent work has modeled the effect of single point mutations on proteins by leveraging the information contained in large-scale, pretrained protein language models (PLMs). PLMs are often applied in a zero-shot setting, where the effect of each mutation is predicted based on the output of the language model with no additional training. However, this approach cannot appropriately model immune escape, which involves the interaction of two proteins—antibody and antigen—instead of one protein and requires making different predictions for the same antigenic mutation in response to different antibodies. Here, we explore several methods for predicting immune escape by building models on top of embeddings from PLMs. We evaluate our methods on a SARS-CoV-2 deep mutational scanning dataset and show that our embedding-based methods significantly outperform zero-shot methods, which have almost no predictive power. We also highlight insights gained into how best to use embeddings from PLMs to predict escape. Despite these promising results, simple statistical and machine learning baseline models that do not use pretraining perform comparably, showing that computationally expensive pretraining approaches may not be beneficial for escape prediction. Furthermore, all models perform relatively poorly, indicating that future work is necessary to improve escape prediction with or without pretrained embeddings.}
}

Endnote

%0 Conference Paper
%T Predicting Immune Escape with Pretrained Protein Language Model Embeddings
%A Kyle Swanson
%A Howard Chang
%A James Zou
%B Proceedings of the 17th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2022
%E David A Knowles
%E Sara Mostafavi
%E Su-In Lee
%F pmlr-v200-swanson22a
%I PMLR
%P 110--130
%U https://proceedings.mlr.press/v200/swanson22a.html
%V 200
%X Assessing the severity of new pathogenic variants requires an understanding of which mutations enable escape of the human immune response. Even single point mutations to an antigen can cause immune escape and infection by disrupting antibody binding. Recent work has modeled the effect of single point mutations on proteins by leveraging the information contained in large-scale, pretrained protein language models (PLMs). PLMs are often applied in a zero-shot setting, where the effect of each mutation is predicted based on the output of the language model with no additional training. However, this approach cannot appropriately model immune escape, which involves the interaction of two proteins—antibody and antigen—instead of one protein and requires making different predictions for the same antigenic mutation in response to different antibodies. Here, we explore several methods for predicting immune escape by building models on top of embeddings from PLMs. We evaluate our methods on a SARS-CoV-2 deep mutational scanning dataset and show that our embedding-based methods significantly outperform zero-shot methods, which have almost no predictive power. We also highlight insights gained into how best to use embeddings from PLMs to predict escape. Despite these promising results, simple statistical and machine learning baseline models that do not use pretraining perform comparably, showing that computationally expensive pretraining approaches may not be beneficial for escape prediction. Furthermore, all models perform relatively poorly, indicating that future work is necessary to improve escape prediction with or without pretrained embeddings.

APA


Swanson, K., Chang, H. & Zou, J. (2022). Predicting Immune Escape with Pretrained Protein Language Model Embeddings. Proceedings of the 17th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 200:110-130. Available from https://proceedings.mlr.press/v200/swanson22a.html.
