Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions

Leo Klarner; Tim G. J. Rudner; Michael Reutlinger; Torsten Schindler; Garrett M Morris; Charlotte Deane; Yee Whye Teh

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:17176-17197, 2023.

Abstract

Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift—a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-klarner23a,
  title     = {Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions},
  author    = {Klarner, Leo and Rudner, Tim G. J. and Reutlinger, Michael and Schindler, Torsten and Morris, Garrett M and Deane, Charlotte and Teh, Yee Whye},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {17176--17197},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/klarner23a/klarner23a.pdf},
  url       = {https://proceedings.mlr.press/v202/klarner23a.html},
  abstract  = {Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift---a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.}
}

Endnote

%0 Conference Paper
%T Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions
%A Leo Klarner
%A Tim G. J. Rudner
%A Michael Reutlinger
%A Torsten Schindler
%A Garrett M Morris
%A Charlotte Deane
%A Yee Whye Teh
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-klarner23a
%I PMLR
%P 17176--17197
%U https://proceedings.mlr.press/v202/klarner23a.html
%V 202
%X Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift—a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.

APA

Klarner, L., Rudner, T.G.J., Reutlinger, M., Schindler, T., Morris, G.M., Deane, C. & Teh, Y.W. (2023). Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:17176-17197. Available from https://proceedings.mlr.press/v202/klarner23a.html.
