Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language

Philipp Seidl, Andreu Vall, Sepp Hochreiter, Günter Klambauer
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:30458-30490, 2023.

Abstract

Activity and property prediction models are the central workhorses in drug discovery and materials sciences, but currently, they have to be trained or fine-tuned for new tasks. Without training or fine-tuning, scientific language models could be used for such low-data tasks through their announced zero- and few-shot capabilities. However, their predictive quality at activity prediction is lacking. In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task. To this end, we propose a new architecture with separate modules for chemical and natural language inputs, and a contrastive pretraining objective on data from large biochemical databases. In extensive experiments, we show that our method CLAMP yields improved predictive performance on few-shot learning benchmarks and zero-shot problems in drug discovery. We attribute the advances of our method to the modularized architecture and to our pre-training objective.
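The abstract describes a two-module design: a chemical encoder and a natural-language encoder pretrained with a contrastive objective, so that a textual description of a new assay can be used to score molecules at inference time without task-specific fine-tuning. The snippet below is a minimal, hypothetical sketch of that idea in PyTorch; the names (TwoTowerCLAMP, contrastive_loss, zero_shot_scores), the encoder architectures, the feature dimensions, and the temperature tau are illustrative assumptions, and the symmetric in-batch InfoNCE loss is a generic stand-in rather than the paper's exact pretraining objective.

# Minimal sketch of a two-tower molecule/text model with a contrastive
# pretraining loss and zero-shot scoring, in the spirit of CLAMP.
# All architecture and hyperparameter choices here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerCLAMP(nn.Module):
    def __init__(self, mol_dim=2048, txt_dim=768, emb_dim=256):
        super().__init__()
        # Separate modules for chemical and natural-language inputs,
        # each mapping its features into a shared embedding space.
        self.mol_encoder = nn.Sequential(
            nn.Linear(mol_dim, 512), nn.ReLU(), nn.Linear(512, emb_dim))
        self.txt_encoder = nn.Sequential(
            nn.Linear(txt_dim, 512), nn.ReLU(), nn.Linear(512, emb_dim))

    def forward(self, mol_feats, txt_feats):
        m = F.normalize(self.mol_encoder(mol_feats), dim=-1)
        t = F.normalize(self.txt_encoder(txt_feats), dim=-1)
        return m, t

def contrastive_loss(m, t, tau=0.07):
    # Symmetric InfoNCE over a batch of (molecule, assay-description) pairs:
    # matching pairs are pulled together, non-matching pairs pushed apart.
    logits = (m @ t.T) / tau
    labels = torch.arange(m.size(0), device=m.device)
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.T, labels))

@torch.no_grad()
def zero_shot_scores(model, mol_feats, task_txt_feats):
    # Embed the textual task description once, then rank candidate
    # molecules by cosine similarity to it.
    m, t = model(mol_feats, task_txt_feats)
    return (m @ t.T).squeeze(-1)

Because the task is represented only by its embedded textual description, the same pretrained encoders can, in principle, rank unseen molecules for a new assay at inference time, which is the adaptation-without-retraining behavior the abstract claims.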

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-seidl23a,
  title     = {Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language},
  author    = {Seidl, Philipp and Vall, Andreu and Hochreiter, Sepp and Klambauer, G\"{u}nter},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {30458--30490},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/seidl23a/seidl23a.pdf},
  url       = {https://proceedings.mlr.press/v202/seidl23a.html},
  abstract  = {Activity and property prediction models are the central workhorses in drug discovery and materials sciences, but currently, they have to be trained or fine-tuned for new tasks. Without training or fine-tuning, scientific language models could be used for such low-data tasks through their announced zero- and few-shot capabilities. However, their predictive quality at activity prediction is lacking. In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task. To this end, we propose a new architecture with separate modules for chemical and natural language inputs, and a contrastive pretraining objective on data from large biochemical databases. In extensive experiments, we show that our method CLAMP yields improved predictive performance on few-shot learning benchmarks and zero-shot problems in drug discovery. We attribute the advances of our method to the modularized architecture and to our pre-training objective.}
}
Endnote
%0 Conference Paper
%T Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language
%A Philipp Seidl
%A Andreu Vall
%A Sepp Hochreiter
%A Günter Klambauer
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-seidl23a
%I PMLR
%P 30458--30490
%U https://proceedings.mlr.press/v202/seidl23a.html
%V 202
%X Activity and property prediction models are the central workhorses in drug discovery and materials sciences, but currently, they have to be trained or fine-tuned for new tasks. Without training or fine-tuning, scientific language models could be used for such low-data tasks through their announced zero- and few-shot capabilities. However, their predictive quality at activity prediction is lacking. In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task. To this end, we propose a new architecture with separate modules for chemical and natural language inputs, and a contrastive pretraining objective on data from large biochemical databases. In extensive experiments, we show that our method CLAMP yields improved predictive performance on few-shot learning benchmarks and zero-shot problems in drug discovery. We attribute the advances of our method to the modularized architecture and to our pre-training objective.
APA
Seidl, P., Vall, A., Hochreiter, S. & Klambauer, G. (2023). Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:30458-30490. Available from https://proceedings.mlr.press/v202/seidl23a.html.