Biases in Evaluation of Molecular Optimization Methods and Bias Reduction Strategies

Hiroshi Kajino, Kohei Miyaguchi, Takayuki Osogami
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:15567-15585, 2023.

Abstract

We are interested in an evaluation methodology for molecular optimization. Given a sample of molecules and their properties of interest, we wish not only to train a generator of molecules optimized with respect to a target property but also to evaluate its performance accurately. A common practice is to train a predictor of the target property using the sample and to use it both for training and for evaluating the generator. However, little is known about the statistical properties of this practice, so it is unclear whether the resulting performance estimate is reliable. We theoretically investigate this evaluation methodology and show that it potentially suffers from two biases: one is due to misspecification of the predictor, and the other is due to reusing the same finite sample for training and evaluation. We discuss bias reduction methods for each of the biases and empirically investigate their effectiveness.
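
The second bias, from reusing one finite sample for both training and evaluation, can be illustrated with a small simulation. The sketch below is a toy setup and not the paper's method. It assumes that every candidate has a true property value of zero, that the "predictor" is a lookup table of noisy measurements, and that the "generator" picks the candidate the predictor scores highest. The held-out estimate is included only as a point of comparison; it is an assumption of this sketch and not necessarily one of the bias reduction strategies the paper studies. The name `simulate` and all parameters are hypothetical.

```python
import numpy as np


def simulate(num_candidates=50, num_trials=2000, seed=0):
    """Compare a plug-in estimate with a held-out estimate in a toy setting."""
    rng = np.random.default_rng(seed)
    # Assumption: every candidate has the same true property value, 0.0,
    # so any systematically positive estimate is pure bias.
    true_value = 0.0
    plug_in, held_out = [], []
    for _ in range(num_trials):
        # One noisy measurement per candidate: the finite "sample".
        train_obs = true_value + rng.standard_normal(num_candidates)
        # Toy predictor: a lookup table of the observed values.
        predictor = train_obs
        # Toy generator: pick the candidate the predictor scores highest.
        chosen = int(np.argmax(predictor))
        # Plug-in estimate: score the choice with the same predictor.
        plug_in.append(predictor[chosen])
        # Held-out estimate: score the choice with fresh measurements.
        test_obs = true_value + rng.standard_normal(num_candidates)
        held_out.append(test_obs[chosen])
    return float(np.mean(plug_in)), float(np.mean(held_out))


plug_in_mean, held_out_mean = simulate()
print(f"plug-in estimate:  {plug_in_mean:.3f}")
print(f"held-out estimate: {held_out_mean:.3f}")
print("true performance:  0.000")
```

In this toy setting the plug-in average comes out well above the true value of zero, because the generator selects exactly the candidate whose measurement noise happened to be largest, while the held-out average stays close to zero.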

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-kajino23a,
  title = {Biases in Evaluation of Molecular Optimization Methods and Bias Reduction Strategies},
  author = {Kajino, Hiroshi and Miyaguchi, Kohei and Osogami, Takayuki},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {15567--15585},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/kajino23a/kajino23a.pdf},
  url = {https://proceedings.mlr.press/v202/kajino23a.html},
  abstract = {We are interested in an evaluation methodology for molecular optimization. Given a sample of molecules and their properties of our interest, we wish not only to train a generator of molecules optimized with respect to a target property but also to evaluate its performance accurately. A common practice is to train a predictor of the target property using the sample and apply it to both training and evaluating the generator. However, little is known about its statistical properties, and thus, we are not certain about whether this performance estimate is reliable or not. We theoretically investigate this evaluation methodology and show that it potentially suffers from two biases; one is due to misspecification of the predictor and the other to reusing the same finite sample for training and evaluation. We discuss bias reduction methods for each of the biases, and empirically investigate their effectiveness.}
}
Endnote
%0 Conference Paper
%T Biases in Evaluation of Molecular Optimization Methods and Bias Reduction Strategies
%A Hiroshi Kajino
%A Kohei Miyaguchi
%A Takayuki Osogami
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-kajino23a
%I PMLR
%P 15567--15585
%U https://proceedings.mlr.press/v202/kajino23a.html
%V 202
%X We are interested in an evaluation methodology for molecular optimization. Given a sample of molecules and their properties of our interest, we wish not only to train a generator of molecules optimized with respect to a target property but also to evaluate its performance accurately. A common practice is to train a predictor of the target property using the sample and apply it to both training and evaluating the generator. However, little is known about its statistical properties, and thus, we are not certain about whether this performance estimate is reliable or not. We theoretically investigate this evaluation methodology and show that it potentially suffers from two biases; one is due to misspecification of the predictor and the other to reusing the same finite sample for training and evaluation. We discuss bias reduction methods for each of the biases, and empirically investigate their effectiveness.
APA
Kajino, H., Miyaguchi, K. & Osogami, T. (2023). Biases in Evaluation of Molecular Optimization Methods and Bias Reduction Strategies. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:15567-15585. Available from https://proceedings.mlr.press/v202/kajino23a.html.
