ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables

Aleksandar Dimitriev, Mingyuan Zhou
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2717-2727, 2021.

Abstract

Estimating gradients for binary variables is a task that arises frequently in various domains, such as training discrete latent variable models. The most commonly used approach is REINFORCE-based Monte Carlo estimation with either independent samples or pairs of negatively correlated samples. To better utilize more than two samples, we propose ARMS, an Antithetic REINFORCE-based Multi-Sample gradient estimator. ARMS uses a copula to generate any number of mutually antithetic samples. It is unbiased, has low variance, and generalizes both DisARM, which we show to be ARMS with two samples, and the leave-one-out REINFORCE (LOORF) estimator, which is ARMS with uncorrelated samples. We evaluate ARMS on several datasets for training generative models, and our experimental results show that it outperforms competing methods. We also develop a version of ARMS for optimizing the multi-sample variational bound and show that it outperforms both VIMCO and DisARM. The code is publicly available.
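Illustrative sketch (not from the paper): the abstract's recipe combines (i) several mutually antithetic binary samples drawn through a copula with (ii) a leave-one-out REINFORCE baseline. The Python sketch below shows one way to realize that idea, assuming an equicorrelated Gaussian copula and a LOORF-style estimator; the function names, the choice of copula, and the correlation parameter rho are assumptions made for illustration, not the authors' implementation.

import numpy as np
from scipy.special import ndtr  # standard normal CDF

# Illustrative only: the Gaussian copula, the default rho, and these helper
# names are assumptions for demonstration, not the paper's exact construction.

def antithetic_bernoulli_samples(p, k, rho=None, rng=None):
    # Draw k negatively correlated Bernoulli(p) vectors via a Gaussian copula.
    # rho must lie in (-1/(k-1), 1); the default is near-maximal antithetic.
    rng = np.random.default_rng() if rng is None else rng
    if rho is None:
        rho = -1.0 / (k - 1) + 1e-6
    d = p.shape[0]
    cov = (1.0 - rho) * np.eye(k) + rho * np.ones((k, k))
    L = np.linalg.cholesky(cov)
    z = rng.standard_normal((d, k)) @ L.T   # rows: d dims; cols: k correlated draws
    u = ndtr(z)                             # correlated uniforms with U(0,1) marginals
    return (u < p[:, None]).astype(np.float64)

def loorf_style_grad(f, p, b):
    # Leave-one-out REINFORCE estimate of d/dp E_{b~Bern(p)}[f(b)]
    # from the k (possibly antithetic) samples in b, shape (d, k).
    k = b.shape[1]
    fb = np.array([f(b[:, i]) for i in range(k)])
    baseline = (fb.sum() - fb) / (k - 1)                 # mean of the other k-1 rewards
    score = (b - p[:, None]) / (p * (1.0 - p))[:, None]  # grad of log Bernoulli pmf wrt p
    return (score * (fb - baseline)[None, :]).mean(axis=1)

# Toy usage: k = 4 antithetic samples for f(b) = sum((b - 0.49)^2).
p = np.full(8, 0.5)
b = antithetic_bernoulli_samples(p, k=4)
g = loorf_style_grad(lambda x: ((x - 0.49) ** 2).sum(), p, b)

With independent samples this reduces to plain LOORF; making the samples antithetic is what lowers the variance, which is the effect the paper formalizes and optimizes via its copula construction.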

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-dimitriev21a,
  title     = {ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables},
  author    = {Dimitriev, Aleksandar and Zhou, Mingyuan},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {2717--2727},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/dimitriev21a/dimitriev21a.pdf},
  url       = {https://proceedings.mlr.press/v139/dimitriev21a.html}
}
Endnote
%0 Conference Paper
%T ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables
%A Aleksandar Dimitriev
%A Mingyuan Zhou
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-dimitriev21a
%I PMLR
%P 2717--2727
%U https://proceedings.mlr.press/v139/dimitriev21a.html
%V 139
APA
Dimitriev, A. & Zhou, M. (2021). ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2717-2727. Available from https://proceedings.mlr.press/v139/dimitriev21a.html.
