The Differentiable Cross-Entropy Method

Brandon Amos, Denis Yarats
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:291-302, 2020.

Abstract

We study the Cross-Entropy Method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant that enables us to differentiate the output of CEM with respect to the objective function’s parameters. In the machine learning setting this brings CEM inside of the end-to-end learning pipeline where this has otherwise been impossible. We show applications in a synthetic energy-based structured prediction task and in non-convex continuous control. In the control setting we show how to embed optimal action sequences into a lower-dimensional space. This enables us to use policy optimization to fine-tune modeling components by differentiating through the CEM-based controller.
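To make the object being differentiated concrete, here is a minimal sketch of the standard (non-differentiable) Cross-Entropy Method for continuous minimization, assuming a Gaussian sampling distribution refit to the elite samples each iteration. The function name and hyperparameters are illustrative, not taken from the paper.

```python
import random

def cem_minimize(f, mu=0.0, sigma=5.0, n_samples=100, n_elite=10, iters=50):
    """Minimize f by repeatedly sampling from a Gaussian and refitting
    its mean and standard deviation to the lowest-cost (elite) samples."""
    for _ in range(iters):
        xs = [random.gauss(mu, sigma) for _ in range(n_samples)]
        elites = sorted(xs, key=f)[:n_elite]          # keep the best samples
        mu = sum(elites) / n_elite                    # refit mean
        sigma = (sum((x - mu) ** 2 for x in elites) / n_elite) ** 0.5 + 1e-8
    return mu

# Example: a smooth objective with its minimum at x = 3.
random.seed(0)
x_star = cem_minimize(lambda x: (x - 3.0) ** 2)
```

The paper's contribution is a variant of this loop whose output can be differentiated with respect to parameters of `f`, which the hard, non-differentiable top-k elite selection above does not allow.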

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-amos20a,
  title     = {The Differentiable Cross-Entropy Method},
  author    = {Amos, Brandon and Yarats, Denis},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {291--302},
  year      = {2020},
  editor    = {Hal Daumé III and Aarti Singh},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/amos20a/amos20a.pdf},
  url       = {http://proceedings.mlr.press/v119/amos20a.html},
  abstract  = {We study the Cross-Entropy Method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant that enables us to differentiate the output of CEM with respect to the objective function’s parameters. In the machine learning setting this brings CEM inside of the end-to-end learning pipeline where this has otherwise been impossible. We show applications in a synthetic energy-based structured prediction task and in non-convex continuous control. In the control setting we show how to embed optimal action sequences into a lower-dimensional space. This enables us to use policy optimization to fine-tune modeling components by differentiating through the CEM-based controller.}
}
Endnote
%0 Conference Paper
%T The Differentiable Cross-Entropy Method
%A Brandon Amos
%A Denis Yarats
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-amos20a
%I PMLR
%P 291--302
%U http://proceedings.mlr.press/v119/amos20a.html
%V 119
%X We study the Cross-Entropy Method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant that enables us to differentiate the output of CEM with respect to the objective function’s parameters. In the machine learning setting this brings CEM inside of the end-to-end learning pipeline where this has otherwise been impossible. We show applications in a synthetic energy-based structured prediction task and in non-convex continuous control. In the control setting we show how to embed optimal action sequences into a lower-dimensional space. This enables us to use policy optimization to fine-tune modeling components by differentiating through the CEM-based controller.
APA
Amos, B. & Yarats, D. (2020). The Differentiable Cross-Entropy Method. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:291-302. Available from http://proceedings.mlr.press/v119/amos20a.html.