LPGD: A General Framework for Backpropagation through Embedded Optimization Layers

Anselm Paulus, Georg Martius, Vít Musil
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:39989-40014, 2024.

Abstract

Embedding parameterized optimization problems as layers into machine learning architectures serves as a powerful inductive bias. Training such architectures with stochastic gradient descent requires care, as degenerate derivatives of the embedded optimization problem often render the gradients uninformative. We propose Lagrangian Proximal Gradient Descent (LPGD), a flexible framework for training architectures with embedded optimization layers that seamlessly integrates into automatic differentiation libraries. LPGD efficiently computes meaningful replacements of the degenerate optimization layer derivatives by re-running the forward solver oracle on a perturbed input. LPGD captures various previously proposed methods as special cases, while fostering deep links to traditional optimization methods. We theoretically analyze our method and demonstrate on historical and synthetic data that LPGD converges faster than gradient descent even in a differentiable setup.
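To make the mechanism concrete, below is a minimal, hypothetical PyTorch-style sketch of the general pattern the abstract describes: the backward pass re-runs the same solver oracle on an input perturbed along the incoming gradient and uses the solution difference as a surrogate gradient. The class name PerturbedSolverLayer, the solver callable, and the perturbation scale tau are illustrative assumptions, not the authors' reference implementation; the sketch further assumes the solver minimizes a linear cost whose parameter tensor has the same shape as the solution (as for a linear program), and it omits the Lagrangian proximal constructions that define LPGD proper.

import torch

class PerturbedSolverLayer(torch.autograd.Function):
    # Illustrative sketch only: an optimization layer whose backward pass
    # replaces degenerate derivatives by re-running the solver oracle on a
    # perturbed input, in the spirit of the abstract above.

    @staticmethod
    def forward(ctx, params, solver, tau=1.0):
        # Forward pass: call the (possibly non-differentiable) solver oracle.
        solution = solver(params)
        ctx.solver = solver
        ctx.tau = tau
        ctx.save_for_backward(params, solution)
        return solution

    @staticmethod
    def backward(ctx, grad_output):
        params, solution = ctx.saved_tensors
        # Re-run the solver on the input perturbed along the incoming gradient
        # and take the solution difference as a surrogate gradient
        # (assumes params and solution share a shape, e.g. an LP cost vector).
        perturbed_solution = ctx.solver(params + ctx.tau * grad_output)
        grad_params = (perturbed_solution - solution) / ctx.tau
        return grad_params, None, None

A call might look like y = PerturbedSolverLayer.apply(cost, solver, 0.1), where solver is any black-box routine mapping a cost tensor to an optimal solution tensor; gradients then flow into cost without ever differentiating through the solver itself, at the price of one extra solver call per backward pass.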

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-paulus24a,
  title     = {{LPGD}: A General Framework for Backpropagation through Embedded Optimization Layers},
  author    = {Paulus, Anselm and Martius, Georg and Musil, V\'{\i}t},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {39989--40014},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/paulus24a/paulus24a.pdf},
  url       = {https://proceedings.mlr.press/v235/paulus24a.html},
  abstract  = {Embedding parameterized optimization problems as layers into machine learning architectures serves as a powerful inductive bias. Training such architectures with stochastic gradient descent requires care, as degenerate derivatives of the embedded optimization problem often render the gradients uninformative. We propose Lagrangian Proximal Gradient Descent (LPGD), a flexible framework for training architectures with embedded optimization layers that seamlessly integrates into automatic differentiation libraries. LPGD efficiently computes meaningful replacements of the degenerate optimization layer derivatives by re-running the forward solver oracle on a perturbed input. LPGD captures various previously proposed methods as special cases, while fostering deep links to traditional optimization methods. We theoretically analyze our method and demonstrate on historical and synthetic data that LPGD converges faster than gradient descent even in a differentiable setup.}
}
Endnote
%0 Conference Paper
%T LPGD: A General Framework for Backpropagation through Embedded Optimization Layers
%A Anselm Paulus
%A Georg Martius
%A Vít Musil
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-paulus24a
%I PMLR
%P 39989--40014
%U https://proceedings.mlr.press/v235/paulus24a.html
%V 235
%X Embedding parameterized optimization problems as layers into machine learning architectures serves as a powerful inductive bias. Training such architectures with stochastic gradient descent requires care, as degenerate derivatives of the embedded optimization problem often render the gradients uninformative. We propose Lagrangian Proximal Gradient Descent (LPGD), a flexible framework for training architectures with embedded optimization layers that seamlessly integrates into automatic differentiation libraries. LPGD efficiently computes meaningful replacements of the degenerate optimization layer derivatives by re-running the forward solver oracle on a perturbed input. LPGD captures various previously proposed methods as special cases, while fostering deep links to traditional optimization methods. We theoretically analyze our method and demonstrate on historical and synthetic data that LPGD converges faster than gradient descent even in a differentiable setup.
APA
Paulus, A., Martius, G. & Musil, V. (2024). LPGD: A General Framework for Backpropagation through Embedded Optimization Layers. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:39989-40014. Available from https://proceedings.mlr.press/v235/paulus24a.html.