Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning

Mohamed Elsayed, Homayoon Farrahi, Felix Dangel, A. Rupam Mahmood
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:12448-12468, 2024.

Abstract

Second-order information is valuable for many applications but challenging to compute. Several works focus on computing or approximating Hessian diagonals, but even this simplification introduces significant additional costs compared to computing a gradient. In the absence of efficient exact computation schemes for Hessian diagonals, we revisit an early approximation scheme proposed by Becker and LeCun (1989, BL89), which has a cost similar to gradients and appears to have been overlooked by the community. We introduce HesScale, an improvement over BL89, which adds negligible extra computation. On small networks, we find that this improvement is of higher quality than all alternatives, even those with theoretical guarantees, such as unbiasedness, while being much cheaper to compute. We use this insight in reinforcement learning problems where small networks are used and demonstrate HesScale in second-order optimization and scaling the step-size parameter. In our experiments, HesScale optimizes faster than existing methods and improves stability through step-size scaling. These findings are promising for scaling second-order methods in larger models in the future.
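The step-size-scaling use mentioned above can be pictured with a minimal, illustrative sketch. This is not the paper's HesScale implementation: it assumes a per-parameter Hessian-diagonal estimate is already available from some approximation scheme, and the function name and toy values below are hypothetical.

    import numpy as np

    def curvature_scaled_step(param, grad, hess_diag, lr=0.01, eps=1e-8):
        """Illustrative update: divide each parameter's gradient step by the
        magnitude of a Hessian-diagonal estimate, in the spirit of Becker &
        LeCun (1989). `hess_diag` stands in for any cheap per-parameter
        curvature estimate, such as the one HesScale produces."""
        return param - lr * grad / (np.abs(hess_diag) + eps)

    # Toy example for a single weight vector: larger curvature entries
    # receive proportionally smaller effective step sizes.
    w = np.array([0.5, -1.2, 0.3])
    g = np.array([0.1, -0.4, 0.05])
    h = np.array([2.0, 0.5, 0.01])
    w_new = curvature_scaled_step(w, g, h)

The point of the sketch is only that a diagonal curvature estimate of gradient-like cost makes such per-parameter scaling affordable; the paper's experiments apply this idea to second-order optimization and step-size scaling in reinforcement learning.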

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-elsayed24a,
  title     = {Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning},
  author    = {Elsayed, Mohamed and Farrahi, Homayoon and Dangel, Felix and Mahmood, A. Rupam},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {12448--12468},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/elsayed24a/elsayed24a.pdf},
  url       = {https://proceedings.mlr.press/v235/elsayed24a.html},
  abstract  = {Second-order information is valuable for many applications but challenging to compute. Several works focus on computing or approximating Hessian diagonals, but even this simplification introduces significant additional costs compared to computing a gradient. In the absence of efficient exact computation schemes for Hessian diagonals, we revisit an early approximation scheme proposed by Becker and LeCun (1989, BL89), which has a cost similar to gradients and appears to have been overlooked by the community. We introduce HesScale, an improvement over BL89, which adds negligible extra computation. On small networks, we find that this improvement is of higher quality than all alternatives, even those with theoretical guarantees, such as unbiasedness, while being much cheaper to compute. We use this insight in reinforcement learning problems where small networks are used and demonstrate HesScale in second-order optimization and scaling the step-size parameter. In our experiments, HesScale optimizes faster than existing methods and improves stability through step-size scaling. These findings are promising for scaling second-order methods in larger models in the future.}
}
Endnote
%0 Conference Paper
%T Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
%A Mohamed Elsayed
%A Homayoon Farrahi
%A Felix Dangel
%A A. Rupam Mahmood
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-elsayed24a
%I PMLR
%P 12448--12468
%U https://proceedings.mlr.press/v235/elsayed24a.html
%V 235
%X Second-order information is valuable for many applications but challenging to compute. Several works focus on computing or approximating Hessian diagonals, but even this simplification introduces significant additional costs compared to computing a gradient. In the absence of efficient exact computation schemes for Hessian diagonals, we revisit an early approximation scheme proposed by Becker and LeCun (1989, BL89), which has a cost similar to gradients and appears to have been overlooked by the community. We introduce HesScale, an improvement over BL89, which adds negligible extra computation. On small networks, we find that this improvement is of higher quality than all alternatives, even those with theoretical guarantees, such as unbiasedness, while being much cheaper to compute. We use this insight in reinforcement learning problems where small networks are used and demonstrate HesScale in second-order optimization and scaling the step-size parameter. In our experiments, HesScale optimizes faster than existing methods and improves stability through step-size scaling. These findings are promising for scaling second-order methods in larger models in the future.
APA
Elsayed, M., Farrahi, H., Dangel, F. & Mahmood, A. R. (2024). Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:12448-12468. Available from https://proceedings.mlr.press/v235/elsayed24a.html.
