Sassha: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation

Dahun Shin, Dongyeop Lee, Jinseok Chung, Namhoon Lee
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:55145-55171, 2025.

Abstract

Approximate second-order optimization methods often exhibit poorer generalization than first-order approaches. In this work, we examine this issue through the lens of the loss landscape and find that existing second-order methods tend to converge to sharper minima than SGD. In response, we propose Sassha, a novel second-order method designed to enhance generalization by explicitly reducing the sharpness of the solution while stabilizing the computation of approximate Hessians along the optimization trajectory. This sharpness-minimization scheme is also crafted to accommodate lazy Hessian updates, securing efficiency in addition to flatness. To validate its effectiveness, we conduct a wide range of standard deep learning experiments in which Sassha demonstrates outstanding generalization performance that is comparable to, and mostly better than, that of other methods. We provide a comprehensive set of analyses including convergence, robustness, stability, efficiency, and cost.
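To make the abstract's description more concrete, the sketch below illustrates the kind of update it outlines: a sharpness-aware (SAM-style) perturbation toward higher loss, a descent step preconditioned by a diagonal Hessian estimate, and a lazy Hessian refresh every few steps. This is a minimal illustration under stated assumptions, not the authors' actual algorithm; all names, hyperparameters, and details (Hutchinson's diagonal estimator, the absolute-value stabilization, the refresh interval) are assumptions for exposition.

# Illustrative sketch only (not the paper's code): sharpness-aware,
# diagonally preconditioned update with lazy Hessian refreshes.
import torch

def hutchinson_diag_hessian(loss, params, n_samples=1):
    """Estimate diag(H) with Hutchinson's estimator using Rademacher probes."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]  # +/-1 probes
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for d, z, hvp in zip(diag, zs, hvps):
            d += (z * hvp) / n_samples
    return diag

def sharpness_aware_second_order_step(params, loss_fn, state,
                                      lr=0.1, rho=0.05, eps=1e-8, hessian_every=10):
    """One update; `state` starts as {"step": 0} and carries the cached Hessian diagonal."""
    # 1) Gradient at the current point, used to build the worst-case perturbation.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + eps
    perturbs = [rho * g / grad_norm for g in grads]

    # 2) Move to the perturbed point; refresh the Hessian diagonal only lazily.
    with torch.no_grad():
        for p, e in zip(params, perturbs):
            p.add_(e)
    loss_adv = loss_fn()
    if state["step"] % hessian_every == 0:
        # Absolute value keeps the preconditioner positive (a stabilization assumption).
        state["diag_h"] = [d.abs() for d in hutchinson_diag_hessian(loss_adv, params)]
    grads_adv = torch.autograd.grad(loss_adv, params)

    # 3) Undo the perturbation and take a preconditioned descent step.
    with torch.no_grad():
        for p, e in zip(params, perturbs):
            p.sub_(e)
        for p, g, d in zip(params, grads_adv, state["diag_h"]):
            p.sub_(lr * g / (d + eps))
    state["step"] += 1

The lazy refresh interval (hessian_every in this sketch) is what keeps the per-step cost close to that of a first-order method, which is the efficiency point the abstract raises alongside flatness.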

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-shin25d,
  title     = {Sassha: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation},
  author    = {Shin, Dahun and Lee, Dongyeop and Chung, Jinseok and Lee, Namhoon},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {55145--55171},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/shin25d/shin25d.pdf},
  url       = {https://proceedings.mlr.press/v267/shin25d.html},
  abstract  = {Approximate second-order optimization methods often exhibit poorer generalization compared to first-order approaches. In this work, we look into this issue through the lens of the loss landscape and find that existing second-order methods tend to converge to sharper minima compared to SGD. In response, we propose Sassha, a novel second-order method designed to enhance generalization by explicitly reducing sharpness of the solution, while stabilizing the computation of approximate Hessians along the optimization trajectory. In fact, this sharpness minimization scheme is crafted also to accommodate lazy Hessian updates, so as to secure efficiency besides flatness. To validate its effectiveness, we conduct a wide range of standard deep learning experiments where Sassha demonstrates its outstanding generalization performance that is comparable to, and mostly better than, other methods. We provide a comprehensive set of analyses including convergence, robustness, stability, efficiency, and cost.}
}
Endnote
%0 Conference Paper
%T Sassha: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
%A Dahun Shin
%A Dongyeop Lee
%A Jinseok Chung
%A Namhoon Lee
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-shin25d
%I PMLR
%P 55145--55171
%U https://proceedings.mlr.press/v267/shin25d.html
%V 267
%X Approximate second-order optimization methods often exhibit poorer generalization compared to first-order approaches. In this work, we look into this issue through the lens of the loss landscape and find that existing second-order methods tend to converge to sharper minima compared to SGD. In response, we propose Sassha, a novel second-order method designed to enhance generalization by explicitly reducing sharpness of the solution, while stabilizing the computation of approximate Hessians along the optimization trajectory. In fact, this sharpness minimization scheme is crafted also to accommodate lazy Hessian updates, so as to secure efficiency besides flatness. To validate its effectiveness, we conduct a wide range of standard deep learning experiments where Sassha demonstrates its outstanding generalization performance that is comparable to, and mostly better than, other methods. We provide a comprehensive set of analyses including convergence, robustness, stability, efficiency, and cost.
APA
Shin, D., Lee, D., Chung, J., & Lee, N. (2025). Sassha: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:55145-55171. Available from https://proceedings.mlr.press/v267/shin25d.html.
