Improving Sharpness-Aware Minimization by Lookahead

Runsheng Yu, Youzhi Zhang, James Kwok
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:57776-57802, 2024.

Abstract

Sharpness-Aware Minimization (SAM), which performs gradient descent on adversarially perturbed weights, can improve generalization by identifying flatter minima. However, recent studies have shown that SAM may suffer from convergence instability and oscillate around saddle points, resulting in slow convergence and inferior performance. To address this problem, we propose the use of a lookahead mechanism to gather more information about the landscape by looking further ahead, and thus find a better trajectory along which to converge. By examining the nature of SAM, we simplify the extrapolation procedure, resulting in a more efficient algorithm. Theoretical results show that the proposed method converges to a stationary point and is less prone to saddle points. Experiments on standard benchmark datasets also verify that the proposed method outperforms state-of-the-art methods and converges more effectively to flat minima.
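
For readers unfamiliar with the base method, the sketch below illustrates the standard SAM update that the abstract builds on: compute a first-order "worst-case" perturbation of the weights, then descend using the gradient evaluated at the perturbed point. This is a minimal NumPy sketch on a toy quadratic loss; the loss, step size, and perturbation radius rho are illustrative assumptions, and the lookahead/extrapolation procedure proposed in this paper is not reproduced here.

```python
import numpy as np

# Toy convex loss: L(w) = 0.5 * w^T A w, with an analytic gradient.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def sam_step(w, lr=0.1, rho=0.05, eps=1e-12):
    """One standard SAM update: ascend to the adversarially perturbed
    point w + e (with ||e|| = rho), then apply the gradient taken there."""
    g = grad(w)
    e = rho * g / (np.linalg.norm(g) + eps)  # first-order worst-case perturbation
    g_adv = grad(w + e)                      # gradient at the perturbed weights
    return w - lr * g_adv                    # descend from the original weights

w = np.array([1.0, -1.0])
for _ in range(50):
    w = sam_step(w)
print(loss(w))  # approaches 0 on this toy problem
```

The extra gradient evaluation at w + e is what distinguishes SAM from plain gradient descent, and it is this perturbed-gradient step that the paper's lookahead mechanism extends by looking further ahead along the trajectory.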

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-yu24q,
  title     = {Improving Sharpness-Aware Minimization by Lookahead},
  author    = {Yu, Runsheng and Zhang, Youzhi and Kwok, James},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {57776--57802},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/yu24q/yu24q.pdf},
  url       = {https://proceedings.mlr.press/v235/yu24q.html},
  abstract  = {Sharpness-Aware Minimization (SAM), which performs gradient descent on adversarially perturbed weights, can improve generalization by identifying flatter minima. However, recent studies have shown that SAM may suffer from convergence instability and oscillate around saddle points, resulting in slow convergence and inferior performance. To address this problem, we propose the use of a lookahead mechanism to gather more information about the landscape by looking further ahead, and thus find a better trajectory to converge. By examining the nature of SAM, we simplify the extrapolation procedure, resulting in a more efficient algorithm. Theoretical results show that the proposed method converges to a stationary point and is less prone to saddle points. Experiments on standard benchmark datasets also verify that the proposed method outperforms the SOTAs, and converge more effectively to flat minima.}
}
Endnote
%0 Conference Paper
%T Improving Sharpness-Aware Minimization by Lookahead
%A Runsheng Yu
%A Youzhi Zhang
%A James Kwok
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-yu24q
%I PMLR
%P 57776--57802
%U https://proceedings.mlr.press/v235/yu24q.html
%V 235
%X Sharpness-Aware Minimization (SAM), which performs gradient descent on adversarially perturbed weights, can improve generalization by identifying flatter minima. However, recent studies have shown that SAM may suffer from convergence instability and oscillate around saddle points, resulting in slow convergence and inferior performance. To address this problem, we propose the use of a lookahead mechanism to gather more information about the landscape by looking further ahead, and thus find a better trajectory to converge. By examining the nature of SAM, we simplify the extrapolation procedure, resulting in a more efficient algorithm. Theoretical results show that the proposed method converges to a stationary point and is less prone to saddle points. Experiments on standard benchmark datasets also verify that the proposed method outperforms the SOTAs, and converge more effectively to flat minima.
APA
Yu, R., Zhang, Y. & Kwok, J. (2024). Improving Sharpness-Aware Minimization by Lookahead. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:57776-57802. Available from https://proceedings.mlr.press/v235/yu24q.html.