Tilted Sharpness-Aware Minimization

Tian Li, Tianyi Zhou, Jeff Bilmes
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:36886-36905, 2025.

Abstract

Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape, minimizing the largest loss incurred within a neighborhood of the model parameters. Nevertheless, such min-max formulations are computationally challenging, especially when the problem is highly non-convex. Additionally, focusing only on the worst-case local solution while ignoring potentially many other local solutions may be suboptimal when searching for flat minima. In this work, we propose Tilted SAM (TSAM), a smoothed generalization of SAM inspired by exponential tilting that effectively assigns higher priority to local solutions that incur larger losses. TSAM is parameterized by a tilt hyperparameter $t$ and reduces to SAM as $t$ approaches infinity. We show that TSAM is smoother than SAM and thus easier to optimize, and that it explicitly favors flatter minima. We develop algorithms motivated by the discretization of Hamiltonian dynamics to solve TSAM. Empirically, TSAM arrives at flatter local minima and achieves better test performance than the SAM and ERM baselines across a range of image and text tasks.
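
For orientation, exponential tilting typically replaces a worst-case (max) objective with a log-mean-exp objective. A minimal sketch of what a tilted SAM-style objective looks like, assuming a uniform perturbation distribution over the $\rho$-ball $B_\rho$ (the paper's exact formulation may differ):

$\min_w \; \frac{1}{t} \log \, \mathbb{E}_{\epsilon \sim \mathcal{U}(B_\rho)} \left[ e^{\, t \, L(w + \epsilon)} \right]$

As $t \to \infty$, the log-mean-exp tends to the worst-case loss $\max_{\|\epsilon\| \le \rho} L(w + \epsilon)$, recovering SAM's inner maximization, while as $t \to 0^+$ it tends to the average loss over the neighborhood; intermediate $t$ thus interpolates between averaging over perturbations and focusing on the worst one. The sketch below is a naive Monte Carlo estimate of this objective under the same assumptions (uniform sampling on the $\rho$-sphere, a hypothetical loss_fn over a list of parameter tensors); it is not the paper's Hamiltonian-dynamics-based algorithm, only an illustration of why the tilted objective is smooth and computable.

    # Illustrative Monte Carlo estimate of the tilted neighborhood loss above.
    # A sketch under stated assumptions, not the paper's TSAM solver.
    import math
    import torch

    def tilted_neighborhood_loss(loss_fn, params, t=1.0, rho=0.05, n_samples=8):
        # loss_fn maps a list of parameter tensors to a scalar loss tensor.
        losses = []
        for _ in range(n_samples):
            # Sample a perturbation uniformly on the rho-sphere (a common
            # simplification of sampling from the rho-ball).
            eps = [torch.randn_like(p) for p in params]
            norm = torch.sqrt(sum((e ** 2).sum() for e in eps))
            losses.append(loss_fn([p + rho * e / norm for p, e in zip(params, eps)]))
        losses = torch.stack(losses)
        # (1/t) * log( (1/n) * sum_i exp(t * L_i) ), computed stably via logsumexp.
        return (torch.logsumexp(t * losses, dim=0) - math.log(n_samples)) / t

Larger $t$ makes the estimate concentrate on the highest-loss sampled perturbations, mirroring the abstract's description of prioritizing local solutions that incur larger losses.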

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-li25dx,
  title     = {Tilted Sharpness-Aware Minimization},
  author    = {Li, Tian and Zhou, Tianyi and Bilmes, Jeff},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {36886--36905},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/li25dx/li25dx.pdf},
  url       = {https://proceedings.mlr.press/v267/li25dx.html},
  abstract  = {Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape through optimizing model parameters that incur the largest loss within a neighborhood. Nevertheless, such min-max formulations are computationally challenging especially when the problem is highly non-convex. Additionally, focusing only on the worst-case local solution while ignoring potentially many other local solutions may be suboptimal when searching for flat minima. In this work, we propose Tilted SAM (TSAM), a smoothed generalization of SAM inspired by exponential tilting that effectively assigns higher priority to local solutions that incur larger losses. TSAM is parameterized by a tilt hyperparameter $t$ and reduces to SAM as $t$ approaches infinity. We show that TSAM is smoother than SAM and thus easier to optimize, and it explicitly favors flatter minima. We develop algorithms motivated by the discretization of Hamiltonian dynamics to solve TSAM. Empirically, TSAM arrives at flatter local minima and results in superior test performance than the baselines of SAM and ERM across a range of image and text tasks.}
}
Endnote
%0 Conference Paper
%T Tilted Sharpness-Aware Minimization
%A Tian Li
%A Tianyi Zhou
%A Jeff Bilmes
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-li25dx
%I PMLR
%P 36886--36905
%U https://proceedings.mlr.press/v267/li25dx.html
%V 267
%X Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape through optimizing model parameters that incur the largest loss within a neighborhood. Nevertheless, such min-max formulations are computationally challenging especially when the problem is highly non-convex. Additionally, focusing only on the worst-case local solution while ignoring potentially many other local solutions may be suboptimal when searching for flat minima. In this work, we propose Tilted SAM (TSAM), a smoothed generalization of SAM inspired by exponential tilting that effectively assigns higher priority to local solutions that incur larger losses. TSAM is parameterized by a tilt hyperparameter $t$ and reduces to SAM as $t$ approaches infinity. We show that TSAM is smoother than SAM and thus easier to optimize, and it explicitly favors flatter minima. We develop algorithms motivated by the discretization of Hamiltonian dynamics to solve TSAM. Empirically, TSAM arrives at flatter local minima and results in superior test performance than the baselines of SAM and ERM across a range of image and text tasks.
APA
Li, T., Zhou, T. & Bilmes, J. (2025). Tilted Sharpness-Aware Minimization. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:36886-36905. Available from https://proceedings.mlr.press/v267/li25dx.html.