Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics

Ankit Vani, Frederick Tung, Gabriel L. Oliveira, Hossein Sharifi-Noghabi
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:49122-49136, 2024.

Abstract

Despite attaining high empirical generalization, the sharpness of models trained with sharpness-aware minimization (SAM) does not always correlate with generalization error. Instead of viewing SAM as minimizing sharpness to improve generalization, our paper considers a new perspective based on SAM’s training dynamics. We propose that perturbations in SAM perform perturbed forgetting, where they discard undesirable model biases to exhibit learning signals that generalize better. We relate our notion of forgetting to the information bottleneck principle, use it to explain observations like the better generalization of smaller perturbation batches, and show that perturbed forgetting can exhibit a stronger correlation with generalization than flatness. While standard SAM targets model biases exposed by the steepest ascent directions, we propose a new perturbation that targets biases exposed through the model’s outputs. Our output bias forgetting perturbations outperform standard SAM, GSAM, and ASAM on ImageNet, robustness benchmarks, and transfer to CIFAR-10 and CIFAR-100, while sometimes converging to sharper regions. Our results suggest that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface.
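For context on the training dynamics the abstract refers to, below is a minimal PyTorch sketch of the standard two-step SAM update (ascend along the normalized gradient, then descend using gradients computed at the perturbed weights). The paper's output bias forgetting perturbation is not specified on this page and is not reproduced here; the function name sam_step, the toy model, and the value of rho are illustrative assumptions.

# Minimal sketch of the standard SAM update that the paper reinterprets as
# "perturbed forgetting". This is NOT the paper's output bias forgetting
# perturbation, whose exact form is not given on this page.
import torch
import torch.nn as nn

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) Gradients at the current weights w.
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()

    # 2) Perturb along the normalized ascent direction: e = rho * g / ||g||.
    with torch.no_grad():
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm() for p in params if p.grad is not None]))
        eps = []
        for p in params:
            e = torch.zeros_like(p) if p.grad is None else rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)

    # 3) Gradients at the perturbed weights w + e.
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()

    # 4) Undo the perturbation and apply the base optimizer update using the
    #    gradients from the perturbed point.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_opt.step()

# Toy usage (hypothetical model and data, for illustration only):
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
sam_step(model, nn.CrossEntropyLoss(), x, y, opt, rho=0.05)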

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-vani24a,
  title     = {Forget Sharpness: Perturbed Forgetting of Model Biases Within {SAM} Dynamics},
  author    = {Vani, Ankit and Tung, Frederick and Oliveira, Gabriel L. and Sharifi-Noghabi, Hossein},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {49122--49136},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/vani24a/vani24a.pdf},
  url       = {https://proceedings.mlr.press/v235/vani24a.html},
  abstract  = {Despite attaining high empirical generalization, the sharpness of models trained with sharpness-aware minimization (SAM) do not always correlate with generalization error. Instead of viewing SAM as minimizing sharpness to improve generalization, our paper considers a new perspective based on SAM’s training dynamics. We propose that perturbations in SAM perform perturbed forgetting, where they discard undesirable model biases to exhibit learning signals that generalize better. We relate our notion of forgetting to the information bottleneck principle, use it to explain observations like the better generalization of smaller perturbation batches, and show that perturbed forgetting can exhibit a stronger correlation with generalization than flatness. While standard SAM targets model biases exposed by the steepest ascent directions, we propose a new perturbation that targets biases exposed through the model’s outputs. Our output bias forgetting perturbations outperform standard SAM, GSAM, and ASAM on ImageNet, robustness benchmarks, and transfer to CIFAR-10,100, while sometimes converging to sharper regions. Our results suggest that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface.}
}
Endnote
%0 Conference Paper
%T Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics
%A Ankit Vani
%A Frederick Tung
%A Gabriel L. Oliveira
%A Hossein Sharifi-Noghabi
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-vani24a
%I PMLR
%P 49122--49136
%U https://proceedings.mlr.press/v235/vani24a.html
%V 235
%X Despite attaining high empirical generalization, the sharpness of models trained with sharpness-aware minimization (SAM) do not always correlate with generalization error. Instead of viewing SAM as minimizing sharpness to improve generalization, our paper considers a new perspective based on SAM’s training dynamics. We propose that perturbations in SAM perform perturbed forgetting, where they discard undesirable model biases to exhibit learning signals that generalize better. We relate our notion of forgetting to the information bottleneck principle, use it to explain observations like the better generalization of smaller perturbation batches, and show that perturbed forgetting can exhibit a stronger correlation with generalization than flatness. While standard SAM targets model biases exposed by the steepest ascent directions, we propose a new perturbation that targets biases exposed through the model’s outputs. Our output bias forgetting perturbations outperform standard SAM, GSAM, and ASAM on ImageNet, robustness benchmarks, and transfer to CIFAR-10,100, while sometimes converging to sharper regions. Our results suggest that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface.
APA
Vani, A., Tung, F., Oliveira, G.L. & Sharifi-Noghabi, H. (2024). Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:49122-49136. Available from https://proceedings.mlr.press/v235/vani24a.html.
