Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More

Geonhui Yoo, Minhak Song, Chulhee Yun
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:72574-72617, 2025.

Abstract

When training deep neural networks with gradient descent, sharpness often increases—a phenomenon known as progressive sharpening—before saturating at the edge of stability. Although commonly observed in practice, the underlying mechanisms behind progressive sharpening remain poorly understood. In this work, we study this phenomenon using a minimalist model: a deep linear network with a single neuron per layer. We show that this simple model effectively captures the sharpness dynamics observed in recent empirical studies, offering a simple testbed to better understand neural network training. Moreover, we theoretically analyze how dataset properties, network depth, stochasticity of optimizers, and step size affect the degree of progressive sharpening in the minimalist model. We then empirically demonstrate how these theoretical insights extend to practical scenarios. This study offers a deeper understanding of sharpness dynamics in neural network training, highlighting the interplay between depth, training data, and optimizers.
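
The minimalist model described above admits a very short self-contained demo. The following sketch (illustrative only, not the authors' released code) exploits the fact that a depth-L linear network with one neuron per layer collapses to a product of scalar weights, f(x) = (w_1 w_2 ... w_L) x, so full-batch gradient descent and the sharpness (the largest eigenvalue of the loss Hessian) can be computed directly. The toy dataset, depth, step size, and initialization are hypothetical choices for illustration, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)
L, eta, steps = 4, 0.02, 400          # depth, step size, GD iterations (illustrative)
x = rng.normal(size=32)               # toy 1-D inputs (hypothetical data)
y = 2.0 * x                           # targets from a linear teacher

def loss(w):
    p = np.prod(w)                    # end-to-end scalar map of the network
    return 0.5 * np.mean((p * x - y) ** 2)

def grad(w):
    p = np.prod(w)
    r = np.mean((p * x - y) * x)      # residual correlation with the inputs
    return r * p / w                  # dL/dw_k = r * prod_{j != k} w_j  (w_k != 0)

def sharpness(w, eps=1e-5):
    # Finite-difference Hessian of the loss; cheap for a handful of parameters.
    H = np.empty((L, L))
    for k in range(L):
        e = np.zeros(L); e[k] = eps
        H[:, k] = (grad(w + e) - grad(w - e)) / (2 * eps)
    return np.linalg.eigvalsh((H + H.T) / 2).max()

w = 1.0 + 0.5 * rng.normal(size=L)    # init near 1 so no weight is zero
for t in range(steps):
    if t % 50 == 0:
        print(f"step {t:4d}  loss {loss(w):.4f}  sharpness {sharpness(w):.3f}")
    w -= eta * grad(w)

Running this prints the loss falling while the sharpness rises over training, i.e. progressive sharpening in the single-neuron-per-layer testbed; with a larger step size the sharpness would instead saturate near 2/eta, the edge-of-stability threshold for gradient descent.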

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-yoo25b,
  title     = {Understanding Sharpness Dynamics in {NN} Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More},
  author    = {Yoo, Geonhui and Song, Minhak and Yun, Chulhee},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {72574--72617},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yoo25b/yoo25b.pdf},
  url       = {https://proceedings.mlr.press/v267/yoo25b.html},
  abstract  = {When training deep neural networks with gradient descent, sharpness often increases—a phenomenon known as progressive sharpening—before saturating at the edge of stability. Although commonly observed in practice, the underlying mechanisms behind progressive sharpening remain poorly understood. In this work, we study this phenomenon using a minimalist model: a deep linear network with a single neuron per layer. We show that this simple model effectively captures the sharpness dynamics observed in recent empirical studies, offering a simple testbed to better understand neural network training. Moreover, we theoretically analyze how dataset properties, network depth, stochasticity of optimizers, and step size affect the degree of progressive sharpening in the minimalist model. We then empirically demonstrate how these theoretical insights extend to practical scenarios. This study offers a deeper understanding of sharpness dynamics in neural network training, highlighting the interplay between depth, training data, and optimizers.}
}
Endnote
%0 Conference Paper
%T Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
%A Geonhui Yoo
%A Minhak Song
%A Chulhee Yun
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-yoo25b
%I PMLR
%P 72574--72617
%U https://proceedings.mlr.press/v267/yoo25b.html
%V 267
%X When training deep neural networks with gradient descent, sharpness often increases—a phenomenon known as progressive sharpening—before saturating at the edge of stability. Although commonly observed in practice, the underlying mechanisms behind progressive sharpening remain poorly understood. In this work, we study this phenomenon using a minimalist model: a deep linear network with a single neuron per layer. We show that this simple model effectively captures the sharpness dynamics observed in recent empirical studies, offering a simple testbed to better understand neural network training. Moreover, we theoretically analyze how dataset properties, network depth, stochasticity of optimizers, and step size affect the degree of progressive sharpening in the minimalist model. We then empirically demonstrate how these theoretical insights extend to practical scenarios. This study offers a deeper understanding of sharpness dynamics in neural network training, highlighting the interplay between depth, training data, and optimizers.
APA
Yoo, G., Song, M. & Yun, C. (2025). Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:72574-72617. Available from https://proceedings.mlr.press/v267/yoo25b.html.
