Locally Adaptive Label Smoothing Improves Predictive Churn

Dara Bahri, Heinrich Jiang
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:532-542, 2021.

Abstract

Training modern neural networks is an inherently noisy process that can lead to high prediction churn – disagreements between re-trainings of the same model due to factors such as randomization in the parameter initialization and mini-batches – even when the trained models all attain similar accuracies. Such prediction churn can be very undesirable in practice. In this paper, we present several baselines for reducing churn and show that training on soft labels obtained by adaptively smoothing each example’s label based on the example’s neighboring labels often outperforms the baselines on churn while improving accuracy on a variety of benchmark classification tasks and model architectures.
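
As a rough illustration of the idea described in the abstract (not the authors' exact algorithm), neighbor-based soft labels can be sketched as follows: for each training example, look up its k nearest neighbors in some feature space, take the empirical class distribution over those neighbors' labels, and mix that distribution with the example's one-hot label before training with cross-entropy. The sketch below assumes scikit-learn's NearestNeighbors, a mixing weight alpha, and Euclidean distance in a user-chosen feature space; all of these are illustrative assumptions rather than details taken from the paper.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def neighbor_smoothed_labels(features, labels, num_classes, k=10, alpha=0.1):
        """Blend each example's one-hot label with the empirical label
        distribution of its k nearest neighbors (illustrative sketch only;
        the paper's exact smoothing rule may differ)."""
        one_hot = np.eye(num_classes)[labels]                    # (n, C)

        # Request k+1 neighbors because each point is its own nearest neighbor.
        nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
        _, idx = nn.kneighbors(features)                         # (n, k+1)
        neighbor_idx = idx[:, 1:]                                 # drop self

        # Empirical class distribution over each example's neighborhood.
        neighbor_dist = one_hot[neighbor_idx].mean(axis=1)        # (n, C)

        # Locally adaptive soft label: mix the hard label with the
        # neighborhood distribution.
        return (1.0 - alpha) * one_hot + alpha * neighbor_dist

    # Example usage (hypothetical data):
    # soft = neighbor_smoothed_labels(train_embeddings, train_labels, num_classes=10)
    # then train with cross-entropy against `soft` instead of the one-hot labels.
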

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-bahri21a,
  title     = {Locally Adaptive Label Smoothing Improves Predictive Churn},
  author    = {Bahri, Dara and Jiang, Heinrich},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {532--542},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/bahri21a/bahri21a.pdf},
  url       = {https://proceedings.mlr.press/v139/bahri21a.html},
  abstract  = {Training modern neural networks is an inherently noisy process that can lead to high \emph{prediction churn} – disagreements between re-trainings of the same model due to factors such as randomization in the parameter initialization and mini-batches – even when the trained models all attain similar accuracies. Such prediction churn can be very undesirable in practice. In this paper, we present several baselines for reducing churn and show that training on soft labels obtained by adaptively smoothing each example’s label based on the example’s neighboring labels often outperforms the baselines on churn while improving accuracy on a variety of benchmark classification tasks and model architectures.}
}
Endnote
%0 Conference Paper
%T Locally Adaptive Label Smoothing Improves Predictive Churn
%A Dara Bahri
%A Heinrich Jiang
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-bahri21a
%I PMLR
%P 532--542
%U https://proceedings.mlr.press/v139/bahri21a.html
%V 139
%X Training modern neural networks is an inherently noisy process that can lead to high prediction churn – disagreements between re-trainings of the same model due to factors such as randomization in the parameter initialization and mini-batches – even when the trained models all attain similar accuracies. Such prediction churn can be very undesirable in practice. In this paper, we present several baselines for reducing churn and show that training on soft labels obtained by adaptively smoothing each example’s label based on the example’s neighboring labels often outperforms the baselines on churn while improving accuracy on a variety of benchmark classification tasks and model architectures.
APA
Bahri, D. & Jiang, H. (2021). Locally Adaptive Label Smoothing Improves Predictive Churn. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:532-542. Available from https://proceedings.mlr.press/v139/bahri21a.html.
