Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning

Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, Geoff Pleiss, John Patrick Cunningham
Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops, PMLR 137:1-10, 2020.

Abstract

Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example: namely, the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of the failure modes thereof.
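The practice the abstract describes can be made concrete: categorical cross-entropy applied to simplex-valued targets (as produced by label smoothing), alongside a continuous-categorical negative log-likelihood, which keeps the same inner-product term but adds a parameter-dependent normalizing constant. The sketch below is a minimal NumPy illustration, not the authors' implementation; the closed-form log-normalizer uses the standard divided-difference identity for integrating exp(⟨η, x⟩) over the simplex and is reproduced here as an assumption, not quoted from this paper.

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: mix one-hot targets toward the uniform distribution,
    yielding targets in the interior of the simplex."""
    K = one_hot.shape[-1]
    return (1.0 - eps) * one_hot + eps / K

def cross_entropy(target, probs):
    """Categorical cross-entropy, -sum_k target_k * log(probs_k).
    Commonly applied even when `target` is simplex-valued, as with smoothed labels."""
    return -np.sum(target * np.log(probs))

def cc_log_normalizer(probs):
    """Log normalizing constant of a continuous-categorical distribution with
    natural parameters eta_k = log(probs_k). Uses the divided-difference identity
    integral_simplex exp(<eta, x>) dx = sum_k e^{eta_k} / prod_{j!=k}(eta_k - eta_j),
    which requires the eta_k to be distinct. (Assumed formula; not from this page.)"""
    eta = np.log(probs)
    K = len(eta)
    integral = 0.0
    for k in range(K):
        denom = np.prod([eta[k] - eta[j] for j in range(K) if j != k])
        integral += np.exp(eta[k]) / denom
    return -np.log(integral)  # normalizer C = 1 / integral, so log C = -log(integral)

def cc_nll(target, probs):
    """Continuous-categorical negative log-likelihood: the cross-entropy term
    minus the log normalizer (which depends on `probs` but not on `target`)."""
    return cross_entropy(target, probs) - cc_log_normalizer(probs)
```

Note that the two losses differ only by a term independent of the target, so they induce different fitted parameters precisely because that term varies with the model's predicted probabilities.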

Cite this Paper


BibTeX
@InProceedings{pmlr-v137-gordon-rodriguez20a,
  title     = {Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning},
  author    = {Gordon-Rodriguez, Elliott and Loaiza-Ganem, Gabriel and Pleiss, Geoff and Cunningham, John Patrick},
  booktitle = {Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops},
  pages     = {1--10},
  year      = {2020},
  editor    = {Zosa Forde, Jessica and Ruiz, Francisco and Pradier, Melanie F. and Schein, Aaron},
  volume    = {137},
  series    = {Proceedings of Machine Learning Research},
  month     = {12 Dec},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v137/gordon-rodriguez20a/gordon-rodriguez20a.pdf},
  url       = {https://proceedings.mlr.press/v137/gordon-rodriguez20a.html},
  abstract  = {Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example; namely the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of the failure modes thereof.}
}
Endnote
%0 Conference Paper
%T Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning
%A Elliott Gordon-Rodriguez
%A Gabriel Loaiza-Ganem
%A Geoff Pleiss
%A John Patrick Cunningham
%B Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops
%C Proceedings of Machine Learning Research
%D 2020
%E Jessica Zosa Forde
%E Francisco Ruiz
%E Melanie F. Pradier
%E Aaron Schein
%F pmlr-v137-gordon-rodriguez20a
%I PMLR
%P 1--10
%U https://proceedings.mlr.press/v137/gordon-rodriguez20a.html
%V 137
%X Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example; namely the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of the failure modes thereof.
APA
Gordon-Rodriguez, E., Loaiza-Ganem, G., Pleiss, G. & Cunningham, J.P. (2020). Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning. Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops, in Proceedings of Machine Learning Research 137:1-10. Available from https://proceedings.mlr.press/v137/gordon-rodriguez20a.html.