Improving Regression Performance with Distributional Losses

Ehsan Imani, Martha White
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2157-2166, 2018.

Abstract

There is growing evidence that converting targets to soft targets in supervised learning can provide considerable gains in performance. Much of this work has considered classification, converting hard zero-one values to soft labels—such as by adding label noise, incorporating label ambiguity or using distillation. In parallel, there is some evidence from a regression setting in reinforcement learning that learning distributions can improve performance. In this work, we investigate the reasons for this improvement, in a regression setting. We introduce a novel distributional regression loss, and similarly find it significantly improves prediction accuracy. We investigate several common hypotheses, around reducing overfitting and improved representations. We instead find evidence for an alternative hypothesis: this loss is easier to optimize, with better behaved gradients, resulting in improved generalization. We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-imani18a,
  title     = {Improving Regression Performance with Distributional Losses},
  author    = {Imani, Ehsan and White, Martha},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {2157--2166},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/imani18a/imani18a.pdf},
  url       = {https://proceedings.mlr.press/v80/imani18a.html},
  abstract  = {There is growing evidence that converting targets to soft targets in supervised learning can provide considerable gains in performance. Much of this work has considered classification, converting hard zero-one values to soft labels—such as by adding label noise, incorporating label ambiguity or using distillation. In parallel, there is some evidence from a regression setting in reinforcement learning that learning distributions can improve performance. In this work, we investigate the reasons for this improvement, in a regression setting. We introduce a novel distributional regression loss, and similarly find it significantly improves prediction accuracy. We investigate several common hypotheses, around reducing overfitting and improved representations. We instead find evidence for an alternative hypothesis: this loss is easier to optimize, with better behaved gradients, resulting in improved generalization. We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss.}
}
Endnote
%0 Conference Paper
%T Improving Regression Performance with Distributional Losses
%A Ehsan Imani
%A Martha White
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-imani18a
%I PMLR
%P 2157--2166
%U https://proceedings.mlr.press/v80/imani18a.html
%V 80
%X There is growing evidence that converting targets to soft targets in supervised learning can provide considerable gains in performance. Much of this work has considered classification, converting hard zero-one values to soft labels—such as by adding label noise, incorporating label ambiguity or using distillation. In parallel, there is some evidence from a regression setting in reinforcement learning that learning distributions can improve performance. In this work, we investigate the reasons for this improvement, in a regression setting. We introduce a novel distributional regression loss, and similarly find it significantly improves prediction accuracy. We investigate several common hypotheses, around reducing overfitting and improved representations. We instead find evidence for an alternative hypothesis: this loss is easier to optimize, with better behaved gradients, resulting in improved generalization. We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss.
APA
Imani, E. & White, M. (2018). Improving Regression Performance with Distributional Losses. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2157-2166. Available from https://proceedings.mlr.press/v80/imani18a.html.

Related Material