Regression as Classification: Influence of Task Formulation on Neural Network Features

Lawrence Stewart, Francis Bach, Quentin Berthet, Jean-Philippe Vert
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:11563-11582, 2023.

Abstract

Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss. However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross-entropy loss results in better performance. By focusing on two-layer ReLU networks, which can be fully characterized by measures over their feature space, we explore how the implicit bias induced by gradient-based optimization may partly explain this phenomenon. We provide theoretical evidence that, for one-dimensional data, the regression formulation yields a measure whose support can differ greatly from that of the classification formulation. The optimal supports we derive correspond directly to the features learned by the input layer of the network. The different nature of these supports sheds light on the optimization difficulties the square loss may encounter during training, and we present empirical results illustrating this phenomenon.
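To make the two task formulations concrete, here is a minimal sketch, in PyTorch, of the setup the abstract describes: the same two-layer ReLU network trained either directly on real-valued targets with the square loss, or on a discretized version of the targets with the cross-entropy loss. The synthetic data, hidden width, and number of bins below are illustrative assumptions, not the paper's experimental configuration.

    import math
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Illustrative one-dimensional regression data (not from the paper).
    x = torch.rand(256, 1)
    y = torch.sin(2 * math.pi * x) + 0.05 * torch.randn(256, 1)

    def two_layer_relu(out_dim, width=128):
        # Two-layer ReLU network; the width is an arbitrary choice.
        return nn.Sequential(nn.Linear(1, width), nn.ReLU(), nn.Linear(width, out_dim))

    # (a) Regression formulation: scalar output, square loss.
    reg_net = two_layer_relu(out_dim=1)
    reg_loss = nn.MSELoss()(reg_net(x), y)

    # (b) Classification formulation: discretize the target range into k bins
    #     and train on the bin indices with the cross-entropy loss.
    k = 32
    edges = torch.linspace(float(y.min()), float(y.max()), k + 1)[1:-1]  # interior bin edges
    labels = torch.bucketize(y.squeeze(1), edges)  # bin index in {0, ..., k-1}
    cls_net = two_layer_relu(out_dim=k)
    cls_loss = nn.CrossEntropyLoss()(cls_net(x), labels)

In the classification formulation the network outputs one logit per bin; a real-valued prediction can then be recovered, for instance, by taking the expectation of the bin centers under the softmax distribution over the logits.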

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-stewart23a,
  title     = {Regression as Classification: Influence of Task Formulation on Neural Network Features},
  author    = {Stewart, Lawrence and Bach, Francis and Berthet, Quentin and Vert, Jean-Philippe},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {11563--11582},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/stewart23a/stewart23a.pdf},
  url       = {https://proceedings.mlr.press/v206/stewart23a.html}
}
APA
Stewart, L., Bach, F., Berthet, Q., & Vert, J.-P. (2023). Regression as Classification: Influence of Task Formulation on Neural Network Features. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:11563-11582. Available from https://proceedings.mlr.press/v206/stewart23a.html.
