The Tree Loss: Improving Generalization with Many Classes

Yujie Wang, Mike Izbicki
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:6121-6133, 2022.

Abstract

Multi-class classification problems often have many semantically similar classes. For example, 90 of ImageNet’s 1000 classes are for different breeds of dog. We should expect that these semantically similar classes will have similar parameter vectors, but the standard cross entropy loss does not enforce this constraint. We introduce the tree loss as a drop-in replacement for the cross entropy loss. The tree loss re-parameterizes the parameter matrix in order to guarantee that semantically similar classes will have similar parameter vectors. Using simple properties of stochastic gradient descent, we show that the tree loss’s generalization error is asymptotically better than the cross entropy loss’s. We then validate these theoretical results on synthetic data, image data (CIFAR100, ImageNet), and text data (Twitter).

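As a rough illustration of the reparameterization idea described in the abstract, the sketch below shows one way a tree-structured class parameterization could be written in PyTorch: each class's parameter vector is formed by summing learned vectors along its root-to-leaf path in a class tree, so classes that share ancestors automatically share most of their parameters, and the result is fed into an ordinary cross entropy. The TreeSoftmax name, the paths encoding, and the initialization are illustrative assumptions, not the authors' exact construction.

# Minimal, hypothetical sketch of a tree-structured reparameterization of the
# class-parameter matrix, in the spirit of the abstract above. Each class
# vector is assumed to be the sum of learned node vectors along its
# root-to-leaf path, so semantically similar classes (shared ancestors)
# share most of their parameters. Illustration only, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TreeSoftmax(nn.Module):
    def __init__(self, dim, paths):
        # dim   -- dimensionality of the feature vectors
        # paths -- paths[c] lists the tree-node ids on class c's
        #          root-to-leaf path (leaf included)
        super().__init__()
        num_nodes = 1 + max(n for path in paths for n in path)
        # one learned vector per tree node
        self.node_vectors = nn.Parameter(torch.randn(num_nodes, dim) * 0.01)
        # binary (num_classes x num_nodes) matrix selecting each class's path
        selector = torch.zeros(len(paths), num_nodes)
        for c, path in enumerate(paths):
            selector[c, path] = 1.0
        self.register_buffer("selector", selector)

    def forward(self, features, labels):
        # class c's parameter vector = sum of node vectors on its path
        class_vectors = self.selector @ self.node_vectors   # (classes, dim)
        logits = features @ class_vectors.t()               # (batch, classes)
        return F.cross_entropy(logits, labels)

# usage: four classes where classes 0/1 and 2/3 share an internal node
# (root = node 0, internal nodes 1 and 2, leaves 3..6)
paths = [[0, 1, 3], [0, 1, 4], [0, 2, 5], [0, 2, 6]]
loss_fn = TreeSoftmax(dim=16, paths=paths)
x = torch.randn(8, 16)          # a batch of 8 feature vectors
y = torch.randint(0, 4, (8,))   # class labels
print(loss_fn(x, y).item())
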
Cite this Paper


BibTeX
@InProceedings{pmlr-v151-wang22d,
  title     = {The Tree Loss: Improving Generalization with Many Classes},
  author    = {Wang, Yujie and Izbicki, Mike},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {6121--6133},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/wang22d/wang22d.pdf},
  url       = {https://proceedings.mlr.press/v151/wang22d.html}
}
Endnote
%0 Conference Paper
%T The Tree Loss: Improving Generalization with Many Classes
%A Yujie Wang
%A Mike Izbicki
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-wang22d
%I PMLR
%P 6121--6133
%U https://proceedings.mlr.press/v151/wang22d.html
%V 151
APA
Wang, Y. & Izbicki, M. (2022). The Tree Loss: Improving Generalization with Many Classes. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:6121-6133. Available from https://proceedings.mlr.press/v151/wang22d.html.