Loss Decomposition for Fast Learning in Large Output Spaces

Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5640-5649, 2018.

Abstract

For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space. Recently, methods have been developed to speed up learning via efficient data structures for Nearest-Neighbor Search (NNS) or Maximum Inner-Product Search (MIPS). However, the performance of such data structures typically degrades in high dimensions. In this work, we propose a novel technique to reduce the intractable high dimensional search problem to several much more tractable lower dimensional ones via dual decomposition of the loss function. At the same time, we demonstrate guaranteed convergence to the original loss via a greedy message passing procedure. In our experiments on multiclass and multilabel classification with hundreds of thousands of classes, as well as training skip-gram word embeddings with a vocabulary size of half a million, our technique consistently improves the accuracy of search-based gradient approximation methods and outperforms sampling-based gradient approximation methods by a large margin.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-yen18a, title = {Loss Decomposition for Fast Learning in Large Output Spaces}, author = {Yen, Ian En-Hsu and Kale, Satyen and Yu, Felix and Holtmann-Rice, Daniel and Kumar, Sanjiv and Ravikumar, Pradeep}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {5640--5649}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/yen18a/yen18a.pdf}, url = {http://proceedings.mlr.press/v80/yen18a.html}, abstract = {For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space. Recently, methods have been developed to speed up learning via efficient data structures for Nearest-Neighbor Search (NNS) or Maximum Inner-Product Search (MIPS). However, the performance of such data structures typically degrades in high dimensions. In this work, we propose a novel technique to reduce the intractable high dimensional search problem to several much more tractable lower dimensional ones via dual decomposition of the loss function. At the same time, we demonstrate guaranteed convergence to the original loss via a greedy message passing procedure. In our experiments on multiclass and multilabel classification with hundreds of thousands of classes, as well as training skip-gram word embeddings with a vocabulary size of half a million, our technique consistently improves the accuracy of search-based gradient approximation methods and outperforms sampling-based gradient approximation methods by a large margin.} }
Endnote
%0 Conference Paper %T Loss Decomposition for Fast Learning in Large Output Spaces %A Ian En-Hsu Yen %A Satyen Kale %A Felix Yu %A Daniel Holtmann-Rice %A Sanjiv Kumar %A Pradeep Ravikumar %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-yen18a %I PMLR %P 5640--5649 %U http://proceedings.mlr.press/v80/yen18a.html %V 80 %X For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space. Recently, methods have been developed to speed up learning via efficient data structures for Nearest-Neighbor Search (NNS) or Maximum Inner-Product Search (MIPS). However, the performance of such data structures typically degrades in high dimensions. In this work, we propose a novel technique to reduce the intractable high dimensional search problem to several much more tractable lower dimensional ones via dual decomposition of the loss function. At the same time, we demonstrate guaranteed convergence to the original loss via a greedy message passing procedure. In our experiments on multiclass and multilabel classification with hundreds of thousands of classes, as well as training skip-gram word embeddings with a vocabulary size of half a million, our technique consistently improves the accuracy of search-based gradient approximation methods and outperforms sampling-based gradient approximation methods by a large margin.
APA
Yen, I.E., Kale, S., Yu, F., Holtmann-Rice, D., Kumar, S. & Ravikumar, P.. (2018). Loss Decomposition for Fast Learning in Large Output Spaces. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5640-5649 Available from http://proceedings.mlr.press/v80/yen18a.html.

Related Material