Adaptive Sampling for SGD by Exploiting Side Information

Siddharth Gopal
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:364-372, 2016.

Abstract

This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side information associated with the instances (e.g., class labels) to improve convergence. Previous methods have relied on sampling either from a distribution defined over training instances or from a static distribution that is fixed before training. This results in two problems: (a) any distribution that is set a priori is independent of how the optimization progresses, and (b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a larger gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.
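The sketch below illustrates the general idea the abstract describes: maintain a sampling distribution over classes rather than instances, sample a class and then an instance within it, and reweight the gradient so the estimate stays unbiased. It is a minimal, hedged interpretation, not the paper's exact estimator; the helper grad_fn, the running-average gradient-norm heuristic, and the smoothing constant are illustrative assumptions.

import numpy as np

def adaptive_class_sampling_sgd(X, y, grad_fn, n_classes, n_steps=1000,
                                lr=0.1, smooth=0.9):
    """Sketch of class-level adaptive importance sampling for SGD.

    grad_fn(w, x, y) should return the per-example gradient. The way the
    per-class gradient contribution is estimated here (a smoothed running
    norm) is a placeholder heuristic, not the paper's variance-optimal rule.
    """
    w = np.zeros(X.shape[1])

    # Indices of the training instances belonging to each class.
    idx_by_class = [np.flatnonzero(y == c) for c in range(n_classes)]
    p = np.array([len(ix) for ix in idx_by_class], dtype=float)
    p /= p.sum()                     # natural class proportions

    # Running estimate of the per-class gradient magnitude.
    g_norm = np.ones(n_classes)

    for _ in range(n_steps):
        # Sampling distribution over classes: proportional to
        # (class mass) x (estimated gradient magnitude).
        q = p * g_norm
        q /= q.sum()

        c = np.random.choice(n_classes, p=q)          # sample a class
        i = np.random.choice(idx_by_class[c])         # sample an instance in it

        g = grad_fn(w, X[i], y[i])

        # Importance weight p_c / q_c keeps the gradient estimate unbiased:
        # E_q[(p_c / q_c) * g_i] equals the average gradient over all data.
        w -= lr * (p[c] / q[c]) * g

        # Adapt the per-class gradient-magnitude estimate from observed gradients.
        g_norm[c] = smooth * g_norm[c] + (1 - smooth) * np.linalg.norm(g)

    return w

Because the distribution is kept over classes, its size is independent of the number of training instances, which is what makes the adaptive update cheap in large-scale settings.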

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-gopal16,
  title     = {Adaptive Sampling for SGD by Exploiting Side Information},
  author    = {Gopal, Siddharth},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {364--372},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/gopal16.pdf},
  url       = {https://proceedings.mlr.press/v48/gopal16.html},
  abstract  = {This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side information associated with the instances (e.g., class labels) to improve convergence. Previous methods have relied on sampling either from a distribution defined over training instances or from a static distribution that is fixed before training. This results in two problems: (a) any distribution that is set a priori is independent of how the optimization progresses, and (b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a \textit{larger} gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.}
}
APA
Gopal, S. (2016). Adaptive Sampling for SGD by Exploiting Side Information. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:364-372. Available from https://proceedings.mlr.press/v48/gopal16.html.
