Adaptive Sampling for SGD by Exploiting Side Information

Siddharth Gopal

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:364-372, 2016.

Abstract

This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side information associated with the instances (e.g., class labels) to improve convergence. Previous methods have relied on sampling either from a distribution defined over training instances or from a static distribution that is fixed before training. This results in two problems: (a) any distribution that is set a priori is independent of how the optimization progresses, and (b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a larger gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.
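
The class-level sampling idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the estimator, so the sketch assumes a generic importance-sampling scheme in which the probability of a class is proportional to its size times a running estimate of its gradient norm. All function names, the `floor` and `decay` parameters, and the moving-average update are hypothetical choices made for illustration.

```python
import random


def class_probabilities(class_sizes, grad_norm_estimates, floor=1e-8):
    """Distribution over classes with p_c proportional to n_c * g_c.

    g_c is an estimate of the typical gradient norm in class c
    (assumed choice; the abstract does not give the exact rule).
    """
    scores = [n * max(g, floor) for n, g in zip(class_sizes, grad_norm_estimates)]
    total = sum(scores)
    return [s / total for s in scores]


def sample_instance(instances_by_class, probs, rng):
    """Draw a class from probs, then an instance uniformly within it.

    Returns (class index, instance, importance weight). Scaling the
    instance's gradient by n_c / (N * p_c) keeps it an unbiased
    estimate of the average gradient over all N instances.
    """
    n_total = sum(len(members) for members in instances_by_class)
    c = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    instance = rng.choice(instances_by_class[c])
    weight = len(instances_by_class[c]) / (n_total * probs[c])
    return c, instance, weight


def update_estimate(old, observed_norm, decay=0.9):
    """Exponential moving average of a class's observed gradient norm."""
    return decay * old + (1.0 - decay) * observed_norm


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy classes; the second is assumed to have larger gradients.
    data = [["a1", "a2"], ["b1", "b2"]]
    probs = class_probabilities([2, 2], [1.0, 3.0])
    print(probs)
    print(sample_instance(data, probs, rng))
```

Only one probability per class is stored, so memory grows with the number of classes rather than the number of instances, and the distribution can be refreshed during training by calling `update_estimate` with the gradient norms that are observed anyway.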

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-gopal16,
  title = 	 {Adaptive Sampling for SGD by Exploiting Side Information},
  author = 	 {Gopal, Siddharth},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {364--372},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/gopal16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/gopal16.html},
  abstract = 	 {This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side information associated with the instances (e.g., class labels) to improve convergence. Previous methods have relied on sampling either from a distribution defined over training instances or from a static distribution that is fixed before training. This results in two problems: (a) any distribution that is set a priori is independent of how the optimization progresses, and (b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a \textit{larger} gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.}
}

Endnote

%0 Conference Paper
%T Adaptive Sampling for SGD by Exploiting Side Information
%A Siddharth Gopal
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-gopal16
%I PMLR
%P 364--372
%U https://proceedings.mlr.press/v48/gopal16.html
%V 48
%X This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side information associated with the instances (e.g., class labels) to improve convergence. Previous methods have relied on sampling either from a distribution defined over training instances or from a static distribution that is fixed before training. This results in two problems: (a) any distribution that is set a priori is independent of how the optimization progresses, and (b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a larger gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.

RIS


TY  - CPAPER
TI  - Adaptive Sampling for SGD by Exploiting Side Information
AU  - Siddharth Gopal
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-gopal16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 364
EP  - 372
L1  - http://proceedings.mlr.press/v48/gopal16.pdf
UR  - https://proceedings.mlr.press/v48/gopal16.html
AB  - This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side information associated with the instances (e.g., class labels) to improve convergence. Previous methods have relied on sampling either from a distribution defined over training instances or from a static distribution that is fixed before training. This results in two problems: (a) any distribution that is set a priori is independent of how the optimization progresses, and (b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a larger gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.
ER  -

APA


Gopal, S. (2016). Adaptive Sampling for SGD by Exploiting Side Information. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:364-372. Available from https://proceedings.mlr.press/v48/gopal16.html.
