Adaptive Sampling Probabilities for Non-Smooth Optimization

Hongseok Namkoong, Aman Sinha, Steve Yadlowsky, John C. Duchi
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2574-2583, 2017.

Abstract

Standard forms of coordinate and stochastic gradient methods do not adapt to structure in data; their good behavior under random sampling is predicated on uniformity in data. When gradients in certain blocks of features (for coordinate descent) or examples (for SGD) are larger than others, there is a natural structure that can be exploited for quicker convergence. Yet adaptive variants often suffer nontrivial computational overhead. We present a framework that discovers and leverages such structural properties at a low computational cost. We employ a bandit optimization procedure that “learns” probabilities for sampling coordinates or examples in (non-smooth) optimization problems, allowing us to guarantee performance close to that of the optimal stationary sampling distribution. When such structures exist, our algorithms achieve tighter convergence guarantees than their non-adaptive counterparts, and we complement our analysis with experiments on several datasets.
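The abstract describes the approach only at a high level. As a rough illustration of the general idea (not the paper's actual algorithm; see the PDF for that), the sketch below shows one way adaptive sampling probabilities might be learned on the fly: importance-weighted SGD on a non-smooth hinge loss, with per-example sampling probabilities updated by a multiplicative-weights, bandit-style rule so that examples with larger subgradients are drawn more often. All function names, step sizes, and the probability floor here are illustrative assumptions.

# Minimal sketch under the assumptions stated above; not the authors' exact method.
import numpy as np

def adaptive_sgd(X, y, steps=2000, eta_x=0.1, eta_p=0.01, p_min=None):
    n, d = X.shape
    w = np.zeros(d)                      # model parameters
    p = np.full(n, 1.0 / n)              # sampling distribution over examples
    if p_min is None:
        p_min = 1.0 / (10 * n)           # floor keeps importance weights bounded
    rng = np.random.default_rng(0)

    for t in range(1, steps + 1):
        i = rng.choice(n, p=p)           # sample one example from p
        margin = y[i] * X[i].dot(w)
        g_i = -y[i] * X[i] if margin < 1 else np.zeros(d)   # hinge-loss subgradient

        # Importance weighting keeps the gradient estimate unbiased: E[g] = (1/n) sum_i g_i.
        g = g_i / (n * p[i])
        w -= (eta_x / np.sqrt(t)) * g    # SGD step on the non-smooth loss

        # Bandit-style feedback: larger observed subgradient norm -> upweight example i.
        reward = np.linalg.norm(g_i) ** 2 / p[i]
        logits = np.log(p)
        logits[i] += eta_p * reward / (n ** 2)   # scaled to keep the update stable
        p = np.exp(logits - logits.max())
        p /= p.sum()
        p = np.maximum(p, p_min)         # enforce the floor, then renormalize
        p /= p.sum()
    return w, p

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
    w, p = adaptive_sgd(X, y)
    print("train accuracy:", np.mean(np.sign(X.dot(w)) == y))

The same pattern applies to coordinate descent by sampling blocks of coordinates instead of examples; the paper's analysis gives guarantees relative to the best stationary sampling distribution, which this toy update does not attempt to reproduce.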

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-namkoong17a,
  title     = {Adaptive Sampling Probabilities for Non-Smooth Optimization},
  author    = {Hongseok Namkoong and Aman Sinha and Steve Yadlowsky and John C. Duchi},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {2574--2583},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/namkoong17a/namkoong17a.pdf},
  url       = {https://proceedings.mlr.press/v70/namkoong17a.html},
  abstract  = {Standard forms of coordinate and stochastic gradient methods do not adapt to structure in data; their good behavior under random sampling is predicated on uniformity in data. When gradients in certain blocks of features (for coordinate descent) or examples (for SGD) are larger than others, there is a natural structure that can be exploited for quicker convergence. Yet adaptive variants often suffer nontrivial computational overhead. We present a framework that discovers and leverages such structural properties at a low computational cost. We employ a bandit optimization procedure that “learns” probabilities for sampling coordinates or examples in (non-smooth) optimization problems, allowing us to guarantee performance close to that of the optimal stationary sampling distribution. When such structures exist, our algorithms achieve tighter convergence guarantees than their non-adaptive counterparts, and we complement our analysis with experiments on several datasets.}
}
Endnote
%0 Conference Paper
%T Adaptive Sampling Probabilities for Non-Smooth Optimization
%A Hongseok Namkoong
%A Aman Sinha
%A Steve Yadlowsky
%A John C. Duchi
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-namkoong17a
%I PMLR
%P 2574--2583
%U https://proceedings.mlr.press/v70/namkoong17a.html
%V 70
%X Standard forms of coordinate and stochastic gradient methods do not adapt to structure in data; their good behavior under random sampling is predicated on uniformity in data. When gradients in certain blocks of features (for coordinate descent) or examples (for SGD) are larger than others, there is a natural structure that can be exploited for quicker convergence. Yet adaptive variants often suffer nontrivial computational overhead. We present a framework that discovers and leverages such structural properties at a low computational cost. We employ a bandit optimization procedure that “learns” probabilities for sampling coordinates or examples in (non-smooth) optimization problems, allowing us to guarantee performance close to that of the optimal stationary sampling distribution. When such structures exist, our algorithms achieve tighter convergence guarantees than their non-adaptive counterparts, and we complement our analysis with experiments on several datasets.
APA
Namkoong, H., Sinha, A., Yadlowsky, S. & Duchi, J.C. (2017). Adaptive Sampling Probabilities for Non-Smooth Optimization. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2574-2583. Available from https://proceedings.mlr.press/v70/namkoong17a.html.