A Light Touch for Heavily Constrained SGD

Andrew Cotter, Maya Gupta, Jan Pfeifer; 29th Annual Conference on Learning Theory, PMLR 49:729-771, 2016.

Abstract

Minimizing empirical risk subject to a set of constraints can be a useful strategy for learning restricted classes of functions, such as monotonic functions, submodular functions, classifiers that guarantee a certain class label for some subset of examples, etc. However, these restrictions may result in a very large number of constraints. Projected stochastic gradient descent (SGD) is often the default choice for large-scale optimization in machine learning, but requires a projection after each update. For heavily-constrained objectives, we propose an efficient extension of SGD that stays close to the feasible region while only applying constraints probabilistically at each iteration. Theoretical analysis shows a compelling trade-off between per-iteration work and the number of iterations needed on problems with a large number of constraints.
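To make the idea concrete, below is a minimal sketch of the general approach the abstract describes, assuming linear inequality constraints a_i·w ≤ b_i. All names here (project_halfspace, lightly_constrained_sgd, grad_fn, check_prob) are illustrative assumptions, not the authors' actual algorithm or any library API: each SGD step enforces each constraint only with small probability, and feasibility is restored afterward.

```python
import numpy as np

def project_halfspace(w, a, b):
    """Project w onto the halfspace {x : a.x <= b}."""
    violation = float(a @ w - b)
    if violation > 0.0:
        w = w - (violation / float(a @ a)) * a
    return w

def lightly_constrained_sgd(grad_fn, w0, constraints, steps=1000,
                            lr=0.1, check_prob=0.05, seed=0):
    """grad_fn(w) returns a stochastic gradient; constraints is a list of (a, b) pairs."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad_fn(w)          # ordinary SGD step
        for a, b in constraints:
            # Enforce each constraint only with small probability per
            # iteration, so expected per-iteration work stays well below
            # a full projection over all constraints.
            if rng.random() < check_prob:
                w = project_halfspace(w, a, b)
    # Cheap stand-in for a final exact projection onto the feasible region:
    # a few rounds of cyclic projection onto the individual halfspaces.
    for _ in range(50):
        for a, b in constraints:
            w = project_halfspace(w, a, b)
    return w
```

With a very large number of constraints, the probabilistic check makes each iteration much cheaper at the cost of possibly needing more iterations; this is the per-iteration-work versus iteration-count trade-off that the paper's theoretical analysis quantifies.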

Cite this Paper


BibTeX
@InProceedings{pmlr-v49-cotter16,
  title     = {A Light Touch for Heavily Constrained SGD},
  author    = {Andrew Cotter and Maya Gupta and Jan Pfeifer},
  booktitle = {29th Annual Conference on Learning Theory},
  pages     = {729--771},
  year      = {2016},
  editor    = {Vitaly Feldman and Alexander Rakhlin and Ohad Shamir},
  volume    = {49},
  series    = {Proceedings of Machine Learning Research},
  address   = {Columbia University, New York, New York, USA},
  month     = {23--26 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v49/cotter16.pdf},
  url       = {http://proceedings.mlr.press/v49/cotter16.html},
  abstract  = {Minimizing empirical risk subject to a set of constraints can be a useful strategy for learning restricted classes of functions, such as monotonic functions, submodular functions, classifiers that guarantee a certain class label for some subset of examples, etc. However, these restrictions may result in a very large number of constraints. Projected stochastic gradient descent (SGD) is often the default choice for large-scale optimization in machine learning, but requires a projection after each update. For heavily-constrained objectives, we propose an efficient extension of SGD that stays close to the feasible region while only applying constraints probabilistically at each iteration. Theoretical analysis shows a compelling trade-off between per-iteration work and the number of iterations needed on problems with a large number of constraints.}
}
Endnote
%0 Conference Paper
%T A Light Touch for Heavily Constrained SGD
%A Andrew Cotter
%A Maya Gupta
%A Jan Pfeifer
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir
%F pmlr-v49-cotter16
%I PMLR
%J Proceedings of Machine Learning Research
%P 729--771
%U http://proceedings.mlr.press
%V 49
%W PMLR
%X Minimizing empirical risk subject to a set of constraints can be a useful strategy for learning restricted classes of functions, such as monotonic functions, submodular functions, classifiers that guarantee a certain class label for some subset of examples, etc. However, these restrictions may result in a very large number of constraints. Projected stochastic gradient descent (SGD) is often the default choice for large-scale optimization in machine learning, but requires a projection after each update. For heavily-constrained objectives, we propose an efficient extension of SGD that stays close to the feasible region while only applying constraints probabilistically at each iteration. Theoretical analysis shows a compelling trade-off between per-iteration work and the number of iterations needed on problems with a large number of constraints.
RIS
TY - CPAPER
TI - A Light Touch for Heavily Constrained SGD
AU - Andrew Cotter
AU - Maya Gupta
AU - Jan Pfeifer
BT - 29th Annual Conference on Learning Theory
PY - 2016/06/06
DA - 2016/06/06
ED - Vitaly Feldman
ED - Alexander Rakhlin
ED - Ohad Shamir
ID - pmlr-v49-cotter16
PB - PMLR
SP - 729
DP - PMLR
EP - 771
L1 - http://proceedings.mlr.press/v49/cotter16.pdf
UR - http://proceedings.mlr.press/v49/cotter16.html
AB - Minimizing empirical risk subject to a set of constraints can be a useful strategy for learning restricted classes of functions, such as monotonic functions, submodular functions, classifiers that guarantee a certain class label for some subset of examples, etc. However, these restrictions may result in a very large number of constraints. Projected stochastic gradient descent (SGD) is often the default choice for large-scale optimization in machine learning, but requires a projection after each update. For heavily-constrained objectives, we propose an efficient extension of SGD that stays close to the feasible region while only applying constraints probabilistically at each iteration. Theoretical analysis shows a compelling trade-off between per-iteration work and the number of iterations needed on problems with a large number of constraints.
ER -
APA
Cotter, A., Gupta, M. & Pfeifer, J. (2016). A Light Touch for Heavily Constrained SGD. 29th Annual Conference on Learning Theory, in PMLR 49:729-771.
