Learning Policies for Contextual Submodular Prediction

Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, Drew Bagnell
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1364-1372, 2013.

Abstract

Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning. Our method leverages a surprising result from online submodular optimization: a single no-regret online learner can compete with an optimal sequence of predictions. Compared to previous work, which either learns a sequence of classifiers or relies on stronger assumptions such as realizability, our approach ensures both data efficiency and performance guarantees in the fully agnostic setting. Experiments validate the efficiency and applicability of the approach on a wide range of problems, including manipulator trajectory optimization, news recommendation, and document summarization.

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-ross13b,
  title     = {Learning Policies for Contextual Submodular Prediction},
  author    = {Ross, Stephane and Zhou, Jiaji and Yue, Yisong and Dey, Debadeepta and Bagnell, Drew},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {1364--1372},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {3},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/ross13b.pdf},
  url       = {https://proceedings.mlr.press/v28/ross13b.html},
  abstract  = {Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning. Our method leverages a surprising result from online submodular optimization: a single no-regret online learner can compete with an optimal sequence of predictions. Compared to previous work, which either learn a sequence of classifiers or rely on stronger assumptions such as realizability, we ensure both data-efficiency as well as performance guarantees in the fully agnostic setting. Experiments validate the efficiency and applicability of the approach on a wide range of problems including manipulator trajectory optimization, news recommendation and document summarization.}
}
Endnote
%0 Conference Paper
%T Learning Policies for Contextual Submodular Prediction
%A Stephane Ross
%A Jiaji Zhou
%A Yisong Yue
%A Debadeepta Dey
%A Drew Bagnell
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-ross13b
%I PMLR
%P 1364--1372
%U https://proceedings.mlr.press/v28/ross13b.html
%V 28
%N 3
%X Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning. Our method leverages a surprising result from online submodular optimization: a single no-regret online learner can compete with an optimal sequence of predictions. Compared to previous work, which either learn a sequence of classifiers or rely on stronger assumptions such as realizability, we ensure both data-efficiency as well as performance guarantees in the fully agnostic setting. Experiments validate the efficiency and applicability of the approach on a wide range of problems including manipulator trajectory optimization, news recommendation and document summarization.
RIS
TY  - CPAPER
TI  - Learning Policies for Contextual Submodular Prediction
AU  - Stephane Ross
AU  - Jiaji Zhou
AU  - Yisong Yue
AU  - Debadeepta Dey
AU  - Drew Bagnell
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-ross13b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 1364
EP  - 1372
L1  - http://proceedings.mlr.press/v28/ross13b.pdf
UR  - https://proceedings.mlr.press/v28/ross13b.html
AB  - Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning. Our method leverages a surprising result from online submodular optimization: a single no-regret online learner can compete with an optimal sequence of predictions. Compared to previous work, which either learn a sequence of classifiers or rely on stronger assumptions such as realizability, we ensure both data-efficiency as well as performance guarantees in the fully agnostic setting. Experiments validate the efficiency and applicability of the approach on a wide range of problems including manipulator trajectory optimization, news recommendation and document summarization.
ER  -
APA
Ross, S., Zhou, J., Yue, Y., Dey, D. & Bagnell, D. (2013). Learning Policies for Contextual Submodular Prediction. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):1364-1372. Available from https://proceedings.mlr.press/v28/ross13b.html.