Learning to Search Better than Your Teacher

Kai-Wei Chang; Akshay Krishnamurthy; Alekh Agarwal; Hal Daumé, III; John Langford

Learning to Search Better than Your Teacher

Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2058-2066, 2015.

Abstract

Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-changb15,
  title = 	 {Learning to Search Better than Your Teacher},
  author =       {Chang, Kai-Wei and Krishnamurthy, Akshay and Daum\'e, III, Hal and Langford, John},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {2058--2066},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/changb15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/changb15.html},
  abstract = 	 {Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.}
}

Endnote

%0 Conference Paper
%T Learning to Search Better than Your Teacher
%A Kai-Wei Chang
%A Akshay Krishnamurthy
%A Alekh Agarwal
%A Hal Daumé, III
%A John Langford
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-changb15
%I PMLR
%P 2058--2066
%U https://proceedings.mlr.press/v37/changb15.html
%V 37
%X Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.

RIS


TY  - CPAPER
TI  - Learning to Search Better than Your Teacher
AU  - Kai-Wei Chang
AU  - Akshay Krishnamurthy
AU  - Alekh Agarwal
AU  - Hal Daumé, III
AU  - John Langford
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-changb15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 2058
EP  - 2066
L1  - http://proceedings.mlr.press/v37/changb15.pdf
UR  - https://proceedings.mlr.press/v37/changb15.html
AB  - Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.
ER  -

APA


Chang, K., Krishnamurthy, A., Agarwal, A., Daumé, III, H. & Langford, J.. (2015). Learning to Search Better than Your Teacher. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:2058-2066 Available from https://proceedings.mlr.press/v37/changb15.html.

Learning to Search Better than Your Teacher

Abstract

Cite this Paper

Related Material