Efficient Algorithms for Adversarial Contextual Learning

Vasilis Syrgkanis; Akshay Krishnamurthy; Robert Schapire

Efficient Algorithms for Adversarial Contextual Learning

Vasilis Syrgkanis, Akshay Krishnamurthy, Robert Schapire

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2159-2168, 2016.

Abstract

We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner knows the set of contexts a priori, ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently on one of the contexts in the set. Our algorithms fall into the Follow-The-Perturbed-Leader family (Kalai and Vempala, 2005) and achieve regret O(T^3/4\sqrtK\log(N)) in the transductive setting and O(T^2/3 d^3/4 K\sqrt\log(N)) in the separator setting, where T is the number of rounds, K is the number of actions, N is the number of baseline policies, and d is the size of the separator. We actually solve the more general adversarial contextual semi-bandit linear optimization problem, whilst in the full information setting we address the even more general contextual combinatorial optimization. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-syrgkanis16,
  title = 	 {Efficient Algorithms for Adversarial Contextual Learning},
  author = 	 {Syrgkanis, Vasilis and Krishnamurthy, Akshay and Schapire, Robert},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {2159--2168},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/syrgkanis16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/syrgkanis16.html},
  abstract = 	 {We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner knows the set of contexts a priori, ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently on one of the contexts in the set. Our algorithms fall into the Follow-The-Perturbed-Leader family (Kalai and Vempala, 2005) and achieve regret O(T^3/4\sqrtK\log(N)) in the transductive setting and O(T^2/3 d^3/4 K\sqrt\log(N)) in the separator setting, where T is the number of rounds, K is the number of actions, N is the number of baseline policies, and d is the size of the separator. We actually solve the more general adversarial contextual semi-bandit linear optimization problem, whilst in the full information setting we address the even more general contextual combinatorial optimization. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.}
}

Endnote

%0 Conference Paper
%T Efficient Algorithms for Adversarial Contextual Learning
%A Vasilis Syrgkanis
%A Akshay Krishnamurthy
%A Robert Schapire
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger	
%F pmlr-v48-syrgkanis16
%I PMLR
%P 2159--2168
%U https://proceedings.mlr.press/v48/syrgkanis16.html
%V 48
%X We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner knows the set of contexts a priori, ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently on one of the contexts in the set. Our algorithms fall into the Follow-The-Perturbed-Leader family (Kalai and Vempala, 2005) and achieve regret O(T^3/4\sqrtK\log(N)) in the transductive setting and O(T^2/3 d^3/4 K\sqrt\log(N)) in the separator setting, where T is the number of rounds, K is the number of actions, N is the number of baseline policies, and d is the size of the separator. We actually solve the more general adversarial contextual semi-bandit linear optimization problem, whilst in the full information setting we address the even more general contextual combinatorial optimization. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.

RIS


TY  - CPAPER
TI  - Efficient Algorithms for Adversarial Contextual Learning
AU  - Vasilis Syrgkanis
AU  - Akshay Krishnamurthy
AU  - Robert Schapire
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger	
ID  - pmlr-v48-syrgkanis16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 2159
EP  - 2168
L1  - http://proceedings.mlr.press/v48/syrgkanis16.pdf
UR  - https://proceedings.mlr.press/v48/syrgkanis16.html
AB  - We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner knows the set of contexts a priori, ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently on one of the contexts in the set. Our algorithms fall into the Follow-The-Perturbed-Leader family (Kalai and Vempala, 2005) and achieve regret O(T^3/4\sqrtK\log(N)) in the transductive setting and O(T^2/3 d^3/4 K\sqrt\log(N)) in the separator setting, where T is the number of rounds, K is the number of actions, N is the number of baseline policies, and d is the size of the separator. We actually solve the more general adversarial contextual semi-bandit linear optimization problem, whilst in the full information setting we address the even more general contextual combinatorial optimization. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.
ER  -

APA


Syrgkanis, V., Krishnamurthy, A. & Schapire, R.. (2016). Efficient Algorithms for Adversarial Contextual Learning. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2159-2168 Available from https://proceedings.mlr.press/v48/syrgkanis16.html.

Efficient Algorithms for Adversarial Contextual Learning

Abstract

Cite this Paper

Related Material