Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft Constraints

Martino Bernasconi; Federico Cacciamani; Matteo Castiglioni; Alberto Marchesi; Nicola Gatti; Francesco Trovò

Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft Constraints

Martino Bernasconi, Federico Cacciamani, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovò

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:1854-1873, 2022.

Abstract

We study decision making problems in which an agent sequentially interacts with a stochastic environment defined by means of a tree structure. The agent repeatedly faces the environment over time, and, after each round, it perceives a utility and a cost, which are both stochastic. The goal of the agent is to learn an optimal strategy in an online fashion, while, at the same time, keeping costs below a given safety threshold. Our model naturally fits many real-world scenarios, such as, e.g., opponent exploitation in games and web link selection. We study the hard-threshold problem of achieving sublinear regret while guaranteeing that the threshold constraint is satisfied at every iteration with high probability. First, we show that, in general, any algorithm with such a guarantee incurs in a linear regret. This motivates the introduction of a relaxed problem, namely the soft-threshold problem, in which we only require that the cumulative violation of the threshold constraint grows sublinearly, and, thus, we can provide an algorithm with sublinear regret. Next, we show how, in the hard-threshold problem, a sublinear regret algorithm can be designed under the additional assumption that there exists a known strategy strictly satisfying the threshold constraint. We also show that our regret bounds are tight. Finally, we cast the opponent exploitation problem to our model, and we experimentally evaluate our algorithms on a standard testbed of games.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-bernasconi22a,
  title = 	 {Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft Constraints},
  author =       {Bernasconi, Martino and Cacciamani, Federico and Castiglioni, Matteo and Marchesi, Alberto and Gatti, Nicola and Trov{\`o}, Francesco},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {1854--1873},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/bernasconi22a/bernasconi22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/bernasconi22a.html},
  abstract = 	 {We study decision making problems in which an agent sequentially interacts with a stochastic environment defined by means of a tree structure. The agent repeatedly faces the environment over time, and, after each round, it perceives a utility and a cost, which are both stochastic. The goal of the agent is to learn an optimal strategy in an online fashion, while, at the same time, keeping costs below a given safety threshold. Our model naturally fits many real-world scenarios, such as, e.g., opponent exploitation in games and web link selection. We study the hard-threshold problem of achieving sublinear regret while guaranteeing that the threshold constraint is satisfied at every iteration with high probability. First, we show that, in general, any algorithm with such a guarantee incurs in a linear regret. This motivates the introduction of a relaxed problem, namely the soft-threshold problem, in which we only require that the cumulative violation of the threshold constraint grows sublinearly, and, thus, we can provide an algorithm with sublinear regret. Next, we show how, in the hard-threshold problem, a sublinear regret algorithm can be designed under the additional assumption that there exists a known strategy strictly satisfying the threshold constraint. We also show that our regret bounds are tight. Finally, we cast the opponent exploitation problem to our model, and we experimentally evaluate our algorithms on a standard testbed of games.}
}

Endnote

%0 Conference Paper
%T Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft Constraints
%A Martino Bernasconi
%A Federico Cacciamani
%A Matteo Castiglioni
%A Alberto Marchesi
%A Nicola Gatti
%A Francesco Trovò
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato	
%F pmlr-v162-bernasconi22a
%I PMLR
%P 1854--1873
%U https://proceedings.mlr.press/v162/bernasconi22a.html
%V 162
%X We study decision making problems in which an agent sequentially interacts with a stochastic environment defined by means of a tree structure. The agent repeatedly faces the environment over time, and, after each round, it perceives a utility and a cost, which are both stochastic. The goal of the agent is to learn an optimal strategy in an online fashion, while, at the same time, keeping costs below a given safety threshold. Our model naturally fits many real-world scenarios, such as, e.g., opponent exploitation in games and web link selection. We study the hard-threshold problem of achieving sublinear regret while guaranteeing that the threshold constraint is satisfied at every iteration with high probability. First, we show that, in general, any algorithm with such a guarantee incurs in a linear regret. This motivates the introduction of a relaxed problem, namely the soft-threshold problem, in which we only require that the cumulative violation of the threshold constraint grows sublinearly, and, thus, we can provide an algorithm with sublinear regret. Next, we show how, in the hard-threshold problem, a sublinear regret algorithm can be designed under the additional assumption that there exists a known strategy strictly satisfying the threshold constraint. We also show that our regret bounds are tight. Finally, we cast the opponent exploitation problem to our model, and we experimentally evaluate our algorithms on a standard testbed of games.

APA


Bernasconi, M., Cacciamani, F., Castiglioni, M., Marchesi, A., Gatti, N. & Trovò, F.. (2022). Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft Constraints. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:1854-1873 Available from https://proceedings.mlr.press/v162/bernasconi22a.html.

Related Material

Download PDF