Joint Online Learning and Decision-making via Dual Mirror Descent

Alfonso Lobos; Paul Grigas; Zheng Wen

Joint Online Learning and Decision-making via Dual Mirror Descent

Alfonso Lobos, Paul Grigas, Zheng Wen

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7080-7089, 2021.

Abstract

We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous $O(\sqrt{T})$ terms plus terms that directly depend on the convergence of the learning process.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-lobos21a,
  title = 	 {Joint Online Learning and Decision-making via Dual Mirror Descent},
  author =       {Lobos, Alfonso and Grigas, Paul and Wen, Zheng},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {7080--7089},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/lobos21a/lobos21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/lobos21a.html},
  abstract = 	 {We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous $O(\sqrt{T})$ terms plus terms that directly depend on the convergence of the learning process.}
}

Endnote

%0 Conference Paper
%T Joint Online Learning and Decision-making via Dual Mirror Descent
%A Alfonso Lobos
%A Paul Grigas
%A Zheng Wen
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-lobos21a
%I PMLR
%P 7080--7089
%U https://proceedings.mlr.press/v139/lobos21a.html
%V 139
%X We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous $O(\sqrt{T})$ terms plus terms that directly depend on the convergence of the learning process.

APA

Lobos, A., Grigas, P. & Wen, Z.. (2021). Joint Online Learning and Decision-making via Dual Mirror Descent. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7080-7089 Available from https://proceedings.mlr.press/v139/lobos21a.html.

Joint Online Learning and Decision-making via Dual Mirror Descent

Abstract

Cite this Paper

Related Material