Joint Online Learning and Decision-making via Dual Mirror Descent

Alfonso Lobos, Paul Grigas, Zheng Wen
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7080-7089, 2021.

Abstract

We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous $O(\sqrt{T})$ terms plus terms that directly depend on the convergence of the learning process.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-lobos21a, title = {Joint Online Learning and Decision-making via Dual Mirror Descent}, author = {Lobos, Alfonso and Grigas, Paul and Wen, Zheng}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {7080--7089}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/lobos21a/lobos21a.pdf}, url = {https://proceedings.mlr.press/v139/lobos21a.html}, abstract = {We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous $O(\sqrt{T})$ terms plus terms that directly depend on the convergence of the learning process.} }
Endnote
%0 Conference Paper %T Joint Online Learning and Decision-making via Dual Mirror Descent %A Alfonso Lobos %A Paul Grigas %A Zheng Wen %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-lobos21a %I PMLR %P 7080--7089 %U https://proceedings.mlr.press/v139/lobos21a.html %V 139 %X We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous $O(\sqrt{T})$ terms plus terms that directly depend on the convergence of the learning process.
APA
Lobos, A., Grigas, P. & Wen, Z.. (2021). Joint Online Learning and Decision-making via Dual Mirror Descent. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7080-7089 Available from https://proceedings.mlr.press/v139/lobos21a.html.

Related Material