Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

Toshinori Kitamura; Tadashi Kozuno; Yunhao Tang; Nino Vieillard; Michal Valko; Wenhao Yang; Jincheng Mei; Pierre Menard; Mohammad Gheshlaghi Azar; Remi Munos; Olivier Pietquin; Matthieu Geist; Csaba Szepesvari; Wataru Kumagai; Yutaka Matsuo

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Menard, Mohammad Gheshlaghi Azar, Remi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvari, Wataru Kumagai, Yutaka Matsuo

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:17135-17175, 2023.

Abstract

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear function approximation through its sample complexity required to identify an

$\varepsilon$ -optimal policy with probability

$1-\delta$ under the settings of an infinite-horizon linear MDP, generative model, and G-optimal design. We demonstrate that least-squares regression weighted by the variance of an estimated optimal value function of the next state is crucial to achieving minimax optimality. Based on this observation, we present Variance-Weighted Least-Squares MDVI (VWLS-MDVI), the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs. Furthermore, we propose a practical VWLS algorithm for value-based deep RL, Deep Variance Weighting (DVW). Our experiments demonstrate that DVW improves the performance of popular value-based deep RL algorithms on a set of MinAtar benchmarks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-kitamura23a,
  title = 	 {Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear {MDP}s: Theory and Practice},
  author =       {Kitamura, Toshinori and Kozuno, Tadashi and Tang, Yunhao and Vieillard, Nino and Valko, Michal and Yang, Wenhao and Mei, Jincheng and Menard, Pierre and Gheshlaghi Azar, Mohammad and Munos, Remi and Pietquin, Olivier and Geist, Matthieu and Szepesvari, Csaba and Kumagai, Wataru and Matsuo, Yutaka},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {17135--17175},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/kitamura23a/kitamura23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/kitamura23a.html},
  abstract = 	 {Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear function approximation through its sample complexity required to identify an $\varepsilon$-optimal policy with probability $1-\delta$ under the settings of an infinite-horizon linear MDP, generative model, and G-optimal design. We demonstrate that least-squares regression weighted by the variance of an estimated optimal value function of the next state is crucial to achieving minimax optimality. Based on this observation, we present Variance-Weighted Least-Squares MDVI (VWLS-MDVI), the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs. Furthermore, we propose a practical VWLS algorithm for value-based deep RL, Deep Variance Weighting (DVW). Our experiments demonstrate that DVW improves the performance of popular value-based deep RL algorithms on a set of MinAtar benchmarks.}
}

Endnote

%0 Conference Paper
%T Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
%A Toshinori Kitamura
%A Tadashi Kozuno
%A Yunhao Tang
%A Nino Vieillard
%A Michal Valko
%A Wenhao Yang
%A Jincheng Mei
%A Pierre Menard
%A Mohammad Gheshlaghi Azar
%A Remi Munos
%A Olivier Pietquin
%A Matthieu Geist
%A Csaba Szepesvari
%A Wataru Kumagai
%A Yutaka Matsuo
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-kitamura23a
%I PMLR
%P 17135--17175
%U https://proceedings.mlr.press/v202/kitamura23a.html
%V 202
%X Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear function approximation through its sample complexity required to identify an $\varepsilon$-optimal policy with probability $1-\delta$ under the settings of an infinite-horizon linear MDP, generative model, and G-optimal design. We demonstrate that least-squares regression weighted by the variance of an estimated optimal value function of the next state is crucial to achieving minimax optimality. Based on this observation, we present Variance-Weighted Least-Squares MDVI (VWLS-MDVI), the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs. Furthermore, we propose a practical VWLS algorithm for value-based deep RL, Deep Variance Weighting (DVW). Our experiments demonstrate that DVW improves the performance of popular value-based deep RL algorithms on a set of MinAtar benchmarks.

APA


Kitamura, T., Kozuno, T., Tang, Y., Vieillard, N., Valko, M., Yang, W., Mei, J., Menard, P., Gheshlaghi Azar, M., Munos, R., Pietquin, O., Geist, M., Szepesvari, C., Kumagai, W. & Matsuo, Y.. (2023). Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:17135-17175 Available from https://proceedings.mlr.press/v202/kitamura23a.html.

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

Abstract

Cite this Paper

Related Material