Rank-One Modified Value Iteration

Arman Sharifi Kolarijani, Tolga Ok, Peyman Mohajerin Esfahani, Mohamad Amin Sharifi Kolarijani
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:31182-31201, 2025.

Abstract

In this paper, we propose a novel algorithm for solving planning and learning problems in Markov decision processes. The proposed algorithm follows a policy-iteration-type update, using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is itself approximated using the power method. We provide theoretical guarantees that the proposed algorithm converges to the optimal (action-)value function at the same rate and with the same computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions on both planning and learning problems.
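
To make the abstract's description concrete, the sketch below shows one natural way a rank-one correction can enter a value-iteration loop. It is a minimal illustration, not the paper's exact algorithm: the function name, the input conventions (P of shape (A, S, S), r of shape (S, A)), and the specific matrix-splitting update are assumptions made here for exposition. The splitting writes the policy's transition matrix as P_pi = 1 mu^T + (P_pi - 1 mu^T), refines mu (the stationary-distribution estimate) with power-method steps, and inverts the rank-one part in closed form via the Sherman-Morrison identity (I - gamma 1 mu^T)^{-1} = I + (gamma / (1 - gamma)) 1 mu^T.

```python
import numpy as np

def rank_one_modified_vi(P, r, gamma, iters=1000, power_steps=1, tol=1e-8):
    """Minimal sketch (not the paper's exact algorithm) of value iteration
    with a rank-one correction. Assumed input conventions:
      P : (A, S, S) array, P[a, s, s'] = transition probability,
      r : (S, A) array of expected one-step rewards,
      gamma in (0, 1).
    Idea: split P_pi = 1 mu^T + (P_pi - 1 mu^T), with mu a power-method
    estimate of the stationary distribution of P_pi, and invert the
    rank-one part in closed form (Sherman-Morrison):
      (I - gamma 1 mu^T)^{-1} = I + gamma / (1 - gamma) 1 mu^T.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    mu = np.full(S, 1.0 / S)            # initial stationary-dist. guess
    pi = np.zeros(S, dtype=int)
    for _ in range(iters):
        # Policy improvement: greedy policy w.r.t. the current value.
        Q = r.T + gamma * (P @ V)       # (A, S) action values
        pi = Q.argmax(axis=0)           # (S,) greedy policy
        P_pi = P[pi, np.arange(S)]      # (S, S) transitions under pi
        r_pi = r[np.arange(S), pi]      # (S,) rewards under pi
        # Power-method refinement of the stationary distribution of P_pi.
        for _ in range(power_steps):
            mu = mu @ P_pi
            mu /= mu.sum()
        # Rank-one-corrected evaluation step (matrix splitting):
        # V_new = (I - gamma 1 mu^T)^{-1} (r_pi + gamma (P_pi - 1 mu^T) V)
        b = r_pi + gamma * (P_pi @ V)   # ordinary Bellman backup
        inner = b - gamma * (mu @ V)    # remove the rank-one part of the backup
        V_new = inner + gamma / (1.0 - gamma) * (mu @ inner)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, pi
        V = V_new
    return V, pi
```

One reason a splitting of this kind can help: if mu were exactly the stationary distribution of P_pi, the error-propagation matrix of the update above reduces to gamma (P_pi - 1 mu^T), whose spectral radius is governed by the subdominant eigenvalue of P_pi rather than 1, so the contraction can be strictly faster than plain value iteration's factor gamma. Whether and how the paper exploits this is detailed in the PDF linked below.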

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-kolarijani25a,
  title     = {Rank-One Modified Value Iteration},
  author    = {Kolarijani, Arman Sharifi and Ok, Tolga and Mohajerin Esfahani, Peyman and Sharifi Kolarijani, Mohamad Amin},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {31182--31201},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/kolarijani25a/kolarijani25a.pdf},
  url       = {https://proceedings.mlr.press/v267/kolarijani25a.html},
  abstract  = {In this paper, we propose a novel algorithm for solving planning and learning problems in Markov decision processes. The proposed algorithm follows a policy-iteration-type update, using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is itself approximated using the power method. We provide theoretical guarantees that the proposed algorithm converges to the optimal (action-)value function at the same rate and with the same computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions on both planning and learning problems.}
}
Endnote
%0 Conference Paper
%T Rank-One Modified Value Iteration
%A Arman Sharifi Kolarijani
%A Tolga Ok
%A Peyman Mohajerin Esfahani
%A Mohamad Amin Sharifi Kolarijani
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-kolarijani25a
%I PMLR
%P 31182--31201
%U https://proceedings.mlr.press/v267/kolarijani25a.html
%V 267
%X In this paper, we propose a novel algorithm for solving planning and learning problems in Markov decision processes. The proposed algorithm follows a policy-iteration-type update, using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is itself approximated using the power method. We provide theoretical guarantees that the proposed algorithm converges to the optimal (action-)value function at the same rate and with the same computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions on both planning and learning problems.
APA
Kolarijani, A. S., Ok, T., Mohajerin Esfahani, P. & Sharifi Kolarijani, M. A. (2025). Rank-One Modified Value Iteration. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:31182-31201. Available from https://proceedings.mlr.press/v267/kolarijani25a.html.