A Partially Observable Monte Carlo Planning Algorithm Based on Path Modification

Qingya Wang, Feng Liu, Bin Luo
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:1449-1462, 2024.

Abstract

Balancing exploration and exploitation has long been recognized as an important theme in the online planning algorithms for POMDP problems. Explorative actions on one hand prevent the planning from falling into the suboptimal dilemma, while hindering the convergence of the planning procedure on the other hand. Therefore, it is meaningful to maintain the exploration as well as taking a step forward towards exploitation. Note that there is a deviation between the action selection criteria in the planning procedure and in the execution procedure, which inspires us to build a bridge between these two criteria to accelerate the convergence. A Partially Observable Monte Carlo Planning algorithm based on Path Modification (POMCP-PM) is presented in the paper, which modifies the backtracing paths by considering the two criteria simultaneously when updating the values of parent nodes. The algorithm is general as the Upper Confidence Bound Apply to Tree (UCT) algorithm used to select actions can be easily replaced by other criteria. Experimental results demonstrate that POMCP-PM outperforms POMCP with varying numbers of simulations on several scenarios with different scales.

Cite this Paper


BibTeX
@InProceedings{pmlr-v222-wang24c, title = {A Partially Observable {M}onte {C}arlo Planning Algorithm Based on Path Modification}, author = {Wang, Qingya and Liu, Feng and Luo, Bin}, booktitle = {Proceedings of the 15th Asian Conference on Machine Learning}, pages = {1449--1462}, year = {2024}, editor = {Yanıkoğlu, Berrin and Buntine, Wray}, volume = {222}, series = {Proceedings of Machine Learning Research}, month = {11--14 Nov}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v222/wang24c/wang24c.pdf}, url = {https://proceedings.mlr.press/v222/wang24c.html}, abstract = {Balancing exploration and exploitation has long been recognized as an important theme in the online planning algorithms for POMDP problems. Explorative actions on one hand prevent the planning from falling into the suboptimal dilemma, while hindering the convergence of the planning procedure on the other hand. Therefore, it is meaningful to maintain the exploration as well as taking a step forward towards exploitation. Note that there is a deviation between the action selection criteria in the planning procedure and in the execution procedure, which inspires us to build a bridge between these two criteria to accelerate the convergence. A Partially Observable Monte Carlo Planning algorithm based on Path Modification (POMCP-PM) is presented in the paper, which modifies the backtracing paths by considering the two criteria simultaneously when updating the values of parent nodes. The algorithm is general as the Upper Confidence Bound Apply to Tree (UCT) algorithm used to select actions can be easily replaced by other criteria. Experimental results demonstrate that POMCP-PM outperforms POMCP with varying numbers of simulations on several scenarios with different scales.} }
Endnote
%0 Conference Paper %T A Partially Observable Monte Carlo Planning Algorithm Based on Path Modification %A Qingya Wang %A Feng Liu %A Bin Luo %B Proceedings of the 15th Asian Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Berrin Yanıkoğlu %E Wray Buntine %F pmlr-v222-wang24c %I PMLR %P 1449--1462 %U https://proceedings.mlr.press/v222/wang24c.html %V 222 %X Balancing exploration and exploitation has long been recognized as an important theme in the online planning algorithms for POMDP problems. Explorative actions on one hand prevent the planning from falling into the suboptimal dilemma, while hindering the convergence of the planning procedure on the other hand. Therefore, it is meaningful to maintain the exploration as well as taking a step forward towards exploitation. Note that there is a deviation between the action selection criteria in the planning procedure and in the execution procedure, which inspires us to build a bridge between these two criteria to accelerate the convergence. A Partially Observable Monte Carlo Planning algorithm based on Path Modification (POMCP-PM) is presented in the paper, which modifies the backtracing paths by considering the two criteria simultaneously when updating the values of parent nodes. The algorithm is general as the Upper Confidence Bound Apply to Tree (UCT) algorithm used to select actions can be easily replaced by other criteria. Experimental results demonstrate that POMCP-PM outperforms POMCP with varying numbers of simulations on several scenarios with different scales.
APA
Wang, Q., Liu, F. & Luo, B.. (2024). A Partially Observable Monte Carlo Planning Algorithm Based on Path Modification. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:1449-1462 Available from https://proceedings.mlr.press/v222/wang24c.html.

Related Material