Static and Dynamic Values of Computation in MCTS

Eren Sezener, Peter Dayan
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:31-40, 2020.

Abstract

Monte-Carlo Tree Search (MCTS) is one of the most-widely used methodsfor planning, and has powered many recent advances in artificialintelligence. In MCTS, one typically performs computations(i.e., simulations) to collect statistics about the possible futureconsequences of actions, and then chooses accordingly. Manypopular MCTS methods such as UCT and its variants decide whichcomputations to perform by trading-off exploration and exploitation. Inthis work, we take a more direct approach, and explicitly quantify thevalue of a computation based on its expected impact on the quality ofthe action eventually chosen. Our approach goes beyond the \emph{myopic}limitations of existing computation-value-based methods in two senses:(I) we are able to account for the impact of non-immediate (ie, future)computations (II) on non-immediate actions. We show that policies thatgreedily optimize computation values are optimal under certainassumptions and obtain results that are competitive with thestate-of-the-art.

Cite this Paper


BibTeX
@InProceedings{pmlr-v124-sezener20a, title = {Static and Dynamic Values of Computation in MCTS}, author = {Sezener, Eren and Dayan, Peter}, booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)}, pages = {31--40}, year = {2020}, editor = {Peters, Jonas and Sontag, David}, volume = {124}, series = {Proceedings of Machine Learning Research}, month = {03--06 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v124/sezener20a/sezener20a.pdf}, url = {https://proceedings.mlr.press/v124/sezener20a.html}, abstract = {Monte-Carlo Tree Search (MCTS) is one of the most-widely used methodsfor planning, and has powered many recent advances in artificialintelligence. In MCTS, one typically performs computations(i.e., simulations) to collect statistics about the possible futureconsequences of actions, and then chooses accordingly. Manypopular MCTS methods such as UCT and its variants decide whichcomputations to perform by trading-off exploration and exploitation. Inthis work, we take a more direct approach, and explicitly quantify thevalue of a computation based on its expected impact on the quality ofthe action eventually chosen. Our approach goes beyond the \emph{myopic}limitations of existing computation-value-based methods in two senses:(I) we are able to account for the impact of non-immediate (ie, future)computations (II) on non-immediate actions. We show that policies thatgreedily optimize computation values are optimal under certainassumptions and obtain results that are competitive with thestate-of-the-art.} }
Endnote
%0 Conference Paper %T Static and Dynamic Values of Computation in MCTS %A Eren Sezener %A Peter Dayan %B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI) %C Proceedings of Machine Learning Research %D 2020 %E Jonas Peters %E David Sontag %F pmlr-v124-sezener20a %I PMLR %P 31--40 %U https://proceedings.mlr.press/v124/sezener20a.html %V 124 %X Monte-Carlo Tree Search (MCTS) is one of the most-widely used methodsfor planning, and has powered many recent advances in artificialintelligence. In MCTS, one typically performs computations(i.e., simulations) to collect statistics about the possible futureconsequences of actions, and then chooses accordingly. Manypopular MCTS methods such as UCT and its variants decide whichcomputations to perform by trading-off exploration and exploitation. Inthis work, we take a more direct approach, and explicitly quantify thevalue of a computation based on its expected impact on the quality ofthe action eventually chosen. Our approach goes beyond the \emph{myopic}limitations of existing computation-value-based methods in two senses:(I) we are able to account for the impact of non-immediate (ie, future)computations (II) on non-immediate actions. We show that policies thatgreedily optimize computation values are optimal under certainassumptions and obtain results that are competitive with thestate-of-the-art.
APA
Sezener, E. & Dayan, P.. (2020). Static and Dynamic Values of Computation in MCTS. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:31-40 Available from https://proceedings.mlr.press/v124/sezener20a.html.

Related Material