Coherent Inference on Optimal Play in Game Trees

Philipp Hennig; David Stern; Thore Graepel

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:326-333, 2010.

Abstract

Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.

Cite this Paper

BibTeX


@InProceedings{pmlr-v9-hennig10a,
  title = 	 {Coherent Inference on Optimal Play in Game Trees},
  author = 	 {Hennig, Philipp and Stern, David and Graepel, Thore},
  booktitle = 	 {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {326--333},
  year = 	 {2010},
  editor = 	 {Teh, Yee Whye and Titterington, Mike},
  volume = 	 {9},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Chia Laguna Resort, Sardinia, Italy},
  month = 	 {13--15 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v9/hennig10a/hennig10a.pdf},
  url = 	 {https://proceedings.mlr.press/v9/hennig10a.html},
  abstract = 	 {Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.}
}

Endnote

%0 Conference Paper
%T Coherent Inference on Optimal Play in Game Trees
%A Philipp Hennig
%A David Stern
%A Thore Graepel
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington
%F pmlr-v9-hennig10a
%I PMLR
%P 326--333
%U https://proceedings.mlr.press/v9/hennig10a.html
%V 9
%X Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.

RIS


TY  - CPAPER
TI  - Coherent Inference on Optimal Play in Game Trees
AU  - Philipp Hennig
AU  - David Stern
AU  - Thore Graepel
BT  - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA  - 2010/03/31
ED  - Yee Whye Teh
ED  - Mike Titterington
ID  - pmlr-v9-hennig10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 9
SP  - 326
EP  - 333
L1  - http://proceedings.mlr.press/v9/hennig10a/hennig10a.pdf
UR  - https://proceedings.mlr.press/v9/hennig10a.html
AB  - Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
ER  -

APA


Hennig, P., Stern, D., & Graepel, T. (2010). Coherent Inference on Optimal Play in Game Trees. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:326-333. Available from https://proceedings.mlr.press/v9/hennig10a.html.
