Prospective Side Information for Latent MDPs

Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:25755-25783, 2024.

Abstract

In many interactive decision-making problems, there is contextual side information that remains fixed over the course of an interaction. This problem has been studied extensively under the assumption that the context is fully observed, as well as in the opposing limit where the context is unobserved, a special type of POMDP also referred to as a Latent MDP (LMDP). In this work, we consider a class of decision problems that interpolates between these two settings: the case where the context is fully observed and the case where it is unobserved. We refer to this class of decision problems as LMDPs with prospective side information. In such an environment, an agent receives additional, weakly revealing information on the latent context at the beginning of each episode. We show that, surprisingly, this problem is not captured by contemporary POMDP settings and is not solved by RL algorithms designed for partially observed environments. We then establish that any sample-efficient algorithm must suffer at least $\Omega(K^{2/3})$ regret, as opposed to the standard $\Omega(\sqrt{K})$ lower bounds. We design an algorithm with a matching upper bound that depends only polynomially on the problem parameters. This establishes an exponential improvement in sample complexity relative to the existing lower bound for LMDPs, which applies when prospective side information is not available.
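To make the bounds concrete, the regret quantities above follow the standard episodic definition (a generic formulation; the paper's precise benchmark is not reproduced on this page, and the symbols $V^{\star}$ and $\pi_k$ are our notation):

$$\mathrm{Reg}(K) \;=\; \sum_{k=1}^{K} \big( V^{\star} - V^{\pi_k} \big),$$

where $V^{\star}$ is the optimal expected return per episode and $\pi_k$ is the policy the learner executes in episode $k$. The lower bound states that any sample-efficient algorithm must incur $\mathrm{Reg}(K) = \Omega(K^{2/3})$, a strictly harder rate than the $\Omega(\sqrt{K})$ lower bounds standard in fully observed settings, while the paper's algorithm attains a matching $K^{2/3}$ rate up to polynomial factors in the problem parameters.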

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-kwon24a,
  title     = {Prospective Side Information for Latent {MDP}s},
  author    = {Kwon, Jeongyeol and Efroni, Yonathan and Mannor, Shie and Caramanis, Constantine},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {25755--25783},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/kwon24a/kwon24a.pdf},
  url       = {https://proceedings.mlr.press/v235/kwon24a.html}
}
Endnote
%0 Conference Paper
%T Prospective Side Information for Latent MDPs
%A Jeongyeol Kwon
%A Yonathan Efroni
%A Shie Mannor
%A Constantine Caramanis
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-kwon24a
%I PMLR
%P 25755--25783
%U https://proceedings.mlr.press/v235/kwon24a.html
%V 235
APA
Kwon, J., Efroni, Y., Mannor, S., & Caramanis, C. (2024). Prospective Side Information for Latent MDPs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:25755-25783. Available from https://proceedings.mlr.press/v235/kwon24a.html.