Adaptive Model Design for Markov Decision Process

Siyu Chen, Donglin Yang, Jiayang Li, Senmiao Wang, Zhuoran Yang, Zhaoran Wang
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:3679-3700, 2022.

Abstract

In a Markov decision process (MDP), an agent interacts with the environment via perceptions and actions. During this process, the agent aims to maximize its own gain. Hence, appropriate regulations are often required, if we hope to take the external costs/benefits of its actions into consideration. In this paper, we study how to regulate such an agent by redesigning model parameters that can affect the rewards and/or the transition kernels. We formulate this problem as a bilevel program, in which the lower-level MDP is regulated by the upper-level model designer. To solve the resulting problem, we develop a scheme that allows the designer to iteratively predict the agent’s reaction by solving the MDP and then adaptively update model parameters based on the predicted reaction. The algorithm is first theoretically analyzed and then empirically tested on several MDP models arising in economics and robotics.
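The abstract compresses both the problem formulation and the solution scheme into a few sentences, so a small sketch may help make the structure concrete. The following is a minimal illustrative example, not the authors' algorithm: it assumes a tabular MDP in which the designer's parameter theta adds a per-state-action reward modification, models the agent's reaction (the lower level) with soft value iteration, and updates theta (the upper level) with a crude finite-difference gradient on a hypothetical welfare objective. All names and quantities here (agent_reaction, designer_objective, externality, tau, lr) are illustrative assumptions, not constructs from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    S, A, gamma, tau = 5, 3, 0.9, 0.1
    P = rng.dirichlet(np.ones(S), size=(S, A))          # transition kernel, shape (S, A, S)
    r_base = rng.uniform(size=(S, A))                   # agent's private reward
    externality = rng.uniform(-1.0, 1.0, size=(S, A))   # costs/benefits the agent ignores

    def agent_reaction(theta, iters=200):
        """Lower level: the agent solves the MDP with the designer-shaped reward
        r_base + theta via soft value iteration and returns a softmax policy."""
        r = r_base + theta
        V = np.zeros(S)
        for _ in range(iters):
            Q = r + gamma * P @ V                        # Q-values, shape (S, A)
            m = Q.max(axis=1)
            V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
        pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
        return pi / pi.sum(axis=1, keepdims=True)

    def designer_objective(theta, pi):
        """Upper level: discounted social welfare (agent reward plus externality)
        minus the cost of the reward modification, under the agent's reaction pi."""
        r_soc = ((r_base + externality - np.abs(theta)) * pi).sum(axis=1)
        P_pi = np.einsum('saj,sa->sj', P, pi)            # state transitions under pi
        V_soc = np.linalg.solve(np.eye(S) - gamma * P_pi, r_soc)
        return V_soc.mean()

    theta, lr, eps = np.zeros((S, A)), 0.1, 1e-2
    for _ in range(50):                                  # designer's adaptive updates
        grad = np.zeros_like(theta)
        for s in range(S):                               # crude finite-difference gradient;
            for a in range(A):                           # the paper develops a principled scheme
                d = np.zeros_like(theta)
                d[s, a] = eps
                f_plus = designer_objective(theta + d, agent_reaction(theta + d))
                f_minus = designer_objective(theta - d, agent_reaction(theta - d))
                grad[s, a] = (f_plus - f_minus) / (2 * eps)
        theta += lr * grad                               # adapt the model parameters

    print("designer objective:", designer_objective(theta, agent_reaction(theta)))

Each outer iteration first predicts the agent's reaction by solving the induced MDP and then adapts the model parameters based on that prediction, mirroring the loop described in the abstract at a toy scale.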

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-chen22ab,
  title     = {Adaptive Model Design for {M}arkov Decision Process},
  author    = {Chen, Siyu and Yang, Donglin and Li, Jiayang and Wang, Senmiao and Yang, Zhuoran and Wang, Zhaoran},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {3679--3700},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/chen22ab/chen22ab.pdf},
  url       = {https://proceedings.mlr.press/v162/chen22ab.html},
  abstract  = {In a Markov decision process (MDP), an agent interacts with the environment via perceptions and actions. During this process, the agent aims to maximize its own gain. Hence, appropriate regulations are often required, if we hope to take the external costs/benefits of its actions into consideration. In this paper, we study how to regulate such an agent by redesigning model parameters that can affect the rewards and/or the transition kernels. We formulate this problem as a bilevel program, in which the lower-level MDP is regulated by the upper-level model designer. To solve the resulting problem, we develop a scheme that allows the designer to iteratively predict the agent’s reaction by solving the MDP and then adaptively update model parameters based on the predicted reaction. The algorithm is first theoretically analyzed and then empirically tested on several MDP models arising in economics and robotics.}
}
Endnote
%0 Conference Paper
%T Adaptive Model Design for Markov Decision Process
%A Siyu Chen
%A Donglin Yang
%A Jiayang Li
%A Senmiao Wang
%A Zhuoran Yang
%A Zhaoran Wang
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-chen22ab
%I PMLR
%P 3679--3700
%U https://proceedings.mlr.press/v162/chen22ab.html
%V 162
%X In a Markov decision process (MDP), an agent interacts with the environment via perceptions and actions. During this process, the agent aims to maximize its own gain. Hence, appropriate regulations are often required, if we hope to take the external costs/benefits of its actions into consideration. In this paper, we study how to regulate such an agent by redesigning model parameters that can affect the rewards and/or the transition kernels. We formulate this problem as a bilevel program, in which the lower-level MDP is regulated by the upper-level model designer. To solve the resulting problem, we develop a scheme that allows the designer to iteratively predict the agent’s reaction by solving the MDP and then adaptively update model parameters based on the predicted reaction. The algorithm is first theoretically analyzed and then empirically tested on several MDP models arising in economics and robotics.
APA
Chen, S., Yang, D., Li, J., Wang, S., Yang, Z. & Wang, Z. (2022). Adaptive Model Design for Markov Decision Process. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:3679-3700. Available from https://proceedings.mlr.press/v162/chen22ab.html.
