Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1406-1415, 2016.
Abstract
Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimating the entire distribution of the value function, and the optimal policy may be randomized. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure that is based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also empirically demonstrate their usefulness.
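To make the estimation idea concrete, the sketch below computes a CPT-value estimate from i.i.d. samples via the empirical distribution (order statistics). It is a minimal illustration, not the paper's exact implementation: the identity utilities and the Tversky–Kahneman weighting functions with exponents 0.61/0.69 are assumed defaults, and the function name cpt_value_estimate and its parameters are hypothetical.

```python
import numpy as np


def tk_weight(p, eta):
    """Tversky-Kahneman probability weighting function (assumed choice)."""
    return p**eta / (p**eta + (1.0 - p)**eta) ** (1.0 / eta)


def cpt_value_estimate(samples,
                       u_plus=lambda x: x,        # utility for gains (identity, assumed)
                       u_minus=lambda x: x,       # utility for losses (identity, assumed)
                       eta_plus=0.61,             # commonly cited TK-92 exponents (assumed)
                       eta_minus=0.69):
    """Estimate the CPT-value of a random variable from i.i.d. samples
    using the empirical (order-statistic) distribution."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)

    # Gains part: weight the utility of each order statistic by the
    # difference of distorted upper-tail probabilities.
    gains = np.maximum(x, 0.0)
    dw_plus = tk_weight((n + 1 - i) / n, eta_plus) - tk_weight((n - i) / n, eta_plus)
    positive_part = np.sum(u_plus(gains) * dw_plus)

    # Losses part: analogous, distorting the lower tail.
    losses = np.maximum(-x, 0.0)
    dw_minus = tk_weight(i / n, eta_minus) - tk_weight((i - 1) / n, eta_minus)
    negative_part = np.sum(u_minus(losses) * dw_minus)

    return positive_part - negative_part


# Example usage: estimate the CPT-value of a simulated return distribution.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    returns = rng.normal(loc=0.5, scale=1.0, size=10_000)
    print(cpt_value_estimate(returns))
```

In the control setting described in the abstract, an estimator of this kind would sit in the inner loop: an SPSA-style outer loop perturbs the policy parameters, re-estimates the CPT-value at the perturbed points, and forms a gradient estimate from the difference.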