The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

Yingfei Wang; Chu Wang; Warren Powell

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1138-1147, 2016.

Abstract

We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success, either after the offline training phase or minimizing regret in online learning. Our problem is motivated by real-world applications where observations are time consuming and/or expensive. With the adaptation of an online Bayesian linear classifier, we develop a knowledge-gradient type policy to guide the experiment by maximizing the expected value of information of labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimated error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-wangb16,
  title = 	 {The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks},
  author = 	 {Wang, Yingfei and Wang, Chu and Powell, Warren},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {1138--1147},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/wangb16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/wangb16.html},
  abstract = 	 {We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success, either after the offline training phase or minimizing regret in online learning. Our problem is motivated by real-world applications where observations are time consuming and/or expensive. With the adaptation of an online Bayesian linear classifier, we develop a knowledge-gradient type policy to guide the experiment by maximizing the expected value of information of labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimated error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.}
}

Endnote

%0 Conference Paper
%T The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks
%A Yingfei Wang
%A Chu Wang
%A Warren Powell
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-wangb16
%I PMLR
%P 1138--1147
%U https://proceedings.mlr.press/v48/wangb16.html
%V 48
%X We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success, either after the offline training phase or minimizing regret in online learning. Our problem is motivated by real-world applications where observations are time consuming and/or expensive. With the adaptation of an online Bayesian linear classifier, we develop a knowledge-gradient type policy to guide the experiment by maximizing the expected value of information of labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimated error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.

RIS


TY  - CPAPER
TI  - The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks
AU  - Yingfei Wang
AU  - Chu Wang
AU  - Warren Powell
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-wangb16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 1138
EP  - 1147
L1  - http://proceedings.mlr.press/v48/wangb16.pdf
UR  - https://proceedings.mlr.press/v48/wangb16.html
AB  - We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success, either after the offline training phase or minimizing regret in online learning. Our problem is motivated by real-world applications where observations are time consuming and/or expensive. With the adaptation of an online Bayesian linear classifier, we develop a knowledge-gradient type policy to guide the experiment by maximizing the expected value of information of labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimated error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.
ER  -

APA


Wang, Y., Wang, C. & Powell, W. (2016). The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1138-1147. Available from https://proceedings.mlr.press/v48/wangb16.html.
