The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

Yingfei Wang, Chu Wang, Warren Powell
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1138-1147, 2016.

Abstract

We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is either to maximize the probability of success after the offline training phase or to minimize regret in online learning. Our problem is motivated by real-world applications where observations are time-consuming and/or expensive. Adapting an online Bayesian linear classifier, we develop a knowledge-gradient-type policy that guides the experiment by maximizing the expected value of information of labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimated error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-wangb16,
  title = {The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks},
  author = {Wang, Yingfei and Wang, Chu and Powell, Warren},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages = {1138--1147},
  year = {2016},
  editor = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = {48},
  series = {Proceedings of Machine Learning Research},
  address = {New York, New York, USA},
  month = {20--22 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v48/wangb16.pdf},
  url = {https://proceedings.mlr.press/v48/wangb16.html},
  abstract = {We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success, either after the offline training phase or minimizing regret in online learning. Our problem is motivated by real-world applications where observations are time consuming and/or expensive. With the adaptation of an online Bayesian linear classifier, we develop a knowledge-gradient type policy to guide the experiment by maximizing the expected value of information of labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimated error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.}
}
Endnote
%0 Conference Paper
%T The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks
%A Yingfei Wang
%A Chu Wang
%A Warren Powell
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-wangb16
%I PMLR
%P 1138--1147
%U https://proceedings.mlr.press/v48/wangb16.html
%V 48
%X We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success, either after the offline training phase or minimizing regret in online learning. Our problem is motivated by real-world applications where observations are time consuming and/or expensive. With the adaptation of an online Bayesian linear classifier, we develop a knowledge-gradient type policy to guide the experiment by maximizing the expected value of information of labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimated error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.
RIS
TY  - CPAPER
TI  - The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks
AU  - Yingfei Wang
AU  - Chu Wang
AU  - Warren Powell
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-wangb16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 1138
EP  - 1147
L1  - http://proceedings.mlr.press/v48/wangb16.pdf
UR  - https://proceedings.mlr.press/v48/wangb16.html
AB  - We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success, either after the offline training phase or minimizing regret in online learning. Our problem is motivated by real-world applications where observations are time consuming and/or expensive. With the adaptation of an online Bayesian linear classifier, we develop a knowledge-gradient type policy to guide the experiment by maximizing the expected value of information of labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimated error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.
ER  -
APA
Wang, Y., Wang, C., & Powell, W. (2016). The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1138-1147. Available from https://proceedings.mlr.press/v48/wangb16.html.
