Concurrent Reinforcement Learning from Customer Interactions

David Silver; Leonard Newnham; David Barker; Suzanne Weller; Jason McFall

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):924-932, 2013.

Abstract

In this paper, we explore applications in which a company interacts concurrently with many customers. The company has an objective function, such as maximising revenue, customer satisfaction, or customer loyalty, which depends primarily on the sequence of interactions between company and customer. A key aspect of this setting is that interactions with different customers occur in parallel. As a result, it is imperative to learn online from partial interaction sequences, so that information acquired from one customer is efficiently assimilated and applied in subsequent interactions with other customers. We present the first framework for concurrent reinforcement learning, using a variant of temporal-difference learning to learn efficiently from partial interaction sequences. We evaluate our algorithms in two large-scale test-beds for online and email interaction respectively, generated from a database of 300,000 customer records.
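The idea of learning online from partial interaction sequences can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes plain tabular TD(0), a single value table shared by all customers, and hypothetical state names ("visit", "offer"), rewards, and step sizes chosen only for illustration.

```python
from collections import defaultdict


def concurrent_td0(events, alpha=0.1, gamma=0.9):
    """Apply tabular TD(0) to an interleaved stream of customer transitions.

    events: iterable of (customer_id, state, reward, next_state) tuples,
    where next_state is None when that customer's sequence has ended.
    All customers share one value table, so every partial sequence
    immediately updates the estimates used for every other customer.
    (alpha, gamma, and the event layout are illustrative assumptions.)
    """
    values = defaultdict(float)  # shared state-value estimates, default 0.0
    for _customer_id, state, reward, next_state in events:
        # Bootstrap from the current estimate instead of waiting for
        # the customer's full interaction sequence to finish.
        bootstrap = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * bootstrap - values[state]
        values[state] += alpha * td_error
    return values


# Hypothetical interleaved stream: customer "b" is still mid-sequence
# when customer "a" finishes and customer "c" arrives.
events = [
    ("a", "visit", 0.0, "offer"),
    ("b", "visit", 0.0, "offer"),
    ("a", "offer", 1.0, None),     # customer a converts; sequence ends
    ("c", "visit", 0.0, "offer"),  # c's update already reflects a's reward
]
values = concurrent_td0(events)
```

In this toy stream, the reward observed for customer "a" raises the estimate for "offer", and the very next update for customer "c" propagates part of that value back to "visit", even though customer "b" has not yet finished.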

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-silver13,
  title = 	 {Concurrent Reinforcement Learning from Customer Interactions},
  author = 	 {Silver, David and Newnham, Leonard and Barker, David and Weller, Suzanne and McFall, Jason},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {924--932},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/silver13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/silver13.html},
  abstract = 	 {In this paper, we explore applications in which a company interacts concurrently with many customers. The company has an objective function, such as maximising revenue, customer satisfaction, or customer loyalty, which depends primarily on the sequence of interactions between company and customer. A key aspect of this setting is that interactions with different customers occur in parallel. As a result, it is imperative to learn online from partial interaction sequences, so that information acquired from one customer is efficiently assimilated and applied in subsequent interactions with other customers. We present the first framework for concurrent reinforcement learning, using a variant of temporal-difference learning to learn efficiently from partial interaction sequences.   We evaluate our algorithms in two large-scale test-beds for online and email interaction respectively, generated from a database of 300,000 customer records. }
}

Endnote

%0 Conference Paper
%T Concurrent Reinforcement Learning from Customer Interactions
%A David Silver
%A Leonard Newnham
%A David Barker
%A Suzanne Weller
%A Jason McFall
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-silver13
%I PMLR
%P 924--932
%U https://proceedings.mlr.press/v28/silver13.html
%V 28
%N 3
%X In this paper, we explore applications in which a company interacts concurrently with many customers. The company has an objective function, such as maximising revenue, customer satisfaction, or customer loyalty, which depends primarily on the sequence of interactions between company and customer. A key aspect of this setting is that interactions with different customers occur in parallel. As a result, it is imperative to learn online from partial interaction sequences, so that information acquired from one customer is efficiently assimilated and applied in subsequent interactions with other customers. We present the first framework for concurrent reinforcement learning, using a variant of temporal-difference learning to learn efficiently from partial interaction sequences.   We evaluate our algorithms in two large-scale test-beds for online and email interaction respectively, generated from a database of 300,000 customer records.

RIS


TY  - CPAPER
TI  - Concurrent Reinforcement Learning from Customer Interactions
AU  - David Silver
AU  - Leonard Newnham
AU  - David Barker
AU  - Suzanne Weller
AU  - Jason McFall
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-silver13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 924
EP  - 932
L1  - http://proceedings.mlr.press/v28/silver13.pdf
UR  - https://proceedings.mlr.press/v28/silver13.html
AB  - In this paper, we explore applications in which a company interacts concurrently with many customers. The company has an objective function, such as maximising revenue, customer satisfaction, or customer loyalty, which depends primarily on the sequence of interactions between company and customer. A key aspect of this setting is that interactions with different customers occur in parallel. As a result, it is imperative to learn online from partial interaction sequences, so that information acquired from one customer is efficiently assimilated and applied in subsequent interactions with other customers. We present the first framework for concurrent reinforcement learning, using a variant of temporal-difference learning to learn efficiently from partial interaction sequences.   We evaluate our algorithms in two large-scale test-beds for online and email interaction respectively, generated from a database of 300,000 customer records. 
ER  -

APA


Silver, D., Newnham, L., Barker, D., Weller, S. & McFall, J. (2013). Concurrent Reinforcement Learning from Customer Interactions. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):924-932. Available from https://proceedings.mlr.press/v28/silver13.html.
