Deep Q-Learning with Whittle Index for Contextual Restless Bandits: Application to Email Recommender Systems

Ibtihal El Mimouni; Konstantin Avrachenkov

Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), PMLR 265:176-183, 2025.

Abstract

In this paper, we introduce DQWIC, a novel algorithm that combines Deep Reinforcement Learning and Whittle index theory within the Contextual Restless Multi-Armed Bandit framework for the discounted criterion. DQWIC is designed to learn in evolving environments typical of real-world applications, such as recommender systems, where user preferences and environmental dynamics evolve over time. In particular, we apply DQWIC to the problem of optimizing email recommendations, where it tackles the dual challenges of enhancing content relevance and reducing spam messages, thereby addressing ethical concerns related to intrusive emailing. The algorithm leverages two neural networks: a Q-network for approximating action-value functions and a Whittle-network for estimating Whittle indices, both of which integrate contextual features to inform decision-making. In addition, the inclusion of context allows us to handle many heterogeneous users in a scalable way. The learning process occurs through a two-timescale stochastic approximation, with the Q-network updated frequently to minimize the loss between predicted and target Q-values, and the Whittle-network updated on a slower time scale. To evaluate its effectiveness, we conducted experiments in partnership with a company specializing in digital marketing. Our results, derived from both synthetic and real-world data, show that DQWIC outperforms existing email marketing baselines.
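
The two-timescale idea in the abstract can be illustrated with a minimal tabular sketch. This is not the DQWIC implementation: it replaces both neural networks with a small Q-table and a scalar index for one reference state, and ignores context. The function names, the step-size exponents, and the subsidy-for-passivity form of the update are assumptions chosen for illustration.

```python
import numpy as np


def fast_step(n: int) -> float:
    """Fast-timescale step size for the Q update (assumed schedule)."""
    return 1.0 / (1 + n) ** 0.6


def slow_step(n: int) -> float:
    """Slow-timescale step size for the index update.

    slow_step(n) / fast_step(n) tends to 0, so the index moves
    slowly relative to the Q-values.
    """
    return 1.0 / (1 + n)


def q_update(q, lam, s, a, r, s_next, gamma, n):
    """One discounted Q-learning step for a single arm (in place).

    q has shape (num_states, 2); action 0 is passive, 1 is active.
    The passive action earns an extra subsidy lam, the current
    index estimate for the reference state.
    """
    target = r + (1 - a) * lam + gamma * q[s_next].max()
    q[s, a] += fast_step(n) * (target - q[s, a])
    return q


def index_update(lam, q, k, n):
    """Move the index estimate toward the subsidy at which the active
    and passive actions are equally attractive in reference state k."""
    return lam + slow_step(n) * (q[k, 1] - q[k, 0])


def top_m_arms(indices, m):
    """Index policy: activate the m arms with the largest indices."""
    return np.argsort(-np.asarray(indices))[:m]
```

In a recommender setting, `top_m_arms` would pick which users to email in a round once each user's index has been estimated. In DQWIC, according to the abstract, both quantities are produced by networks that take contextual features as input.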

Cite this Paper

BibTeX

@InProceedings{pmlr-v265-mimouni25a,
  title = 	 {Deep Q-Learning with Whittle Index for Contextual Restless Bandits: Application to Email Recommender Systems},
  author =       {El Mimouni, Ibtihal and Avrachenkov, Konstantin},
  booktitle = 	 {Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)},
  pages = 	 {176--183},
  year = 	 {2025},
  editor = 	 {Lutchyn, Tetiana and Ramírez Rivera, Adín and Ricaud, Benjamin},
  volume = 	 {265},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--09 Jan},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v265/main/assets/mimouni25a/mimouni25a.pdf},
  url = 	 {https://proceedings.mlr.press/v265/mimouni25a.html},
  abstract = 	 {In this paper, we introduce DQWIC, a novel algorithm that combines Deep Reinforcement Learning and Whittle index theory within the Contextual Restless Multi-Armed Bandit framework for the discounted criterion. DQWIC is designed to learn in evolving environments typical of real-world applications, such as recommender systems, where user preferences and environmental dynamics evolve over time. In particular, we apply DQWIC to the problem of optimizing email recommendations, where it tackles the dual challenges of enhancing content relevance and reducing spam messages, thereby addressing ethical concerns related to intrusive emailing. The algorithm leverages two neural networks: a Q-network for approximating action-value functions and a Whittle-network for estimating Whittle indices, both of which integrate contextual features to inform decision-making. In addition, the inclusion of context allows us to handle many heterogeneous users in a scalable way. The learning process occurs through a two-timescale stochastic approximation, with the Q-network updated frequently to minimize the loss between predicted and target Q-values, and the Whittle-network updated on a slower time scale. To evaluate its effectiveness, we conducted experiments in partnership with a company specializing in digital marketing. Our results, derived from both synthetic and real-world data, show that DQWIC outperforms existing email marketing baselines.}
}

Endnote

%0 Conference Paper
%T Deep Q-Learning with Whittle Index for Contextual Restless Bandits: Application to Email Recommender Systems
%A Ibtihal El Mimouni
%A Konstantin Avrachenkov
%B Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)
%C Proceedings of Machine Learning Research
%D 2025
%E Tetiana Lutchyn
%E Adín Ramírez Rivera
%E Benjamin Ricaud
%F pmlr-v265-mimouni25a
%I PMLR
%P 176--183
%U https://proceedings.mlr.press/v265/mimouni25a.html
%V 265
%X In this paper, we introduce DQWIC, a novel algorithm that combines Deep Reinforcement Learning and Whittle index theory within the Contextual Restless Multi-Armed Bandit framework for the discounted criterion. DQWIC is designed to learn in evolving environments typical of real-world applications, such as recommender systems, where user preferences and environmental dynamics evolve over time. In particular, we apply DQWIC to the problem of optimizing email recommendations, where it tackles the dual challenges of enhancing content relevance and reducing spam messages, thereby addressing ethical concerns related to intrusive emailing. The algorithm leverages two neural networks: a Q-network for approximating action-value functions and a Whittle-network for estimating Whittle indices, both of which integrate contextual features to inform decision-making. In addition, the inclusion of context allows us to handle many heterogeneous users in a scalable way. The learning process occurs through a two-timescale stochastic approximation, with the Q-network updated frequently to minimize the loss between predicted and target Q-values, and the Whittle-network updated on a slower time scale. To evaluate its effectiveness, we conducted experiments in partnership with a company specializing in digital marketing. Our results, derived from both synthetic and real-world data, show that DQWIC outperforms existing email marketing baselines.

APA

El Mimouni, I. & Avrachenkov, K. (2025). Deep Q-Learning with Whittle Index for Contextual Restless Bandits: Application to Email Recommender Systems. Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 265:176-183. Available from https://proceedings.mlr.press/v265/mimouni25a.html.
