Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

Neharika Jali, Guannan Qu, Weina Wang, Gauri Joshi
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4177-4185, 2024.

Abstract

We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. In contrast to homogeneous systems, a threshold policy, which routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. An optimal policy for the multi-server system, however, is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy gradient based algorithm with a low-dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and, despite the low-dimensional parameterization, prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ${\sim}30\%$ over the greedy policy that routes to the fastest available server.
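To make the soft threshold idea concrete, below is a minimal sketch of one plausible parameterization: a sigmoid over the queue length. This is an illustrative assumption, not the paper's ACHQ parameterization; the threshold theta and sharpness beta are hypothetical parameters introduced only for this example.

import math
import random

def soft_threshold_route(queue_length, theta=5.0, beta=1.0):
    """Return 1 (route to the slow server) with probability
    sigmoid(beta * (queue_length - theta)); otherwise return 0
    (route to the fast server).

    theta and beta are hypothetical illustrative parameters, not
    values from the paper.
    """
    p_slow = 1.0 / (1.0 + math.exp(-beta * (queue_length - theta)))
    return 1 if random.random() < p_slow else 0

As beta grows large, the routing probability approaches a step function at theta, recovering the hard threshold policy described in the abstract; a finite beta keeps the policy differentiable, which is what makes a policy gradient method applicable.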

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-jali24a,
  title = {Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems},
  author = {Jali, Neharika and Qu, Guannan and Wang, Weina and Joshi, Gauri},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = {4177--4185},
  year = {2024},
  editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = {238},
  series = {Proceedings of Machine Learning Research},
  month = {02--04 May},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v238/jali24a/jali24a.pdf},
  url = {https://proceedings.mlr.press/v238/jali24a.html},
  abstract = {We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. In contrast to homogeneous systems, a threshold policy, which routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. An optimal policy for the multi-server system, however, is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy gradient based algorithm with a low-dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and, despite the low-dimensional parameterization, prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ${\sim}30\%$ over the greedy policy that routes to the fastest available server.}
}
Endnote
%0 Conference Paper
%T Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems
%A Neharika Jali
%A Guannan Qu
%A Weina Wang
%A Gauri Joshi
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-jali24a
%I PMLR
%P 4177--4185
%U https://proceedings.mlr.press/v238/jali24a.html
%V 238
%X We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. In contrast to homogeneous systems, a threshold policy, which routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. An optimal policy for the multi-server system, however, is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy gradient based algorithm with a low-dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and, despite the low-dimensional parameterization, prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ${\sim}30\%$ over the greedy policy that routes to the fastest available server.
APA
Jali, N., Qu, G., Wang, W. & Joshi, G. (2024). Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:4177-4185. Available from https://proceedings.mlr.press/v238/jali24a.html.