Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning

Qiankun Zhang; Aocheng Shen; Boyu Zhang; Hanrui Jiang; Bingqian Du

Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning

Qiankun Zhang, Aocheng Shen, Boyu Zhang, Hanrui Jiang, Bingqian Du

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:59806-59826, 2024.

Abstract

For a specific online optimization problem, for example, online bipartite matching (OBM), research efforts could be made in two directions before it is finally closed, i.e., the optimal competitive online algorithm is found. One is to continuously design algorithms with better performance. To this end, reinforcement learning (RL) has demonstrated great success in literature. However, little is known on the other direction: whether RL helps explore how hard an online problem is. In this paper, we study a generalized model of OBM, named online matching with stochastic rewards (OMSR, FOCS 2012), for which the optimal competitive ratio is still unknown. We adopt an adversarial RL approach that trains two RL agents adversarially and iteratively: the algorithm agent learns for algorithms with larger competitive ratios, while the adversarial agent learns to produce a family of hard instances. Through such a framework, agents converge at the end with a robust algorithm, which empirically outperforms the state of the art (STOC 2020). Much more significantly, it allows to track how the hard instances are generated. We succeed in distilling two structural properties from the learned graph patterns, which remarkably reduce the action space, and further enable theoretical improvement on the best-known hardness result of OMSR, from

$0.621$ (FOCS 2012) to

$0.597$ . To the best of our knowledge, this gives the first evidence that RL can help enhance the theoretical understanding of an online problem.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-zhang24bf,
  title = 	 {Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning},
  author =       {Zhang, Qiankun and Shen, Aocheng and Zhang, Boyu and Jiang, Hanrui and Du, Bingqian},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {59806--59826},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhang24bf/zhang24bf.pdf},
  url = 	 {https://proceedings.mlr.press/v235/zhang24bf.html},
  abstract = 	 {For a specific online optimization problem, for example, online bipartite matching (OBM), research efforts could be made in two directions before it is finally closed, i.e., the optimal competitive online algorithm is found. One is to continuously design algorithms with better performance. To this end, reinforcement learning (RL) has demonstrated great success in literature. However, little is known on the other direction: whether RL helps explore how hard an online problem is. In this paper, we study a generalized model of OBM, named online matching with stochastic rewards (OMSR, FOCS 2012), for which the optimal competitive ratio is still unknown. We adopt an adversarial RL approach that trains two RL agents adversarially and iteratively: the algorithm agent learns for algorithms with larger competitive ratios, while the adversarial agent learns to produce a family of hard instances. Through such a framework, agents converge at the end with a robust algorithm, which empirically outperforms the state of the art (STOC 2020). Much more significantly, it allows to track how the hard instances are generated. We succeed in distilling two structural properties from the learned graph patterns, which remarkably reduce the action space, and further enable theoretical improvement on the best-known hardness result of OMSR, from $0.621$ (FOCS 2012) to $0.597$. To the best of our knowledge, this gives the first evidence that RL can help enhance the theoretical understanding of an online problem.}
}

Endnote

%0 Conference Paper
%T Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning
%A Qiankun Zhang
%A Aocheng Shen
%A Boyu Zhang
%A Hanrui Jiang
%A Bingqian Du
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp	
%F pmlr-v235-zhang24bf
%I PMLR
%P 59806--59826
%U https://proceedings.mlr.press/v235/zhang24bf.html
%V 235
%X For a specific online optimization problem, for example, online bipartite matching (OBM), research efforts could be made in two directions before it is finally closed, i.e., the optimal competitive online algorithm is found. One is to continuously design algorithms with better performance. To this end, reinforcement learning (RL) has demonstrated great success in literature. However, little is known on the other direction: whether RL helps explore how hard an online problem is. In this paper, we study a generalized model of OBM, named online matching with stochastic rewards (OMSR, FOCS 2012), for which the optimal competitive ratio is still unknown. We adopt an adversarial RL approach that trains two RL agents adversarially and iteratively: the algorithm agent learns for algorithms with larger competitive ratios, while the adversarial agent learns to produce a family of hard instances. Through such a framework, agents converge at the end with a robust algorithm, which empirically outperforms the state of the art (STOC 2020). Much more significantly, it allows to track how the hard instances are generated. We succeed in distilling two structural properties from the learned graph patterns, which remarkably reduce the action space, and further enable theoretical improvement on the best-known hardness result of OMSR, from $0.621$ (FOCS 2012) to $0.597$. To the best of our knowledge, this gives the first evidence that RL can help enhance the theoretical understanding of an online problem.

APA


Zhang, Q., Shen, A., Zhang, B., Jiang, H. & Du, B.. (2024). Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:59806-59826 Available from https://proceedings.mlr.press/v235/zhang24bf.html.

Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning

Abstract

Cite this Paper

Related Material