Switching the Loss Reduces the Cost in Batch Reinforcement Learning

Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James Mcinerney, Dawen Liang, Nathan Kallus, Csaba Szepesvari
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:2135-2158, 2024.

Abstract

We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-LOG scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving small-cost bounds, i.e. bounds that scale with the optimal achievable cost, in batch RL. Moreover, we empirically verify that FQI-LOG uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal.
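For a concrete picture of the loss switch, the sketch below contrasts the standard squared-loss regression step of FQI with the log-loss (binary cross-entropy) variant the abstract calls FQI-LOG. It is a minimal illustration, assuming costs and Q-values are normalized to [0, 1] so the log-loss is well defined; the function names, clipping constant, and tabular-style inputs are assumptions made for this sketch, not the authors' implementation.

```python
import numpy as np

def bootstrapped_target(c, q_next_min, gamma):
    # Regression target for one fitted Q-iteration step with costs in [0, 1]:
    # immediate cost plus discounted cost-to-go of the greedy (cost-minimizing)
    # action, clipped so the log-loss below is well defined.
    return np.clip(c + gamma * q_next_min, 0.0, 1.0)

def squared_loss(q_pred, y):
    # Per-sample squared loss, the standard choice when fitting Q in FQI.
    return (q_pred - y) ** 2

def log_loss(q_pred, y, eps=1e-6):
    # Per-sample binary cross-entropy against the (possibly non-binary)
    # target y; this is the "switched" loss. q_pred must lie in (0, 1),
    # hence the clipping with a small illustrative eps.
    q = np.clip(q_pred, eps, 1.0 - eps)
    return y * np.log(1.0 / q) + (1.0 - y) * np.log(1.0 / (1.0 - q))
```

Both losses are minimized in expectation by the same target, but the log-loss penalizes errors more sharply near 0, which is the regime that matters when the optimal accumulated cost is small.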

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ayoub24a,
  title     = {Switching the Loss Reduces the Cost in Batch Reinforcement Learning},
  author    = {Ayoub, Alex and Wang, Kaiwen and Liu, Vincent and Robertson, Samuel and Mcinerney, James and Liang, Dawen and Kallus, Nathan and Szepesvari, Csaba},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {2135--2158},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ayoub24a/ayoub24a.pdf},
  url       = {https://proceedings.mlr.press/v235/ayoub24a.html},
  abstract  = {We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-LOG scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving small-cost bounds, i.e. bounds that scale with the optimal achievable cost, in batch RL. Moreover, we empirically verify that FQI-LOG uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal.}
}
Endnote
%0 Conference Paper
%T Switching the Loss Reduces the Cost in Batch Reinforcement Learning
%A Alex Ayoub
%A Kaiwen Wang
%A Vincent Liu
%A Samuel Robertson
%A James Mcinerney
%A Dawen Liang
%A Nathan Kallus
%A Csaba Szepesvari
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ayoub24a
%I PMLR
%P 2135--2158
%U https://proceedings.mlr.press/v235/ayoub24a.html
%V 235
%X We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-LOG scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving small-cost bounds, i.e. bounds that scale with the optimal achievable cost, in batch RL. Moreover, we empirically verify that FQI-LOG uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal.
APA
Ayoub, A., Wang, K., Liu, V., Robertson, S., Mcinerney, J., Liang, D., Kallus, N. & Szepesvari, C. (2024). Switching the Loss Reduces the Cost in Batch Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:2135-2158. Available from https://proceedings.mlr.press/v235/ayoub24a.html.