Timing as an Action: Learning When to Observe and Act

Helen Zhou, Audrey Huang, Kamyar Azizzadenesheli, David Childers, Zachary Lipton
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3979-3987, 2024.

Abstract

In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure and handling them intelligently yields statistical advantages. Taking a model-based perspective, these gains owe to the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.
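To make the "no extra parameters" point concrete, here is a minimal sketch (hypothetical code, not taken from the paper), assuming the agent holds a single action fixed between observations in a tabular MDP: committing to action a for a delay of d steps induces the d-step transition matrix obtained by raising the one-step kernel to the d-th power, so however many delay options are exposed as composite (action, delay) pairs, only the one-step kernels need to be estimated.

import numpy as np

# Hypothetical sketch (not the paper's code): in a tabular MDP with a one-step
# transition matrix P_a per action, committing to action a for `delay` steps
# before the next observation induces the transition matrix P_a raised to the
# `delay`-th power. Delay choices therefore enlarge the composite action space
# (a, delay) without adding parameters to the underlying one-step model.

def delayed_transition(P_a: np.ndarray, delay: int) -> np.ndarray:
    """Transition matrix for holding one action fixed for `delay` steps."""
    return np.linalg.matrix_power(P_a, delay)

# Toy check: 3 states, one action, delays in {1, 2, 4}.
rng = np.random.default_rng(0)
P_a = rng.dirichlet(np.ones(3), size=3)   # each row is a valid probability distribution
for d in (1, 2, 4):
    P_d = delayed_transition(P_a, d)
    print(f"delay={d}, row sums = {P_d.sum(axis=1)}")  # each row still sums to 1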

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-zhou24c,
  title     = {Timing as an Action: Learning When to Observe and Act},
  author    = {Zhou, Helen and Huang, Audrey and Azizzadenesheli, Kamyar and Childers, David and Lipton, Zachary},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {3979--3987},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/zhou24c/zhou24c.pdf},
  url       = {https://proceedings.mlr.press/v238/zhou24c.html},
  abstract  = {In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure and handling them intelligently yields statistical advantages. Taking a model-based perspective, these gains owe to the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.}
}
Endnote
%0 Conference Paper
%T Timing as an Action: Learning When to Observe and Act
%A Helen Zhou
%A Audrey Huang
%A Kamyar Azizzadenesheli
%A David Childers
%A Zachary Lipton
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-zhou24c
%I PMLR
%P 3979--3987
%U https://proceedings.mlr.press/v238/zhou24c.html
%V 238
%X In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure and handling them intelligently yields statistical advantages. Taking a model-based perspective, these gains owe to the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.
APA
Zhou, H., Huang, A., Azizzadenesheli, K., Childers, D. & Lipton, Z. (2024). Timing as an Action: Learning When to Observe and Act. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3979-3987. Available from https://proceedings.mlr.press/v238/zhou24c.html.
