Timing as an Action: Learning When to Observe and Act

Helen Zhou, Audrey Huang, Kamyar Azizzadenesheli, David Childers, Zachary Lipton
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3979-3987, 2024.

Abstract

In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure and handling them intelligently yields statistical advantages. Taking a model-based perspective, these gains owe to the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.
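To make the "no extra parameters" point concrete, here is a minimal sketch (hypothetical code, not taken from the paper), assuming the agent holds a single action fixed between observations in a tabular MDP: committing to action a for a delay of d steps induces the d-step transition matrix obtained by raising the one-step kernel to the d-th power, so however many delay options are exposed as composite (action, delay) pairs, only the one-step kernels need to be estimated.

import numpy as np

# Hypothetical sketch (not the paper's code): in a tabular MDP with a one-step
# transition matrix P_a per action, committing to action a for `delay` steps
# before the next observation induces the transition matrix P_a raised to the
# `delay`-th power. Delay choices therefore enlarge the composite action space
# (a, delay) without adding parameters to the underlying one-step model.

def delayed_transition(P_a: np.ndarray, delay: int) -> np.ndarray:
    """Transition matrix for holding one action fixed for `delay` steps."""
    return np.linalg.matrix_power(P_a, delay)

# Toy check: 3 states, one action, delays in {1, 2, 4}.
rng = np.random.default_rng(0)
P_a = rng.dirichlet(np.ones(3), size=3)   # each row is a valid probability distribution
for d in (1, 2, 4):
    P_d = delayed_transition(P_a, d)
    print(f"delay={d}, row sums = {P_d.sum(axis=1)}")  # each row still sums to 1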

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-zhou24c,
  title     = {Timing as an Action: Learning When to Observe and Act},
  author    = {Zhou, Helen and Huang, Audrey and Azizzadenesheli, Kamyar and Childers, David and Lipton, Zachary},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {3979--3987},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/zhou24c/zhou24c.pdf},
  url       = {https://proceedings.mlr.press/v238/zhou24c.html},
  abstract  = {In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure and handling them intelligently yields statistical advantages. Taking a model-based perspective, these gains owe to the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.}
}
Endnote
%0 Conference Paper
%T Timing as an Action: Learning When to Observe and Act
%A Helen Zhou
%A Audrey Huang
%A Kamyar Azizzadenesheli
%A David Childers
%A Zachary Lipton
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-zhou24c
%I PMLR
%P 3979--3987
%U https://proceedings.mlr.press/v238/zhou24c.html
%V 238
%X In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure and handling them intelligently yields statistical advantages. Taking a model-based perspective, these gains owe to the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.
APA
Zhou, H., Huang, A., Azizzadenesheli, K., Childers, D. & Lipton, Z. (2024). Timing as an Action: Learning When to Observe and Act. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3979-3987. Available from https://proceedings.mlr.press/v238/zhou24c.html.
