Time and temporal abstraction in continual learning: tradeoffs, analogies and regret in an active measuring setting

Vincent Létourneau, Colin Bellinger, Isaac Tamblyn, Maia Fraser
Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232:470-480, 2023.

Abstract

This conceptual paper provides theoretical results linking notions in semi-supervised learning (SSL) and hierarchical reinforcement learning (HRL) in the context of lifelong learning. Specifically, our construction sets up a direct analogy between intermediate representations in SSL and temporal abstraction in RL, highlighting the important role of factorization in both types of hierarchy and the relevance of partial labeling in SSL and, correspondingly, partial observation in RL. The construction centres around a simple class of Partially Observable Markov Decision Processes (POMDPs) where we show that tools and results from SSL imply lower bounds on regret that hold for any RL algorithm without access to temporal abstraction. While our lower bound is for a restricted class of RL problems, it applies to arbitrary RL algorithms in this setting. The setting moreover features so-called “active measuring”, an aspect of widespread relevance in industrial control that, possibly due to its lifelong learning flavour, has not yet been well studied in RL. Our formalization makes it possible to reason about the tradeoffs that apply to such control problems.
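To make the “active measuring” setting concrete, the following minimal Python sketch shows a toy environment in which the agent only observes its state when it explicitly pays a measurement cost. This is an illustrative assumption-laden toy, not the paper's actual POMDP construction: the class name ActiveMeasuringChain, the chain dynamics, the measurement cost, and the fixed measuring schedule are all invented for exposition.

import random

class ActiveMeasuringChain:
    """Toy POMDP with active measuring: the agent moves left/right on a
    chain of n states but only observes its position when it pays a
    measurement cost. Illustrative only; the paper's construction
    differs in detail."""

    def __init__(self, n_states=10, measure_cost=0.1, goal_reward=1.0, seed=0):
        self.n = n_states
        self.measure_cost = measure_cost
        self.goal_reward = goal_reward
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        # Start in a random state; no observation until the agent measures.
        self.state = self.rng.randrange(self.n)
        return None

    def step(self, move, measure):
        """move: -1 or +1 (control action); measure: bool (pay to observe).
        Returns (observation, reward), where observation is the true state
        if measured and None otherwise."""
        self.state = max(0, min(self.n - 1, self.state + move))
        reward = self.goal_reward if self.state == self.n - 1 else 0.0
        if measure:
            reward -= self.measure_cost
            return self.state, reward
        return None, reward

# Minimal interaction loop: an agent that measures on a fixed schedule,
# acting greedily on the last observation it received.
env = ActiveMeasuringChain()
obs, total = env.reset(), 0.0
for t in range(100):
    measure = (t % 5 == 0)                          # fixed measuring schedule
    move = 1 if obs is None or obs < env.n - 1 else -1
    obs, r = env.step(move, measure)
    total += r
print(f"return over 100 steps: {total:.2f}")

Even in this toy, the tradeoff the abstract alludes to is visible: measuring more often reduces state uncertainty (and hence regret from misdirected moves) but accumulates measurement cost, so the agent must balance the two.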

Cite this Paper


BibTeX
@InProceedings{pmlr-v232-letourneau23a,
  title     = {Time and temporal abstraction in continual learning: tradeoffs, analogies and regret in an active measuring setting},
  author    = {L\'etourneau, Vincent and Bellinger, Colin and Tamblyn, Isaac and Fraser, Maia},
  booktitle = {Proceedings of The 2nd Conference on Lifelong Learning Agents},
  pages     = {470--480},
  year      = {2023},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Sedghi, Hanie and Precup, Doina},
  volume    = {232},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v232/letourneau23a/letourneau23a.pdf},
  url       = {https://proceedings.mlr.press/v232/letourneau23a.html}
}
Endnote
%0 Conference Paper
%T Time and temporal abstraction in continual learning: tradeoffs, analogies and regret in an active measuring setting
%A Vincent Létourneau
%A Colin Bellinger
%A Isaac Tamblyn
%A Maia Fraser
%B Proceedings of The 2nd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2023
%E Sarath Chandar
%E Razvan Pascanu
%E Hanie Sedghi
%E Doina Precup
%F pmlr-v232-letourneau23a
%I PMLR
%P 470--480
%U https://proceedings.mlr.press/v232/letourneau23a.html
%V 232
APA
Létourneau, V., Bellinger, C., Tamblyn, I., & Fraser, M. (2023). Time and temporal abstraction in continual learning: tradeoffs, analogies and regret in an active measuring setting. Proceedings of The 2nd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 232:470-480. Available from https://proceedings.mlr.press/v232/letourneau23a.html.
