Meta-Gradients in Non-Stationary Environments

Jelena Luketina; Sebastian Flennerhag; Yannick Schroecker; David Abel; Tom Zahavy; Satinder Singh

Meta-Gradients in Non-Stationary Environments

Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

Proceedings of The 1st Conference on Lifelong Learning Agents, PMLR 199:886-901, 2022.

Abstract

Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we ask: (i) how much information should be given to the learned optimizers, so as to enable faster adaptation and generalization over a lifetime, (ii) what meta-optimizer functions are learned in this process, and (iii) whether meta-gradient methods provide a bigger advantage in highly non-stationary environments. To study the effect of information provided to the meta-optimizer, as in recent works (Flennerhaget al., 2021; Almeida et al., 2021), we replace the tuned meta-parameters of fixed update rules with learned meta-parameter functions of selected context features. The context features carry information about agent performance and changes in the environment and hence can inform learned meta-parameter schedules. We find that adding more contextual information is generally beneficial, leading to faster adaptation of meta-parameter values and increased performance over a lifetime. We support these results with a qualitative analysis of resulting meta parameter schedules and learned functions of context features. Lastly, we find that without context, meta-gradients do not provide a consistent advantage over the baseline in highly non-stationary environments. Our findings suggest that contextualizing meta-gradients can play a pivotal role in extracting high performance from meta-gradients in non-stationary settings.

Cite this Paper

BibTeX


@InProceedings{pmlr-v199-luketina22a,
  title = 	 {Meta-Gradients in Non-Stationary Environments},
  author =       {Luketina, Jelena and Flennerhag, Sebastian and Schroecker, Yannick and Abel, David and Zahavy, Tom and Singh, Satinder},
  booktitle = 	 {Proceedings of The 1st Conference on Lifelong Learning Agents},
  pages = 	 {886--901},
  year = 	 {2022},
  editor = 	 {Chandar, Sarath and Pascanu, Razvan and Precup, Doina},
  volume = 	 {199},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {22--24 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v199/luketina22a/luketina22a.pdf},
  url = 	 {https://proceedings.mlr.press/v199/luketina22a.html},
  abstract = 	 {Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we ask: (i) how much information should be given to the learned optimizers, so as to enable faster adaptation and generalization over a lifetime, (ii) what meta-optimizer functions are learned in this process, and (iii) whether meta-gradient methods provide a bigger advantage in highly non-stationary environments. To study the effect of information provided to the meta-optimizer, as in recent works (Flennerhaget al., 2021; Almeida et al., 2021), we replace the tuned meta-parameters of fixed update rules with learned meta-parameter functions of selected context features. The context features carry information about agent performance and changes in the environment and hence can inform learned meta-parameter schedules. We find that adding more contextual information is generally beneficial, leading to faster adaptation of meta-parameter values and increased performance over a lifetime. We support these results with a qualitative analysis of resulting meta parameter schedules and learned functions of context features. Lastly, we find that without context, meta-gradients do not provide a consistent advantage over the baseline in highly non-stationary environments. Our findings suggest that contextualizing meta-gradients can play a pivotal role in extracting high performance from meta-gradients in non-stationary settings.}
}

Endnote

%0 Conference Paper
%T Meta-Gradients in Non-Stationary Environments
%A Jelena Luketina
%A Sebastian Flennerhag
%A Yannick Schroecker
%A David Abel
%A Tom Zahavy
%A Satinder Singh
%B Proceedings of The 1st Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2022
%E Sarath Chandar
%E Razvan Pascanu
%E Doina Precup	
%F pmlr-v199-luketina22a
%I PMLR
%P 886--901
%U https://proceedings.mlr.press/v199/luketina22a.html
%V 199
%X Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we ask: (i) how much information should be given to the learned optimizers, so as to enable faster adaptation and generalization over a lifetime, (ii) what meta-optimizer functions are learned in this process, and (iii) whether meta-gradient methods provide a bigger advantage in highly non-stationary environments. To study the effect of information provided to the meta-optimizer, as in recent works (Flennerhaget al., 2021; Almeida et al., 2021), we replace the tuned meta-parameters of fixed update rules with learned meta-parameter functions of selected context features. The context features carry information about agent performance and changes in the environment and hence can inform learned meta-parameter schedules. We find that adding more contextual information is generally beneficial, leading to faster adaptation of meta-parameter values and increased performance over a lifetime. We support these results with a qualitative analysis of resulting meta parameter schedules and learned functions of context features. Lastly, we find that without context, meta-gradients do not provide a consistent advantage over the baseline in highly non-stationary environments. Our findings suggest that contextualizing meta-gradients can play a pivotal role in extracting high performance from meta-gradients in non-stationary settings.

APA


Luketina, J., Flennerhag, S., Schroecker, Y., Abel, D., Zahavy, T. & Singh, S.. (2022). Meta-Gradients in Non-Stationary Environments. Proceedings of The 1st Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 199:886-901 Available from https://proceedings.mlr.press/v199/luketina22a.html.

Meta-Gradients in Non-Stationary Environments

Abstract

Cite this Paper

Related Material