CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

Abdus Salam Azad; Izzeddin Gur; Jasper Emhoff; Nathaniel Alexis; Aleksandra Faust; Pieter Abbeel; Ion Stoica

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:1361-1395, 2023.

Abstract

Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process where the task distribution evolves along with agent policies; creating an instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge, bottlenecking these approaches. To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by maximizing a minimax REGRET-based objective on a set of latent tasks sampled from this manifold. Using the fixed-pretrained task manifold, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging CarRacing and navigation environments: achieving 10.6X and 45% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, while requiring 500X fewer environment interactions. We open source our code at https://github.com/clutr/clutr.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-azad23a,
  title = 	 {{CLUTR}: Curriculum Learning via Unsupervised Task Representation Learning},
  author =       {Azad, Abdus Salam and Gur, Izzeddin and Emhoff, Jasper and Alexis, Nathaniel and Faust, Aleksandra and Abbeel, Pieter and Stoica, Ion},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {1361--1395},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/azad23a/azad23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/azad23a.html},
  abstract = 	 {Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process where the task distribution evolves along with agent policies; creating an instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge, bottlenecking these approaches. To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by maximizing a minimax REGRET-based objective on a set of latent tasks sampled from this manifold. Using the fixed-pretrained task manifold, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging CarRacing and navigation environments: achieving 10.6X and 45% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, while requiring 500X fewer environment interactions. We open source our code at https://github.com/clutr/clutr.}
}

Endnote

%0 Conference Paper
%T CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
%A Abdus Salam Azad
%A Izzeddin Gur
%A Jasper Emhoff
%A Nathaniel Alexis
%A Aleksandra Faust
%A Pieter Abbeel
%A Ion Stoica
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-azad23a
%I PMLR
%P 1361--1395
%U https://proceedings.mlr.press/v202/azad23a.html
%V 202
%X Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process where the task distribution evolves along with agent policies; creating an instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge, bottlenecking these approaches. To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by maximizing a minimax REGRET-based objective on a set of latent tasks sampled from this manifold. Using the fixed-pretrained task manifold, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging CarRacing and navigation environments: achieving 10.6X and 45% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, while requiring 500X fewer environment interactions. We open source our code at https://github.com/clutr/clutr.

APA


Azad, A.S., Gur, I., Emhoff, J., Alexis, N., Faust, A., Abbeel, P. & Stoica, I.. (2023). CLUTR: Curriculum Learning via Unsupervised Task Representation Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:1361-1395 Available from https://proceedings.mlr.press/v202/azad23a.html.

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

Abstract

Cite this Paper

Related Material