Position: Automatic Environment Shaping is the Next Frontier in RL

Younghyo Park; Gabriel B. Margolis; Pulkit Agrawal

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:39781-39792, 2024.

Abstract

Many roboticists dream of presenting a robot with a task in the evening and returning the next morning to find the robot capable of solving the task. What is preventing us from achieving this? Sim-to-real reinforcement learning (RL) has achieved impressive performance on challenging robotics tasks, but requires substantial human effort to set up the task in a way that is amenable to RL. It’s our position that algorithmic improvements in policy optimization and other ideas should be guided towards resolving the primary bottleneck of shaping the training environment, i.e., designing observations, actions, rewards and simulation dynamics. Most practitioners don’t tune the RL algorithm, but other environment parameters to obtain a desirable controller. We posit that scaling RL to diverse robotic tasks will only be achieved if the community focuses on automating environment shaping procedures.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-park24i,
  title = 	 {Position: Automatic Environment Shaping is the Next Frontier in {RL}},
  author =       {Park, Younghyo and Margolis, Gabriel B. and Agrawal, Pulkit},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {39781--39792},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/park24i/park24i.pdf},
  url = 	 {https://proceedings.mlr.press/v235/park24i.html},
  abstract = 	 {Many roboticists dream of presenting a robot with a task in the evening and returning the next morning to find the robot capable of solving the task. What is preventing us from achieving this? Sim-to-real reinforcement learning (RL) has achieved impressive performance on challenging robotics tasks, but requires substantial human effort to set up the task in a way that is amenable to RL. It’s our position that algorithmic improvements in policy optimization and other ideas should be guided towards resolving the primary bottleneck of shaping the training environment, i.e., designing observations, actions, rewards and simulation dynamics. Most practitioners don’t tune the RL algorithm, but other environment parameters to obtain a desirable controller. We posit that scaling RL to diverse robotic tasks will only be achieved if the community focuses on automating environment shaping procedures.}
}

Endnote

%0 Conference Paper
%T Position: Automatic Environment Shaping is the Next Frontier in RL
%A Younghyo Park
%A Gabriel B. Margolis
%A Pulkit Agrawal
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-park24i
%I PMLR
%P 39781--39792
%U https://proceedings.mlr.press/v235/park24i.html
%V 235
%X Many roboticists dream of presenting a robot with a task in the evening and returning the next morning to find the robot capable of solving the task. What is preventing us from achieving this? Sim-to-real reinforcement learning (RL) has achieved impressive performance on challenging robotics tasks, but requires substantial human effort to set up the task in a way that is amenable to RL. It’s our position that algorithmic improvements in policy optimization and other ideas should be guided towards resolving the primary bottleneck of shaping the training environment, i.e., designing observations, actions, rewards and simulation dynamics. Most practitioners don’t tune the RL algorithm, but other environment parameters to obtain a desirable controller. We posit that scaling RL to diverse robotic tasks will only be achieved if the community focuses on automating environment shaping procedures.

APA


Park, Y., Margolis, G. B., & Agrawal, P. (2024). Position: Automatic Environment Shaping is the Next Frontier in RL. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:39781-39792. Available from https://proceedings.mlr.press/v235/park24i.html.
