Flow-based Domain Randomization for Learning and Sequencing Robotic Skills

Aidan Curtis, Eric Li, Michael Noseworthy, Nishad Gothoskar, Sachin Chitta, Hui Li, Leslie Pack Kaelbling, Nicole E Carey
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:11692-11709, 2025.

Abstract

Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies learned in simulation. By randomizing properties of the environment during training, the learned policy can be robust to uncertainty along the randomized dimensions. While the environment distribution is typically specified by hand, in this paper we investigate the problem of automatically discovering this sampling distribution via entropy-regularized reward maximization of a neural sampling distribution in the form of a normalizing flow. We show that this architecture is more flexible and results in better robustness than existing approaches to learning simple parameterized sampling distributions. We demonstrate that this approach can be used to learn robust policies for contact-rich assembly tasks. Additionally, we explore how these sampling distributions, in combination with a privileged value function, can be used for out-of-distribution detection in the context of an uncertainty-aware multi-step manipulation planner.
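
The abstract compresses the method into one recipe: train a normalizing flow q(theta) over simulator parameters by entropy-regularized reward maximization, so that randomization concentrates on parameters the current policy can handle while the entropy bonus keeps the distribution as broad as possible. Below is a minimal sketch of that objective, not the authors' implementation: the flow architecture, the surrogate rollout_return, and the entropy weight alpha are all illustrative assumptions.

# Illustrative sketch only: a toy flow-based sampling distribution trained by
# entropy-regularized reward maximization (all names and values are assumed).
import torch
import torch.nn as nn

class AffineCouplingFlow(nn.Module):
    """Tiny RealNVP-style flow over a toy 2-D domain-parameter space."""
    def __init__(self, hidden=64, n_layers=4):
        super().__init__()
        # Each layer rescales/shifts the second coordinate conditioned on the
        # first; coordinates are swapped between layers for expressivity.
        self.nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 2))
            for _ in range(n_layers))
        self.base = torch.distributions.Normal(0.0, 1.0)

    def sample(self, n):
        # Draw theta ~ q by pushing base noise through the coupling layers.
        x = self.base.sample((n, 2))
        for net in self.nets:
            x1, x2 = x[:, :1], x[:, 1:]
            s, t = net(x1).chunk(2, dim=-1)
            s = torch.tanh(s)  # bounded log-scale for numerical stability
            x = torch.cat([x2 * torch.exp(s) + t, x1], dim=-1)  # transform, swap
        return x

    def log_prob(self, theta):
        # Exact log q(theta) via the inverse pass (change of variables).
        x, log_det = theta, 0.0
        for net in reversed(self.nets):
            y, cond = x[:, :1], x[:, 1:]  # undo the swap: cond passed through
            s, t = net(cond).chunk(2, dim=-1)
            s = torch.tanh(s)
            x = torch.cat([cond, (y - t) * torch.exp(-s)], dim=-1)
            log_det = log_det + s.sum(-1)  # accumulate forward log|det|
        return self.base.log_prob(x).sum(-1) - log_det

def rollout_return(theta):
    # Hypothetical stand-in for the non-differentiable average return of the
    # current policy in a simulator randomized with parameters theta.
    return torch.exp(-((theta - 1.5) ** 2).sum(-1))

flow, alpha = AffineCouplingFlow(), 0.1  # alpha trades reward against entropy
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for step in range(2000):
    with torch.no_grad():
        theta = flow.sample(256)
        R = rollout_return(theta)
    log_q = flow.log_prob(theta)
    # Maximize J = E_q[R(theta)] + alpha * H(q). Folding the entropy bonus
    # -alpha * log q(theta) into the reward gives the score-function gradient
    # grad J = E[(R - alpha * log q - baseline) * grad log q].
    pseudo_r = R - alpha * log_q.detach()
    adv = pseudo_r - pseudo_r.mean()  # mean baseline reduces variance
    loss = -(adv * log_q).mean()
    opt.zero_grad(); loss.backward(); opt.step()

Raising alpha yields a broader, more demanding randomization distribution; lowering it collapses sampling toward the parameters easiest for the policy. Balancing that trade-off is precisely what the entropy regularizer described in the abstract is for.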

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-curtis25a,
  title =     {Flow-based Domain Randomization for Learning and Sequencing Robotic Skills},
  author =    {Curtis, Aidan and Li, Eric and Noseworthy, Michael and Gothoskar, Nishad and Chitta, Sachin and Li, Hui and Kaelbling, Leslie Pack and Carey, Nicole E},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages =     {11692--11709},
  year =      {2025},
  editor =    {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume =    {267},
  series =    {Proceedings of Machine Learning Research},
  month =     {13--19 Jul},
  publisher = {PMLR},
  pdf =       {https://raw.githubusercontent.com/mlresearch/v267/main/assets/curtis25a/curtis25a.pdf},
  url =       {https://proceedings.mlr.press/v267/curtis25a.html},
  abstract =  {Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies learned in simulation. By randomizing properties of the environment during training, the learned policy can be robust to uncertainty along the randomized dimensions. While the environment distribution is typically specified by hand, in this paper we investigate the problem of automatically discovering this sampling distribution via entropy-regularized reward maximization of a neural sampling distribution in the form of a normalizing flow. We show that this architecture is more flexible and results in better robustness than existing approaches to learning simple parameterized sampling distributions. We demonstrate that this approach can be used to learn robust policies for contact-rich assembly tasks. Additionally, we explore how these sampling distributions, in combination with a privileged value function, can be used for out-of-distribution detection in the context of an uncertainty-aware multi-step manipulation planner.}
}
Endnote
%0 Conference Paper
%T Flow-based Domain Randomization for Learning and Sequencing Robotic Skills
%A Aidan Curtis
%A Eric Li
%A Michael Noseworthy
%A Nishad Gothoskar
%A Sachin Chitta
%A Hui Li
%A Leslie Pack Kaelbling
%A Nicole E Carey
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-curtis25a
%I PMLR
%P 11692--11709
%U https://proceedings.mlr.press/v267/curtis25a.html
%V 267
%X Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies learned in simulation. By randomizing properties of the environment during training, the learned policy can be robust to uncertainty along the randomized dimensions. While the environment distribution is typically specified by hand, in this paper we investigate the problem of automatically discovering this sampling distribution via entropy-regularized reward maximization of a neural sampling distribution in the form of a normalizing flow. We show that this architecture is more flexible and results in better robustness than existing approaches to learning simple parameterized sampling distributions. We demonstrate that this approach can be used to learn robust policies for contact-rich assembly tasks. Additionally, we explore how these sampling distributions, in combination with a privileged value function, can be used for out-of-distribution detection in the context of an uncertainty-aware multi-step manipulation planner.
APA
Curtis, A., Li, E., Noseworthy, M., Gothoskar, N., Chitta, S., Li, H., Kaelbling, L. P., & Carey, N. E. (2025). Flow-based Domain Randomization for Learning and Sequencing Robotic Skills. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:11692-11709. Available from https://proceedings.mlr.press/v267/curtis25a.html.