Strength Through Diversity: Robust Behavior Learning via Mixture Policies

Tim Seyde, Wilko Schwarting, Igor Gilitschenski, Markus Wulfmeier, Daniela Rus
Proceedings of the 5th Conference on Robot Learning, PMLR 164:1144-1155, 2022.

Abstract

Efficiency in robot learning is highly dependent on hyperparameters. Robot morphology and task structure differ widely and finding the optimal setting typically requires sequential or parallel repetition of experiments, strongly increasing the interaction count. We propose a training method that only relies on a single trial by enabling agents to select and combine controller designs conditioned on the task. Our Hyperparameter Mixture Policies (HMPs) feature diverse sub-policies that vary in distribution types and parameterization, reducing the impact of design choices and unlocking synergies between low-level components. We demonstrate strong performance on continuous control tasks, including a simulated ANYmal robot, showing that HMPs yield robust, data-efficient learning.

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-seyde22a,
  title = {Strength Through Diversity: Robust Behavior Learning via Mixture Policies},
  author = {Seyde, Tim and Schwarting, Wilko and Gilitschenski, Igor and Wulfmeier, Markus and Rus, Daniela},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages = {1144--1155},
  year = {2022},
  editor = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume = {164},
  series = {Proceedings of Machine Learning Research},
  month = {08--11 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v164/seyde22a/seyde22a.pdf},
  url = {https://proceedings.mlr.press/v164/seyde22a.html},
  abstract = {Efficiency in robot learning is highly dependent on hyperparameters. Robot morphology and task structure differ widely and finding the optimal setting typically requires sequential or parallel repetition of experiments, strongly increasing the interaction count. We propose a training method that only relies on a single trial by enabling agents to select and combine controller designs conditioned on the task. Our Hyperparameter Mixture Policies (HMPs) feature diverse sub-policies that vary in distribution types and parameterization, reducing the impact of design choices and unlocking synergies between low-level components. We demonstrate strong performance on continuous control tasks, including a simulated ANYmal robot, showing that HMPs yield robust, data-efficient learning.}
}
Endnote
%0 Conference Paper
%T Strength Through Diversity: Robust Behavior Learning via Mixture Policies
%A Tim Seyde
%A Wilko Schwarting
%A Igor Gilitschenski
%A Markus Wulfmeier
%A Daniela Rus
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-seyde22a
%I PMLR
%P 1144--1155
%U https://proceedings.mlr.press/v164/seyde22a.html
%V 164
%X Efficiency in robot learning is highly dependent on hyperparameters. Robot morphology and task structure differ widely and finding the optimal setting typically requires sequential or parallel repetition of experiments, strongly increasing the interaction count. We propose a training method that only relies on a single trial by enabling agents to select and combine controller designs conditioned on the task. Our Hyperparameter Mixture Policies (HMPs) feature diverse sub-policies that vary in distribution types and parameterization, reducing the impact of design choices and unlocking synergies between low-level components. We demonstrate strong performance on continuous control tasks, including a simulated ANYmal robot, showing that HMPs yield robust, data-efficient learning.
APA
Seyde, T., Schwarting, W., Gilitschenski, I., Wulfmeier, M. & Rus, D. (2022). Strength Through Diversity: Robust Behavior Learning via Mixture Policies. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1144-1155. Available from https://proceedings.mlr.press/v164/seyde22a.html.
