TACO: Learning Task Decomposition via Temporal Alignment for Control

Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, Ingmar Posner
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4654-4663, 2018.

Abstract

Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, we can provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies. This improves generalisation in comparison to separate optimisation procedures. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The results show that our approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-shiarlis18a,
  title = {{TACO}: Learning Task Decomposition via Temporal Alignment for Control},
  author = {Shiarlis, Kyriacos and Wulfmeier, Markus and Salter, Sasha and Whiteson, Shimon and Posner, Ingmar},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {4654--4663},
  year = {2018},
  editor = {Dy, Jennifer and Krause, Andreas},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/shiarlis18a/shiarlis18a.pdf},
  url = {http://proceedings.mlr.press/v80/shiarlis18a.html},
  abstract = {Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, we can provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies. This improves generalisation in comparison to separate optimisation procedures. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The results show that our approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort.}
}
Endnote
%0 Conference Paper
%T TACO: Learning Task Decomposition via Temporal Alignment for Control
%A Kyriacos Shiarlis
%A Markus Wulfmeier
%A Sasha Salter
%A Shimon Whiteson
%A Ingmar Posner
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-shiarlis18a
%I PMLR
%P 4654--4663
%U http://proceedings.mlr.press/v80/shiarlis18a.html
%V 80
%X Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, we can provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies. This improves generalisation in comparison to separate optimisation procedures. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The results show that our approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort.
APA
Shiarlis, K., Wulfmeier, M., Salter, S., Whiteson, S., & Posner, I. (2018). TACO: Learning Task Decomposition via Temporal Alignment for Control. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4654-4663. Available from http://proceedings.mlr.press/v80/shiarlis18a.html.
