- title: 'Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks' abstract: 'In this paper, we study the problem of learning a repertoire of low-level skills from raw images that can be sequenced to complete long-horizon visuomotor tasks. Reinforcement learning (RL) is a promising approach for acquiring short-horizon skills autonomously. However, the focus of RL algorithms has largely been on the success of those individual skills, more so than learning and grounding a large repertoire of skills that can be sequenced to complete extended multi-stage tasks. The latter demands robustness and persistence, as errors in skills can compound over time, and may require the robot to have a number of primitive skills in its repertoire, rather than just one. To this end, we introduce EMBR, a model-based RL method for learning primitive skills that are suitable for completing long-horizon visuomotor tasks. EMBR learns and plans using a learned model, critic, and success classifier, where the success classifier serves both as a reward function for RL and as a grounding mechanism to continuously detect if the robot should retry a skill when unsuccessful or under perturbations. Further, the learned model is task-agnostic and trained using data from all skills, enabling the robot to efficiently learn a number of distinct primitives. These visuomotor primitive skills and their associated pre- and post-conditions can then be directly combined with off-the-shelf symbolic planners to complete long-horizon tasks. On a Franka Emika robot arm, we find that EMBR enables the robot to complete three long-horizon visuomotor tasks at 85% success rate, such as organizing an office desk, a file cabinet, and drawers, which require sequencing up to 12 skills, involve 14 unique learned primitives, and demand generalization to novel objects.' 
volume: 164 URL: https://proceedings.mlr.press/v164/wu22a.html PDF: https://proceedings.mlr.press/v164/wu22a/wu22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Bohan family: Wu - given: Suraj family: Nair - given: Li family: Fei-Fei - given: Chelsea family: Finn editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1-13 id: wu22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1 lastpage: 13 published: 2022-01-11 00:00:00 +0000 - title: 'Tactile Image-to-Image Disentanglement of Contact Geometry from Motion-Induced Shear' abstract: 'Robotic touch, particularly when using soft optical tactile sensors, suffers from distortion caused by motion-dependent shear. The manner in which the sensor contacts a stimulus is entangled with the tactile information about the stimulus geometry. In this work, we propose a supervised convolutional deep neural network model that learns to disentangle, in the latent space, the components of sensor deformations caused by contact geometry from those due to sliding-induced shear. The approach is validated by showing a close match between the unsheared images reconstructed from sheared images and their vertical tap (non-sheared) counterparts. In addition, the unsheared tactile images faithfully reconstruct the contact geometry masked in sheared data, and allow robust estimation of the contact pose of use for sliding exploration of various planar shapes. Overall, the contact geometry reconstruction in conjunction with sliding exploration were used for faithful full object reconstruction of various planar shapes. The methods have broad applicability to deep learning models for robots with a shear-sensitive sense of touch.' 
volume: 164 URL: https://proceedings.mlr.press/v164/gupta22a.html PDF: https://proceedings.mlr.press/v164/gupta22a/gupta22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-gupta22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Anupam K. family: Gupta - given: Laurence family: Aitchison - given: Nathan F. family: Lepora editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 14-23 id: gupta22a issued: date-parts: - 2022 - 1 - 11 firstpage: 14 lastpage: 23 published: 2022-01-11 00:00:00 +0000 - title: 'FlingBot: The Unreasonable Effectiveness of Dynamic Manipulation for Cloth Unfolding' abstract: 'High-velocity dynamic actions (e.g., fling or throw) play a crucial role in our everyday interaction with deformable objects by improving our efficiency and effectively expanding our physical reach range. Yet, most prior works have tackled cloth manipulation using exclusively single-arm quasi-static actions, which requires a large number of interactions for challenging initial cloth configurations and strictly limits the maximum cloth size by the robot’s reach range. In this work, we demonstrate the effectiveness of dynamic flinging actions for cloth unfolding with our proposed self-supervised learning framework, FlingBot. Our approach learns how to unfold a piece of fabric from arbitrary initial configurations using a pick, stretch, and fling primitive for a dual-arm setup from visual observations. The final system achieves over 80% coverage within 3 actions on novel cloths, can unfold cloths larger than the system’s reach range, and generalizes to T-shirts despite being trained on only rectangular cloths. We also finetuned FlingBot on a real-world dual-arm robot platform, where it increased the cloth coverage over 4 times more than the quasi-static baseline did. 
The simplicity of FlingBot combined with its superior performance over quasi-static baselines demonstrates the effectiveness of dynamic actions for deformable object manipulation.' volume: 164 URL: https://proceedings.mlr.press/v164/ha22a.html PDF: https://proceedings.mlr.press/v164/ha22a/ha22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-ha22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Huy family: Ha - given: Shuran family: Song editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 24-33 id: ha22a issued: date-parts: - 2022 - 1 - 11 firstpage: 24 lastpage: 33 published: 2022-01-11 00:00:00 +0000 - title: 'TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo' abstract: 'In this paper, we present TANDEM, a real-time monocular tracking and dense mapping framework. For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes. To increase the robustness, we propose a novel tracking front-end that performs dense direct image alignment using depth maps rendered from a global model that is built incrementally from dense depth predictions. To predict the dense depth maps, we propose Cascade View-Aggregation MVSNet (CVA-MVSNet) that utilizes the entire active keyframe window by hierarchically constructing 3D cost volumes with adaptive view aggregation to balance the different stereo baselines between the keyframes. Finally, the predicted depth maps are fused into a consistent global map represented as a truncated signed distance function (TSDF) voxel grid. Our experimental results show that TANDEM outperforms other state-of-the-art traditional and learning-based monocular visual odometry (VO) methods in terms of camera tracking. 
Moreover, TANDEM shows state-of-the-art real-time 3D reconstruction performance. Webpage: https://go.vision.in.tum.de/tandem' volume: 164 URL: https://proceedings.mlr.press/v164/koestler22a.html PDF: https://proceedings.mlr.press/v164/koestler22a/koestler22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-koestler22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Lukas family: Koestler - given: Nan family: Yang - given: Niclas family: Zeller - given: Daniel family: Cremers editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 34-45 id: koestler22a issued: date-parts: - 2022 - 1 - 11 firstpage: 34 lastpage: 45 published: 2022-01-11 00:00:00 +0000 - title: 'Taskography: Evaluating robot task planning over large 3D scene graphs' abstract: '3D scene graphs (3DSGs) are an emerging description, unifying symbolic, topological, and metric scene representations. However, typical 3DSGs contain hundreds of objects and symbols even for small environments, rendering task planning on the \emph{full} graph impractical. We construct \textbf{Taskography}, the first large-scale robotic task planning benchmark over 3DSGs. While most benchmarking efforts in this area focus on \emph{vision-based planning}, we systematically study \emph{symbolic} planning, to decouple planning performance from visual representation learning. We observe that, among existing methods, neither classical nor learning-based planners are capable of real-time planning over \emph{full} 3DSGs. Enabling real-time planning demands progress on \emph{both} (a) sparsifying 3DSGs for tractable planning and (b) designing planners that better exploit 3DSG hierarchies. 
Towards the former goal, we propose \textbf{Scrub}, a task-conditioned 3DSG sparsification method, enabling classical planners to match (and surpass) state-of-the-art learning-based planners. Towards the latter goal, we propose \textbf{Seek}, a procedure enabling learning-based planners to exploit 3DSG structure, reducing the number of replanning queries required by current best approaches by an order of magnitude. We will open-source all code and baselines to spur further research along the intersections of robot task planning, learning and 3DSGs.' volume: 164 URL: https://proceedings.mlr.press/v164/agia22a.html PDF: https://proceedings.mlr.press/v164/agia22a/agia22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-agia22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Christopher family: Agia - given: Krishna Murthy family: Jatavallabhula - given: Mohamed family: Khodeir - given: Ondrej family: Miksik - given: Vibhav family: Vineet - given: Mustafa family: Mukadam - given: Liam family: Paull - given: Florian family: Shkurti editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 46-58 id: agia22a issued: date-parts: - 2022 - 1 - 11 firstpage: 46 lastpage: 58 published: 2022-01-11 00:00:00 +0000 - title: 'Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning' abstract: 'Long-horizon planning in realistic environments requires the ability to reason over sequential tasks in high-dimensional state spaces with complex dynamics. Classical motion planning algorithms, such as rapidly-exploring random trees, are capable of efficiently exploring large state spaces and computing long-horizon, sequential plans. 
However, these algorithms are generally challenged with complex, stochastic, and high-dimensional state spaces as well as in the presence of small, topologically complex goal regions, which naturally emerge in tasks that interact with the environment. Machine learning offers a promising solution for its ability to learn general policies that can handle complex interactions and high-dimensional observations. However, these policies are generally limited in horizon length. Our approach, Broadly-Exploring, Local-policy Trees (BELT), merges these two approaches to leverage the strengths of both through a task-conditioned, model-based tree search. BELT uses an RRT-inspired tree search to efficiently explore the state space. Locally, the exploration is guided by a task-conditioned, learned policy capable of performing general short-horizon tasks. This task space can be quite general and abstract; its only requirements are to be sampleable and to well-cover the space of useful tasks. This search is aided by a task-conditioned model that temporally extends dynamics propagation to allow long-horizon search and sequential reasoning over tasks. BELT is demonstrated experimentally to be able to plan long-horizon, sequential trajectories with a goal conditioned policy and generate plans that are robust.' 
volume: 164 URL: https://proceedings.mlr.press/v164/ichter22a.html PDF: https://proceedings.mlr.press/v164/ichter22a/ichter22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-ichter22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: brian family: ichter - given: Pierre family: Sermanet - given: Corey family: Lynch editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 59-69 id: ichter22a issued: date-parts: - 2022 - 1 - 11 firstpage: 59 lastpage: 69 published: 2022-01-11 00:00:00 +0000 - title: 'Goal-Auxiliary Actor-Critic for 6D Robotic Grasping with Point Clouds' abstract: '6D robotic grasping beyond top-down bin-picking scenarios is a challenging task. Previous solutions based on 6D grasp synthesis with robot motion planning usually operate in an open-loop setting, which is sensitive to grasp synthesis errors. In this work, we propose a new method for learning closed-loop control policies for 6D grasping. Our policy takes a segmented point cloud of an object from an egocentric camera as input, and outputs continuous 6D control actions of the robot gripper for grasping the object. We combine imitation learning and reinforcement learning and introduce a goal-auxiliary actor-critic algorithm for policy learning. We demonstrate that our learned policy can be integrated into a tabletop 6D grasping system and a human-robot handover system to improve the grasping performance of unseen objects. Videos and code are available at https://sites.google.com/view/gaddpg.' 
volume: 164 URL: https://proceedings.mlr.press/v164/wang22a.html PDF: https://proceedings.mlr.press/v164/wang22a/wang22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Lirui family: Wang - given: Yu family: Xiang - given: Wei family: Yang - given: Arsalan family: Mousavian - given: Dieter family: Fox editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 70-80 id: wang22a issued: date-parts: - 2022 - 1 - 11 firstpage: 70 lastpage: 80 published: 2022-01-11 00:00:00 +0000 - title: 'Parallelised Diffeomorphic Sampling-based Motion Planning' abstract: 'We propose Parallelised Diffeomorphic Sampling-based Motion Planning (PDMP). PDMP is a novel parallelised framework that uses bijective and differentiable mappings, or diffeomorphisms, to transform sampling distributions of sampling-based motion planners, in a manner akin to normalising flows. Unlike normalising flow models which use invertible neural network structures to represent these diffeomorphisms, we develop them from gradient information of desired costs, and encode desirable behaviour, such as obstacle avoidance. These transformed sampling distributions can then be used for sampling-based motion planning. A particular example is when we wish to imbue the sampling distribution with knowledge of the environment geometry, such that drawn samples are less prone to be in collision. To this end, we propose to learn a continuous occupancy representation from environment occupancy data, such that gradients of the representation define a valid diffeomorphism and are amenable to fast parallelised evaluation. We use this to “morph” the sampling distribution to draw far less collision-prone samples. 
PDMP is able to leverage gradient information of costs, to inject specifications, in a manner similar to optimisation-based motion planning methods, but relies on drawing from a sampling distribution, retaining the tendency to find more global solutions, thereby bridging the gap between trajectory optimisation and sampling-based planning methods.' volume: 164 URL: https://proceedings.mlr.press/v164/lai22a.html PDF: https://proceedings.mlr.press/v164/lai22a/lai22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lai22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Tin family: Lai - given: Weiming family: Zhi - given: Tucker family: Hermans - given: Fabio family: Ramos editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 81-90 id: lai22a issued: date-parts: - 2022 - 1 - 11 firstpage: 81 lastpage: 90 published: 2022-01-11 00:00:00 +0000 - title: 'Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning' abstract: 'In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. 
Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion: https://leggedrobotics.github.io/legged_gym/.' volume: 164 URL: https://proceedings.mlr.press/v164/rudin22a.html PDF: https://proceedings.mlr.press/v164/rudin22a/rudin22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-rudin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Nikita family: Rudin - given: David family: Hoeller - given: Philipp family: Reist - given: Marco family: Hutter editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 91-100 id: rudin22a issued: date-parts: - 2022 - 1 - 11 firstpage: 91 lastpage: 100 published: 2022-01-11 00:00:00 +0000 - title: 'Using Physics Knowledge for Learning Rigid-body Forward Dynamics with Gaussian Process Force Priors' abstract: 'If a robot’s dynamics are difficult to model solely through analytical mechanics, it is an attractive option to directly learn it from data. Yet, solely data-driven approaches require considerable amounts of data for training and do not extrapolate well to unseen regions of the system’s state space. In this work, we emphasize that when a robot’s links are sufficiently rigid, many analytical functions such as kinematics, inertia functions, and surface constraints encode informative prior knowledge on its dynamics. To this effect, we propose a framework for learning probabilistic forward dynamics that combines physics knowledge with Gaussian processes utilizing automatic differentiation with GPU acceleration. Compared to solely data-driven modeling, the model’s data efficiency improves while the model also respects physical constraints. 
We illustrate the proposed structured model on a seven joint robot arm in PyBullet. Our implementation of the proposed framework can be found here: https://git.io/JP4Fs' volume: 164 URL: https://proceedings.mlr.press/v164/rath22a.html PDF: https://proceedings.mlr.press/v164/rath22a/rath22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-rath22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Lucas family: Rath - given: Andreas René family: Geist - given: Sebastian family: Trimpe editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 101-111 id: rath22a issued: date-parts: - 2022 - 1 - 11 firstpage: 101 lastpage: 111 published: 2022-01-11 00:00:00 +0000 - title: '3D Neural Scene Representations for Visuomotor Control' abstract: 'Humans have a strong intuitive understanding of the 3D environment around us. The mental model of the physics in our brain applies to objects of different materials and enables us to perform a wide range of manipulation tasks that are far beyond the reach of current robots. In this work, we desire to learn models for dynamic 3D scenes purely from 2D visual observations. Our model combines Neural Radiance Fields (NeRF) and time contrastive learning with an autoencoding framework, which learns viewpoint-invariant 3D-aware scene representations. We show that a dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks involving both rigid bodies and fluids, where the target is specified in a viewpoint different from what the robot operates on. When coupled with an auto-decoding framework, it can even support goal specification from camera viewpoints that are outside the training distribution. 
We further demonstrate the richness of the learned 3D dynamics model by performing future prediction and novel view synthesis. Finally, we provide detailed ablation studies regarding different system designs and qualitative analysis of the learned representations.' volume: 164 URL: https://proceedings.mlr.press/v164/li22a.html PDF: https://proceedings.mlr.press/v164/li22a/li22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-li22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Yunzhu family: Li - given: Shuang family: Li - given: Vincent family: Sitzmann - given: Pulkit family: Agrawal - given: Antonio family: Torralba editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 112-123 id: li22a issued: date-parts: - 2022 - 1 - 11 firstpage: 112 lastpage: 123 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Density Distribution of Reachable States for Autonomous Systems' abstract: 'State density distribution, in contrast to worst-case reachability, can be leveraged for safety-related problems to better quantify the likelihood of the risk for potentially hazardous situations. In this work, we propose a data-driven method to compute the density distribution of reachable states for nonlinear and even black-box systems. Our semi-supervised approach learns system dynamics and the state density jointly from trajectory data, guided by the fact that the state density evolution follows the Liouville partial differential equation. With the help of neural network reachability tools, our approach can estimate the set of all possible future states as well as their density. Moreover, we could perform online safety verification with probability ranges for unsafe behaviors to occur. 
We use an extensive set of experiments to show that our learned solution can produce a much more accurate estimate of the density distribution, and can quantify risks less conservatively and flexibly compared with worst-case analysis.' volume: 164 URL: https://proceedings.mlr.press/v164/meng22a.html PDF: https://proceedings.mlr.press/v164/meng22a/meng22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-meng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Yue family: Meng - given: Dawei family: Sun - given: Zeng family: Qiu - given: Md Tawhid Bin family: Waez - given: Chuchu family: Fan editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 124-136 id: meng22a issued: date-parts: - 2022 - 1 - 11 firstpage: 124 lastpage: 136 published: 2022-01-11 00:00:00 +0000 - title: 'A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation' abstract: 'Specifying tasks with videos is a powerful technique towards acquiring novel and general robot skills. However, reasoning over mechanics and dexterous interactions can make it challenging to scale visual learning for contact-rich manipulation. In this work, we focus on the problem of visual dexterous planar manipulation: given a video of an object in planar motion, find contact-aware robot actions that reproduce the same object motion. We propose a novel learning architecture that combines video decoding neural models with priors from contact mechanics by leveraging differentiable optimization and differentiable simulation. Through extensive simulated experiments, we investigate the interplay between traditional model-based techniques and modern deep learning approaches. We find that our modular and fully differentiable architecture outperforms learning-only methods on unseen objects and motions. 
https://github.com/baceituno/dlm.' volume: 164 URL: https://proceedings.mlr.press/v164/aceituno22a.html PDF: https://proceedings.mlr.press/v164/aceituno22a/aceituno22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-aceituno22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Bernardo family: Aceituno - given: Alberto family: Rodriguez - given: Shubham family: Tulsiani - given: Abhinav family: Gupta - given: Mustafa family: Mukadam editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 137-147 id: aceituno22a issued: date-parts: - 2022 - 1 - 11 firstpage: 137 lastpage: 147 published: 2022-01-11 00:00:00 +0000 - title: 'SORNet: Spatial Object-Centric Representations for Sequential Manipulation' abstract: 'Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects or new tasks. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.' 
volume: 164 URL: https://proceedings.mlr.press/v164/yuan22a.html PDF: https://proceedings.mlr.press/v164/yuan22a/yuan22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-yuan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Wentao family: Yuan - given: Chris family: Paxton - given: Karthik family: Desingh - given: Dieter family: Fox editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 148-157 id: yuan22a issued: date-parts: - 2022 - 1 - 11 firstpage: 148 lastpage: 157 published: 2022-01-11 00:00:00 +0000 - title: 'Implicit Behavioral Cloning' abstract: 'We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavior-cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavior-cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. 
In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision. ' volume: 164 URL: https://proceedings.mlr.press/v164/florence22a.html PDF: https://proceedings.mlr.press/v164/florence22a/florence22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-florence22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Pete family: Florence - given: Corey family: Lynch - given: Andy family: Zeng - given: Oscar A family: Ramirez - given: Ayzaan family: Wahid - given: Laura family: Downs - given: Adrian family: Wong - given: Johnny family: Lee - given: Igor family: Mordatch - given: Jonathan family: Tompson editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 158-168 id: florence22a issued: date-parts: - 2022 - 1 - 11 firstpage: 158 lastpage: 168 published: 2022-01-11 00:00:00 +0000 - title: 'Influencing Behavioral Attributions to Robot Motion During Task Execution' abstract: 'While prior work has shown how to autonomously generate motion that communicates task-related attributes, like intent or capability, we know less about how to automatically generate motion that communicates higher-level behavioral attributes such as curiosity or competence. We propose a framework that addresses the challenges of modeling human attributions to robot motion, generating trajectories that elicit attributions, and selecting trajectories that balance attribution and task completion. The insight underpinning our approach is that attributions can be ascribed to features of the motion that don’t severely impact task performance, and that these features form a convenient basis both for predicting and generating communicative motion. 
We illustrate the framework in a coverage task resembling household vacuum cleaning. Through a virtual interface, we collect a dataset of human attributions to robot trajectories during task execution and learn a probabilistic model that maps trajectories to attributions. We then incorporate this model into a trajectory generation mechanism that balances between task completion and communication of a desired behavioral attribute. Through an online user study on a different household layout, we find that our prediction model accurately captures human attribution for coverage tasks.' volume: 164 URL: https://proceedings.mlr.press/v164/walker22a.html PDF: https://proceedings.mlr.press/v164/walker22a/walker22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-walker22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Nick family: Walker - given: Christoforos family: Mavrogiannis - given: Siddhartha family: Srinivasa - given: Maya family: Cakmak editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 169-179 id: walker22a issued: date-parts: - 2022 - 1 - 11 firstpage: 169 lastpage: 179 published: 2022-01-11 00:00:00 +0000 - title: 'DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries' abstract: 'We introduce a framework for multi-camera 3D object detection. In contrast to existing works, which estimate 3D bounding boxes directly from monocular images or use depth prediction networks to generate input for 3D object detection from 2D information, our method manipulates predictions directly in 3D space. Our architecture extracts 2D features from multiple camera images and then uses a sparse set of 3D object queries to index into these 2D features, linking 3D positions to multi-view images using camera transformation matrices. 
Finally, our model makes a bounding box prediction per object query, using a set-to-set loss to measure the discrepancy between the ground-truth and the prediction. This top-down approach outperforms its bottom-up counterpart in which object bounding box prediction follows per-pixel depth estimation, since it does not suffer from the compounding error introduced by a depth prediction model. Moreover, our method does not require post-processing such as non-maximum suppression, dramatically improving inference speed. We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.' volume: 164 URL: https://proceedings.mlr.press/v164/wang22b.html PDF: https://proceedings.mlr.press/v164/wang22b/wang22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Yue family: Wang - given: Vitor Campagnolo family: Guizilini - given: Tianyuan family: Zhang - given: Yilun family: Wang - given: Hang family: Zhao - given: Justin family: Solomon editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 180-191 id: wang22b issued: date-parts: - 2022 - 1 - 11 firstpage: 180 lastpage: 191 published: 2022-01-11 00:00:00 +0000 - title: 'FabricFlowNet: Bimanual Cloth Manipulation with a Flow-based Policy' abstract: 'We address the problem of goal-directed cloth manipulation, a challenging task due to the deformability of cloth. Our insight is that optical flow, a technique normally used for motion estimation in video, can also provide an effective representation for corresponding cloth poses across observation and goal images. We introduce FabricFlowNet (FFN), a cloth manipulation policy that leverages flow as both an input and as an action representation to improve performance. 
FabricFlowNet also elegantly switches between bimanual and single-arm actions based on the desired goal. We show that FabricFlowNet significantly outperforms state-of-the-art model-free and model-based cloth manipulation policies that take image input. We also present real-world experiments on a bimanual system, demonstrating effective sim-to-real transfer. Finally, we show that our method generalizes when trained on a single square cloth to other cloth shapes, such as T-shirts and rectangular cloths. Video and other supplementary materials are available at: https://sites.google.com/view/fabricflownet.' volume: 164 URL: https://proceedings.mlr.press/v164/weng22a.html PDF: https://proceedings.mlr.press/v164/weng22a/weng22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-weng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Thomas family: Weng - given: Sujay Man family: Bajracharya - given: Yufei family: Wang - given: Khush family: Agrawal - given: David family: Held editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 192-202 id: weng22a issued: date-parts: - 2022 - 1 - 11 firstpage: 192 lastpage: 202 published: 2022-01-11 00:00:00 +0000 - title: 'Multimodal Trajectory Prediction Conditioned on Lane-Graph Traversals' abstract: 'Accurately predicting the future motion of surrounding vehicles requires reasoning about the inherent uncertainty in driving behavior. This uncertainty can be loosely decoupled into lateral (e.g., keeping lane, turning) and longitudinal (e.g., accelerating, braking). We present a novel method that combines learned discrete policy rollouts with a focused decoder on subsets of the lane graph. The policy rollouts explore different goals given current observations, ensuring that the model captures lateral variability. 
Longitudinal variability is captured by our latent variable model decoder that is conditioned on various subsets of the lane graph. Our model achieves state-of-the-art performance on the nuScenes motion prediction dataset, and qualitatively demonstrates excellent scene compliance. Detailed ablations highlight the importance of the policy rollouts and the decoder architecture.' volume: 164 URL: https://proceedings.mlr.press/v164/deo22a.html PDF: https://proceedings.mlr.press/v164/deo22a/deo22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-deo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Nachiket family: Deo - given: Eric family: Wolff - given: Oscar family: Beijbom editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 203-212 id: deo22a issued: date-parts: - 2022 - 1 - 11 firstpage: 203 lastpage: 212 published: 2022-01-11 00:00:00 +0000 - title: 'Structured deep generative models for sampling on constraint manifolds in sequential manipulation' abstract: 'Sampling efficiently on constraint manifolds is a core problem in robotics. We propose Deep Generative Constraint Sampling (DGCS), which combines a deep generative model for sampling close to a constraint manifold with nonlinear constrained optimization to project to the constraint manifold. The generative model is conditioned on the problem instance, taking a scene image as input, and it is trained with a dataset of solutions and a novel analytic constraint term. To further improve the precision and diversity of samples, we extend the approach to exploit a factorization of the constrained problem. We evaluate our approach in two problems of robotic sequential manipulation in cluttered environments. 
Experimental results demonstrate that our deep generative model produces diverse and precise samples and outperforms heuristic warmstart initialization. ' volume: 164 URL: https://proceedings.mlr.press/v164/ortiz-haro22a.html PDF: https://proceedings.mlr.press/v164/ortiz-haro22a/ortiz-haro22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-ortiz-haro22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Joaquim family: Ortiz-Haro - given: Jung-Su family: Ha - given: Danny family: Driess - given: Marc family: Toussaint editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 213-223 id: ortiz-haro22a issued: date-parts: - 2022 - 1 - 11 firstpage: 213 lastpage: 223 published: 2022-01-11 00:00:00 +0000 - title: 'Rough Terrain Navigation Using Divergence Constrained Model-Based Reinforcement Learning' abstract: 'Autonomous navigation of wheeled robots in rough terrain environments has been a long-standing challenge. In these environments, predicting the robot’s trajectory can be challenging due to the complexity of terrain interactions, as well as the divergent dynamics that cause model uncertainty to compound and propagate poorly. This inhibits the robot’s long-horizon decision-making capabilities and often leads to shortsighted navigation strategies. We propose a model-based reinforcement learning algorithm for rough terrain traversal that trains a probabilistic dynamics model to consider the propagating effects of uncertainty. During trajectory predictions, a trajectory tracking controller is considered to predict closed-loop trajectories. Our method further increases prediction accuracy and precision by using constrained optimization to find trajectories with low divergence. 
Using this method, wheeled robots can find non-myopic control strategies to reach destinations with higher probability of success. We show results on simulated and real world robots navigating through rough terrain environments.' volume: 164 URL: https://proceedings.mlr.press/v164/wang22c.html PDF: https://proceedings.mlr.press/v164/wang22c/wang22c.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sean J family: Wang - given: Samuel family: Triest - given: Wenshan family: Wang - given: Sebastian family: Scherer - given: Aaron family: Johnson editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 224-233 id: wang22c issued: date-parts: - 2022 - 1 - 11 firstpage: 224 lastpage: 233 published: 2022-01-11 00:00:00 +0000 - title: 'LEO: Learning Energy-based Models in Factor Graph Optimization' abstract: 'We address the problem of learning observation models end-to-end for estimation. Robots operating in partially observable environments must infer latent states from multiple sensory inputs using observation models that capture the joint distribution between latent states and observations. This inference problem can be formulated as an objective over a graph that optimizes for the most likely sequence of states using all previous measurements. Prior work uses observation models that are either known a-priori or trained on surrogate losses independent of the graph optimizer. In this paper, we propose a method to directly optimize end-to-end tracking performance by learning observation models with the graph optimizer in the loop. This direct approach may appear, however, to require the inference algorithm to be fully differentiable, which many state-of-the-art graph optimizers are not. 
Our key insight is to instead formulate the problem as that of energy-based learning. We propose a novel approach, LEO, for learning observation models end-to-end with graph optimizers that may be non-differentiable. LEO alternates between sampling trajectories from the graph posterior and updating the model to match these samples to ground truth trajectories. We propose a way to generate such samples efficiently using incremental Gauss-Newton solvers. We compare LEO against baselines on datasets drawn from two distinct tasks: navigation and real-world planar pushing. We show that LEO is able to learn complex observation models with lower errors and fewer samples.' volume: 164 URL: https://proceedings.mlr.press/v164/sodhi22a.html PDF: https://proceedings.mlr.press/v164/sodhi22a/sodhi22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-sodhi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Paloma family: Sodhi - given: Eric family: Dexheimer - given: Mustafa family: Mukadam - given: Stuart family: Anderson - given: Michael family: Kaess editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 234-244 id: sodhi22a issued: date-parts: - 2022 - 1 - 11 firstpage: 234 lastpage: 244 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Models as Functionals of Signed-Distance Fields for Manipulation Planning' abstract: 'This work proposes an optimization-based manipulation planning framework where the objectives are learned functionals of signed-distance fields that represent objects in the scene. Most manipulation planning approaches rely on analytical models and carefully chosen abstractions/state-spaces to be effective. 
A central question is how models can be obtained from data that are not primarily accurate in their predictions, but, more importantly, enable efficient reasoning within a planning framework, while at the same time being closely coupled to perception spaces. We show not only that representing objects as signed-distance fields enables learning and representing a variety of models with higher accuracy than point-cloud and occupancy measure representations, but also that SDF-based models are suitable for optimization-based planning. To demonstrate the versatility of our approach, we learn both kinematic and dynamic models to solve tasks that involve hanging mugs on hooks and pushing objects on a table. We can unify these quite different tasks within one framework, since SDFs are the common object representation. Video: https://youtu.be/ga8Wlkss7co' volume: 164 URL: https://proceedings.mlr.press/v164/driess22a.html PDF: https://proceedings.mlr.press/v164/driess22a/driess22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-driess22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Danny family: Driess - given: Jung-Su family: Ha - given: Marc family: Toussaint - given: Russ family: Tedrake editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 245-255 id: driess22a issued: date-parts: - 2022 - 1 - 11 firstpage: 245 lastpage: 255 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Visible Connectivity Dynamics for Cloth Smoothing' abstract: 'Robotic manipulation of cloth remains challenging due to the complex dynamics of cloth, lack of a low-dimensional state representation, and self-occlusions. 
In contrast to previous model-based approaches that learn a pixel-based dynamics model or a compressed latent vector dynamics, we propose to learn a particle-based dynamics model from a partial point cloud observation. To overcome the challenges of partial observability, we infer which visible points are connected on the underlying cloth mesh. We then learn a dynamics model over this visible connectivity graph. Compared to previous learning-based approaches, our model poses strong inductive bias with its particle based representation for learning the underlying cloth physics; it can generalize to cloths with novel shapes; it is invariant to visual features; and the predictions can be more easily visualized. We show that our method greatly outperforms previous state-of-the-art model-based and model-free reinforcement learning methods in simulation. Furthermore, we demonstrate zero-shot sim-to-real transfer where we deploy the model trained in simulation on a Franka arm and show that the model can successfully smooth cloths of different materials, geometries and colors from crumpled configurations. Videos can be found in the supplement and on our anonymous project website.' 
volume: 164 URL: https://proceedings.mlr.press/v164/lin22a.html PDF: https://proceedings.mlr.press/v164/lin22a/lin22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Xingyu family: Lin - given: Yufei family: Wang - given: Zixuan family: Huang - given: David family: Held editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 256-266 id: lin22a issued: date-parts: - 2022 - 1 - 11 firstpage: 256 lastpage: 266 published: 2022-01-11 00:00:00 +0000 - title: 'GRAC: Self-Guided and Self-Regularized Actor-Critic' abstract: ' Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network which mitigates the divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main contribution in this work is a self-regularized TD-learning method to address divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization to search for actions associated with higher Q-values in a broad neighborhood. This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network. Taken together, these components define GRAC, a novel self-guided and self-regularized actor-critic algorithm. We evaluate GRAC on the OpenAI gym tasks, outperforming state of the art on four tasks and achieving competitive results on two environments. 
We also apply GRAC to enable a non-anthropomorphic robotic hand to successfully accomplish an in-hand manipulation task in the real world.' volume: 164 URL: https://proceedings.mlr.press/v164/shao22a.html PDF: https://proceedings.mlr.press/v164/shao22a/shao22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-shao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Lin family: Shao - given: Yifan family: You - given: Mengyuan family: Yan - given: Shenli family: Yuan - given: Qingyun family: Sun - given: Jeannette family: Bohg editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 267-276 id: shao22a issued: date-parts: - 2022 - 1 - 11 firstpage: 267 lastpage: 276 published: 2022-01-11 00:00:00 +0000 - title: 'Learning to Regrasp by Learning to Place' abstract: 'In this paper, we explore whether a robot can learn to regrasp a diverse set of objects to achieve various desired grasp poses. Regrasping is needed whenever a robot’s current grasp pose fails to perform desired manipulation tasks. Endowing robots with such an ability has applications in many domains such as manufacturing or domestic services. Yet, it is a challenging task due to the large diversity of geometry in everyday objects and the high dimensionality of the state and action space. In this paper, we propose a system for robots to take partial point clouds of an object and the supporting environment as inputs and output a sequence of pick-and-place operations to transform an initial object grasp pose to the desired object grasp poses. The key technique includes a neural stable placement predictor and a regrasp graph based solution through leveraging and changing the surrounding environment. We introduce a new and challenging synthetic dataset for learning and evaluating the proposed approach. 
We demonstrate the effectiveness of our proposed system with both simulator and real-world experiments. More videos and visualization examples are available on our project https://sites.google.com/view/regrasp.' volume: 164 URL: https://proceedings.mlr.press/v164/cheng22a.html PDF: https://proceedings.mlr.press/v164/cheng22a/cheng22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-cheng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Shuo family: Cheng - given: Kaichun family: Mo - given: Lin family: Shao editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 277-286 id: cheng22a issued: date-parts: - 2022 - 1 - 11 firstpage: 277 lastpage: 286 published: 2022-01-11 00:00:00 +0000 - title: 'V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects' abstract: 'Manipulating articulated objects requires multiple robot arms in general. It is challenging to enable multiple robot arms to collaboratively complete manipulation tasks on articulated objects. In this paper, we present V-MAO, a framework for learning multi-arm manipulation of articulated objects. Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm. The training signal is obtained from interaction with the simulation environment which is enabled by planning and a novel formulation of object-centric control for articulated objects. We deploy our framework in a customized MuJoCo simulation environment and demonstrate that our framework achieves a high success rate on six different objects and two different robots. We also show that generative modeling can effectively learn the contact point distribution on articulated objects.' 
volume: 164 URL: https://proceedings.mlr.press/v164/liu22a.html PDF: https://proceedings.mlr.press/v164/liu22a/liu22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-liu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Xingyu family: Liu - given: Kris M. family: Kitani editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 287-296 id: liu22a issued: date-parts: - 2022 - 1 - 11 firstpage: 287 lastpage: 296 published: 2022-01-11 00:00:00 +0000 - title: 'A System for General In-Hand Object Re-Orientation' abstract: 'In-hand object reorientation has been a challenging problem in robotics due to high dimensional actuation space and the frequent change in contact state between the fingers and the objects. We present a simple model-free framework that can learn to reorient objects with both the hand facing upwards and downwards. We demonstrate the capability of reorienting over $2000$ geometrically different objects in both cases. The learned policies show strong zero-shot transfer performance on new objects. We provide evidence that these policies are amenable to real-world operation by distilling them to use observations easily available in the real world. The videos of the learned policies are available at: https://taochenshh.github.io/projects/in-hand-reorientation.' 
volume: 164 URL: https://proceedings.mlr.press/v164/chen22a.html PDF: https://proceedings.mlr.press/v164/chen22a/chen22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-chen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Tao family: Chen - given: Jie family: Xu - given: Pulkit family: Agrawal editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 297-307 id: chen22a issued: date-parts: - 2022 - 1 - 11 firstpage: 297 lastpage: 307 published: 2022-01-11 00:00:00 +0000 - title: 'Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation' abstract: 'We study how robots can autonomously learn skills that require a combination of navigation and grasping. Learning robotic skills in the real world remains challenging without large-scale data collection and supervision. Our aim is to devise a robotic reinforcement learning system for learning navigation and manipulation together, in an autonomous way without human intervention, enabling continual learning under realistic assumptions. Specifically, our system, ReLMM, can learn continuously on a real-world platform without any environment instrumentation, without human intervention, and without access to privileged information, such as maps, object positions, or a global view of the environment. Our method employs a modularized policy with components for manipulation and navigation, where uncertainty over the manipulation success drives exploration for the navigation controller, and the manipulation module provides rewards for navigation. We evaluate our method on a room cleanup task, where the robot must navigate to and pick up items scattered on the floor. 
After a grasp curriculum training phase, ReLMM can learn navigation and grasping together fully automatically, in around 40 hours of real-world training.' volume: 164 URL: https://proceedings.mlr.press/v164/sun22a.html PDF: https://proceedings.mlr.press/v164/sun22a/sun22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-sun22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Charles family: Sun - given: Jȩdrzej family: Orbik - given: Coline Manon family: Devin - given: Brian H. family: Yang - given: Abhishek family: Gupta - given: Glen family: Berseth - given: Sergey family: Levine editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 308-319 id: sun22a issued: date-parts: - 2022 - 1 - 11 firstpage: 308 lastpage: 319 published: 2022-01-11 00:00:00 +0000 - title: 'Adversarially Robust Imitation Learning' abstract: 'Modern imitation learning (IL) utilizes deep neural networks (DNNs) as function approximators to mimic the policy of the expert demonstrations. However, DNNs can be easily fooled by subtle noise added to the input that humans cannot even detect. This makes the learned agent vulnerable to attacks, especially in IL where agents can struggle to recover from the errors. In this light, we propose a sound Adversarially Robust Imitation Learning (ARIL) method. In our setting, an agent and an adversary are trained alternately. The former, given adversarially attacked input at each timestep, mimics the behavior of an online expert, and the latter learns to perturb the states so that the learned agent fails to choose the right decisions. We theoretically prove that ARIL can achieve adversarial robustness and evaluate ARIL on multiple benchmarks from the DM Control Suite. 
The results show that our method (ARIL) achieves better robustness than other imitation learning methods under both sensory and physical attacks.' volume: 164 URL: https://proceedings.mlr.press/v164/wang22d.html PDF: https://proceedings.mlr.press/v164/wang22d/wang22d.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Jianren family: Wang - given: Ziwen family: Zhuang - given: Yuyang family: Wang - given: Hang family: Zhao editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 320-331 id: wang22d issued: date-parts: - 2022 - 1 - 11 firstpage: 320 lastpage: 331 published: 2022-01-11 00:00:00 +0000 - title: 'Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention' abstract: 'Safety has become one of the main challenges of applying deep reinforcement learning to real-world systems. Currently, the incorporation of external knowledge such as human oversight is the only means to prevent the agent from visiting catastrophic states. In this paper, we propose MBHI, a novel framework for safe model-based reinforcement learning, which ensures safety at the state level and can effectively avoid both local and non-local catastrophes. An ensemble of supervised learners is trained in MBHI to imitate human blocking decisions. Similar to the human decision-making process, MBHI rolls out an imagined trajectory in the dynamics model before executing actions in the environment, and estimates its safety. When the imagination encounters a catastrophe, MBHI blocks the current action and uses an efficient MPC method to output a safety policy. 
We evaluate our method on several safety tasks, and the results show that MBHI achieves better performance in terms of sample efficiency and number of catastrophes compared to the baselines.' volume: 164 URL: https://proceedings.mlr.press/v164/xu22a.html PDF: https://proceedings.mlr.press/v164/xu22a/xu22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-xu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Yunkun family: Xu - given: Zhenyu family: Liu - given: Guifang family: Duan - given: Jiangcheng family: Zhu - given: Xiaolong family: Bai - given: Jianrong family: Tan editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 332-341 id: xu22a issued: date-parts: - 2022 - 1 - 11 firstpage: 332 lastpage: 341 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Multimodal Rewards from Rankings' abstract: 'Learning from human feedback has been shown to be a useful approach in acquiring robot reward functions. However, expert feedback is often assumed to be drawn from an underlying unimodal reward function. This assumption does not always hold, for example in settings where multiple experts provide data or when a single expert provides data for different tasks. We thus go beyond learning a unimodal reward and focus on learning a multimodal reward function. We formulate multimodal reward learning as a mixture learning problem and develop a novel ranking-based learning approach, where the experts are only required to rank a given set of trajectories. Furthermore, as access to interaction data is often expensive in robotics, we develop an active querying approach to accelerate the learning process. We conduct experiments and user studies using a multi-task variant of OpenAI’s LunarLander and a real Fetch robot, where we collect data from multiple users with different preferences. 
The results suggest that our approach can efficiently learn multimodal reward functions, and improve data-efficiency over benchmark methods that we adapt to our learning problem.' volume: 164 URL: https://proceedings.mlr.press/v164/myers22a.html PDF: https://proceedings.mlr.press/v164/myers22a/myers22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-myers22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Vivek family: Myers - given: Erdem family: Biyik - given: Nima family: Anari - given: Dorsa family: Sadigh editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 342-352 id: myers22a issued: date-parts: - 2022 - 1 - 11 firstpage: 342 lastpage: 352 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Reward Functions from Scale Feedback' abstract: 'Today’s robots are increasingly interacting with people and need to efficiently learn inexperienced users’ preferences. A common framework is to iteratively query the user about which of two presented robot trajectories they prefer. While this minimizes the user’s effort, a strict choice does not yield any information on how much one trajectory is preferred. We propose scale feedback, where the user utilizes a slider to give more nuanced information. We introduce a probabilistic model of how users would provide feedback and derive a learning framework for the robot. We demonstrate the performance benefit of slider feedback in simulations, and validate our approach in two user studies suggesting that scale feedback enables more effective learning in practice.' 
volume: 164 URL: https://proceedings.mlr.press/v164/wilde22a.html PDF: https://proceedings.mlr.press/v164/wilde22a/wilde22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wilde22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Nils family: Wilde - given: Erdem family: Biyik - given: Dorsa family: Sadigh - given: Stephen L. family: Smith editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 353-362 id: wilde22a issued: date-parts: - 2022 - 1 - 11 firstpage: 353 lastpage: 362 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Feasibility to Imitate Demonstrators with Different Dynamics' abstract: 'The goal of learning from demonstrations is to learn a policy for an agent (imitator) by mimicking the behavior in the demonstrations. Prior works on learning from demonstrations assume that the demonstrations are collected by a demonstrator that has the same dynamics as the imitator. However, in many real-world applications, this assumption is limiting: to alleviate the lack of data in robotics, we would like to be able to leverage demonstrations collected from agents with different dynamics. This can be challenging as the demonstrations might not even be feasible for the imitator. Our insight is that we can learn a feasibility metric that captures the likelihood of a demonstration being feasible for the imitator. We develop a feasibility MDP (f-MDP) and derive the feasibility score by learning an optimal policy in the f-MDP. Our proposed feasibility measure encourages the imitator to learn from more informative demonstrations, and disregard demonstrations that are far from feasible. Our experiments on four simulated environments and on a real robot show that the policy learned with our approach achieves a higher expected return than prior works. 
We show the videos of the real robot arm experiments on our website.' volume: 164 URL: https://proceedings.mlr.press/v164/cao22a.html PDF: https://proceedings.mlr.press/v164/cao22a/cao22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-cao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Zhangjie family: Cao - given: Yilun family: Hao - given: Mengxi family: Li - given: Dorsa family: Sadigh editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 363-372 id: cao22a issued: date-parts: - 2022 - 1 - 11 firstpage: 363 lastpage: 372 published: 2022-01-11 00:00:00 +0000 - title: 'Orientation Probabilistic Movement Primitives on Riemannian Manifolds' abstract: 'Learning complex robot motions requires models that can encode and retrieve full-pose trajectories when tasks are defined in operational spaces. Probabilistic movement primitives (ProMPs) stand out as a principled approach that models trajectory distributions learned from demonstrations. ProMPs allow for trajectory modulation and blending to achieve better generalization to novel situations. However, when ProMPs are employed in operational space, their original formulation does not directly apply to full-pose movements including rotational trajectories described by quaternions. This paper proposes a Riemannian formulation of ProMPs that enables encoding and retrieving of quaternion trajectories. Our method builds on Riemannian manifold theory, and exploits multilinear geodesic regression for estimating the ProMP parameters. This novel approach makes ProMPs a suitable model for learning complex full-pose robot motion patterns. Riemannian ProMPs are tested on toy examples to illustrate their workflow, and on real learning-from-demonstration experiments. 
' volume: 164 URL: https://proceedings.mlr.press/v164/rozo22a.html PDF: https://proceedings.mlr.press/v164/rozo22a/rozo22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-rozo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Leonel family: Rozo - given: Vedant family: Dave editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 373-383 id: rozo22a issued: date-parts: - 2022 - 1 - 11 firstpage: 373 lastpage: 383 published: 2022-01-11 00:00:00 +0000 - title: 'Self-supervised Reinforcement Learning with Independently Controllable Subgoals' abstract: 'To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects. 
' volume: 164 URL: https://proceedings.mlr.press/v164/zadaianchuk22a.html PDF: https://proceedings.mlr.press/v164/zadaianchuk22a/zadaianchuk22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-zadaianchuk22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Andrii family: Zadaianchuk - given: Georg family: Martius - given: Fanny family: Yang editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 384-394 id: zadaianchuk22a issued: date-parts: - 2022 - 1 - 11 firstpage: 384 lastpage: 394 published: 2022-01-11 00:00:00 +0000 - title: 'Haptics-based Curiosity for Sparse-reward Tasks' abstract: 'Robots in many real-world settings have access to force/torque sensors in their gripper and tactile sensing is often necessary for tasks that involve contact-rich motion. In this work, we leverage surprise from mismatches in haptics feedback to guide exploration in hard sparse-reward reinforcement learning tasks. Our approach, Haptics-based Curiosity (HaC), learns what visible object interactions are supposed to “feel” like. We encourage exploration by rewarding interactions where the expectation and the experience do not match. We test our approach on a range of haptics-intensive robot arm tasks (e.g. pushing objects, opening doors), which we also release as part of this work. Across multiple experiments in a simulated setting, we demonstrate that our method is able to learn these difficult tasks through sparse reward and curiosity alone. We compare our cross-modal approach to single-modality (haptics- or vision-only) approaches as well as other curiosity-based methods and find that our method performs better and is more sample-efficient.' 
volume: 164 URL: https://proceedings.mlr.press/v164/rajeswar22a.html PDF: https://proceedings.mlr.press/v164/rajeswar22a/rajeswar22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-rajeswar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sai family: Rajeswar - given: Cyril family: Ibrahim - given: Nitin family: Surya - given: Florian family: Golemo - given: David family: Vazquez - given: Aaron family: Courville - given: Pedro O. family: Pinheiro editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 395-405 id: rajeswar22a issued: date-parts: - 2022 - 1 - 11 firstpage: 395 lastpage: 405 published: 2022-01-11 00:00:00 +0000 - title: 'Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization' abstract: 'Skill chaining is a promising approach for synthesizing complex behaviors by sequentially combining previously learned skills. Yet, a naive composition of skills fails when a policy encounters a starting state never seen during its training. For successful skill chaining, prior approaches attempt to widen the policy’s starting state distribution. However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences. In this paper, we propose to chain multiple policies without excessively large initial state distributions by regularizing the terminal state distributions in an adversarial learning framework. We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly. Our results have shown that our method establishes the first model-free reinforcement learning algorithm to solve these tasks; whereas prior skill chaining approaches fail. The code and videos are available at https://clvrai.com/skill-chaining.' 
volume: 164 URL: https://proceedings.mlr.press/v164/lee22a.html PDF: https://proceedings.mlr.press/v164/lee22a/lee22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lee22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Youngwoon family: Lee - given: Joseph J family: Lim - given: Anima family: Anandkumar - given: Yuke family: Zhu editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 406-416 id: lee22a issued: date-parts: - 2022 - 1 - 11 firstpage: 406 lastpage: 416 published: 2022-01-11 00:00:00 +0000 - title: 'A Workflow for Offline Model-Free Robotic Reinforcement Learning' abstract: 'Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction. This can allow robots to acquire generalizable skills from large and diverse datasets, without any costly or unsafe online data collection. Despite recent algorithmic advances in offline RL, applying these methods to real-world problems has proven challenging. Although offline RL methods can learn from prior data, there is no clear and well-understood process for making various design choices, from model architecture to algorithm hyperparameters, without actually evaluating the learned policies online. In this paper, our aim is to develop a practical workflow for using offline RL analogous to the relatively well-understood workflows for supervised learning problems. To this end, we devise a set of metrics and conditions that can be tracked over the course of offline training and can inform the practitioner about how the algorithm and model architecture should be adjusted to improve final performance. 
Our workflow is derived from a conceptual understanding of the behavior of conservative offline RL algorithms and cross-validation in supervised learning. We demonstrate the efficacy of this workflow in producing effective policies without any online tuning, both in several simulated robotic learning scenarios and for three tasks on two distinct real robots, focusing on learning manipulation skills with raw image observations with sparse binary rewards. Explanatory video and additional content can be found at https://sites.google.com/view/offline-rl-workflow. ' volume: 164 URL: https://proceedings.mlr.press/v164/kumar22a.html PDF: https://proceedings.mlr.press/v164/kumar22a/kumar22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-kumar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Aviral family: Kumar - given: Anikait family: Singh - given: Stephen family: Tian - given: Chelsea family: Finn - given: Sergey family: Levine editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 417-428 id: kumar22a issued: date-parts: - 2022 - 1 - 11 firstpage: 417 lastpage: 428 published: 2022-01-11 00:00:00 +0000 - title: 'SeqMatchNet: Contrastive Learning with Sequence Matching for Place Recognition & Relocalization' abstract: 'Visual Place Recognition (VPR) for mobile robot global relocalization is a well-studied problem, where contrastive learning based representation training methods have led to state-of-the-art performance. However, these methods are mainly designed for single image based VPR, where sequential information, which is ubiquitous in robotics, is only used as a post-processing step for filtering single image match scores, but is never used to guide the representation learning process itself. 
In this work, for the first time, we bridge the gap between single image representation learning and sequence matching through "SeqMatchNet" which transforms the single image descriptors such that they become more responsive to the sequence matching metric. We propose a novel triplet loss formulation where the distance metric is based on "sequence matching", that is, the aggregation of temporal order-based Euclidean distances computed using single images. We use the same metric for mining negatives online during the training which helps the optimization process by selecting appropriate positives and harder negatives. To overcome the computational overhead of sequence matching for negative mining, we propose a 2D convolution based formulation of sequence matching for efficiently aggregating distances within a distance matrix computed using single images. We show that our proposed method achieves consistent gains in performance as demonstrated on four benchmark datasets. Source code available at https://github.com/oravus/SeqMatchNet.' 
volume: 164 URL: https://proceedings.mlr.press/v164/garg22a.html PDF: https://proceedings.mlr.press/v164/garg22a/garg22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-garg22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sourav family: Garg - given: Madhu family: Vankadari - given: Michael family: Milford editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 429-443 id: garg22a issued: date-parts: - 2022 - 1 - 11 firstpage: 429 lastpage: 443 published: 2022-01-11 00:00:00 +0000 - title: 'Risk-Averse Zero-Order Trajectory Optimization' abstract: 'We introduce a simple but effective method for managing risk in zero-order trajectory optimization that involves probabilistic safety constraints and balancing of optimism in the face of epistemic uncertainty and pessimism in the face of aleatoric uncertainty of an ensemble of stochastic neural networks. Various experiments indicate that the separation of uncertainties is essential to performing well with data-driven MPC approaches in uncertain and safety-critical control environments.' 
volume: 164 URL: https://proceedings.mlr.press/v164/vlastelica22a.html PDF: https://proceedings.mlr.press/v164/vlastelica22a/vlastelica22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-vlastelica22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Marin family: Vlastelica - given: Sebastian family: Blaes - given: Cristina family: Pinneri - given: Georg family: Martius editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 444-454 id: vlastelica22a issued: date-parts: - 2022 - 1 - 11 firstpage: 444 lastpage: 454 published: 2022-01-11 00:00:00 +0000 - title: 'iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks' abstract: 'Recent research in embodied AI has been boosted by the use of simulation environments to develop and train robot learning approaches. However, the use of simulation has skewed the attention to tasks that only require what robotics simulators can simulate: motion and physical contact. We present iGibson 2.0, an open-source simulation environment that supports the simulation of a more diverse set of household tasks through three key innovations. First, iGibson 2.0 supports object states, including temperature, wetness level, cleanliness level, and toggled and sliced states, necessary to cover a wider range of tasks. Second, iGibson 2.0 implements a set of predicate logic functions that map the simulator states to logic states like Cooked or Soaked. Additionally, given a logic state, iGibson 2.0 can sample valid physical states that satisfy it. This functionality can generate potentially infinite instances of tasks with minimal effort from the users. The sampling mechanism allows our scenes to be more densely populated with small objects in semantically meaningful locations. 
Third, iGibson 2.0 includes a virtual reality (VR) interface to immerse humans in its scenes to collect demonstrations. As a result, we can collect demonstrations from humans on these new types of tasks, and use them for imitation learning. We evaluate the new capabilities of iGibson 2.0 to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new research in embodied AI. iGibson 2.0 and its new dataset are publicly available at http://svl.stanford.edu/igibson/.' volume: 164 URL: https://proceedings.mlr.press/v164/li22b.html PDF: https://proceedings.mlr.press/v164/li22b/li22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-li22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Chengshu family: Li - given: Fei family: Xia - given: Roberto family: Martín-Martín - given: Michael family: Lingelbach - given: Sanjana family: Srivastava - given: Bokui family: Shen - given: Kent Elliott family: Vainio - given: Cem family: Gokmen - given: Gokul family: Dharan - given: Tanish family: Jain - given: Andrey family: Kurenkov - given: Karen family: Liu - given: Hyowon family: Gweon - given: Jiajun family: Wu - given: Li family: Fei-Fei - given: Silvio family: Savarese editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 455-465 id: li22b issued: date-parts: - 2022 - 1 - 11 firstpage: 455 lastpage: 465 published: 2022-01-11 00:00:00 +0000 - title: 'ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations' abstract: 'Multisensory object-centric perception, reasoning, and interaction have been a key research topic in recent years. 
However, the progress in these directions is limited by the small set of objects available—synthetic objects are not realistic enough and are mostly centered around geometry, while real object datasets such as YCB are often practically challenging and unstable to acquire due to international shipping, inventory, and financial cost. We present ObjectFolder, a dataset of 100 virtualized objects that addresses both challenges with two key innovations. First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks, beyond existing datasets that focus purely on object geometry. Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object’s visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share. We demonstrate the usefulness of our dataset as a testbed for multisensory perception and control by evaluating it on a variety of benchmark tasks, including instance recognition, cross-sensory retrieval, 3D reconstruction, and robotic grasping.' 
volume: 164 URL: https://proceedings.mlr.press/v164/gao22a.html PDF: https://proceedings.mlr.press/v164/gao22a/gao22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-gao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Ruohan family: Gao - given: Yen-Yu family: Chang - given: Shivani family: Mall - given: Li family: Fei-Fei - given: Jiajun family: Wu editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 466-476 id: gao22a issued: date-parts: - 2022 - 1 - 11 firstpage: 466 lastpage: 476 published: 2022-01-11 00:00:00 +0000 - title: 'BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments' abstract: 'We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation. These activities are designed to be realistic, diverse and complex, aiming to reproduce the challenges that agents must face in the real world. Building such a benchmark poses three fundamental difficulties for each activity: definition (it can differ by time, place, or person), instantiation in a simulator, and evaluation. BEHAVIOR addresses these with three innovations. First, we propose a predicate logic-based description language for expressing an activity’s initial and goal conditions, enabling generation of diverse instances for any activity. Second, we identify the simulator-agnostic features required by an underlying environment to support BEHAVIOR, and demonstrate in one such simulator. Third, we introduce a set of metrics to measure task progress and efficiency, absolute and relative to human demonstrators. We include 500 human demonstrations in virtual reality (VR) to serve as the human ground truth. 
Our experiments demonstrate that even state-of-the-art embodied AI solutions struggle with the level of realism, diversity, and complexity imposed by the activities in our benchmark. We make BEHAVIOR publicly available at behavior.stanford.edu to facilitate and calibrate the development of new embodied AI solutions.' volume: 164 URL: https://proceedings.mlr.press/v164/srivastava22a.html PDF: https://proceedings.mlr.press/v164/srivastava22a/srivastava22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-srivastava22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sanjana family: Srivastava - given: Chengshu family: Li - given: Michael family: Lingelbach - given: Roberto family: Martín-Martín - given: Fei family: Xia - given: Kent Elliott family: Vainio - given: Zheng family: Lian - given: Cem family: Gokmen - given: Shyamal family: Buch - given: Karen family: Liu - given: Silvio family: Savarese - given: Hyowon family: Gweon - given: Jiajun family: Wu - given: Li family: Fei-Fei editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 477-490 id: srivastava22a issued: date-parts: - 2022 - 1 - 11 firstpage: 477 lastpage: 490 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Model Preconditions for Planning with Multiple Models' abstract: 'Different models can provide differing levels of fidelity when a robot is planning. Analytical models are often fast to evaluate but only work in limited ranges of conditions. Meanwhile, physics simulators are effective at modeling complex interactions between objects but are typically more computationally expensive. Learning when to switch between the various models can greatly improve the speed of planning and task success reliability. 
In this work, we learn model deviation estimators (MDEs) to predict the error between real-world states and the states outputted by transition models. MDEs can be used to define a model precondition that describes which transitions are accurately modeled. We then propose a planner that uses the learned model preconditions to switch between various models in order to use models in conditions where they are accurate, prioritizing faster models when possible. We evaluate our method on two real-world tasks: placing a rod into a box and placing a rod into a closed drawer.' volume: 164 URL: https://proceedings.mlr.press/v164/lagrassa22a.html PDF: https://proceedings.mlr.press/v164/lagrassa22a/lagrassa22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lagrassa22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Alex Licari family: LaGrassa - given: Oliver family: Kroemer editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 491-500 id: lagrassa22a issued: date-parts: - 2022 - 1 - 11 firstpage: 491 lastpage: 500 published: 2022-01-11 00:00:00 +0000 - title: 'Single-Shot Scene Reconstruction' abstract: 'We introduce a novel scene reconstruction method to infer a fully editable and re-renderable model of a 3D road scene from a single image. We represent movable objects separately from the immovable background, and recover a full 3D model of each distinct object as well as their spatial relations in the scene. We leverage transformer-based detectors and neural implicit 3D representations and we build a Scene Decomposition Network (SDN) that reconstructs the scene in 3D. Furthermore, we show that this reconstruction can be used in an analysis-by-synthesis setting via differentiable rendering. 
Trained only on simulated road scenes, our method generalizes well to real data in the same class without any adaptation thanks to its strong inductive priors. Experiments on two synthetic-real dataset pairs (PD-DDAD and VKITTI-KITTI) show that our method can robustly recover scene geometry and appearance, as well as reconstruct and re-render the scene from novel viewpoints.' volume: 164 URL: https://proceedings.mlr.press/v164/zakharov22a.html PDF: https://proceedings.mlr.press/v164/zakharov22a/zakharov22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-zakharov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sergey family: Zakharov - given: Rares Andrei family: Ambrus - given: Vitor Campagnolo family: Guizilini - given: Dennis family: Park - given: Wadim family: Kehl - given: Fredo family: Durand - given: Joshua B. family: Tenenbaum - given: Vincent family: Sitzmann - given: Jiajun family: Wu - given: Adrien family: Gaidon editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 501-512 id: zakharov22a issued: date-parts: - 2022 - 1 - 11 firstpage: 501 lastpage: 512 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Backchanneling Behaviors for a Social Robot via Data Augmentation from Human-Human Conversations' abstract: 'Backchanneling behaviors on a robot, such as nodding, can make talking to a robot feel more natural and engaging by giving a sense that the robot is actively listening. For backchanneling to be effective, it is important that the timing of such cues is appropriate given the humans’ conversational behaviors. Recent progress has shown that these behaviors can be learned from datasets of human-human conversations. 
However, recent data-driven methods tend to overfit to the human speakers that are seen in training data and fail to generalize well to previously unseen speakers. In this paper, we explore the use of data augmentation for effective nodding behavior in a robot. We show that, by augmenting the input speech and visual features, we can produce data-driven models that are more robust to unseen features without collecting additional data. We analyze the efficacy of data-driven backchanneling in a realistic human-robot conversational setting with a user study, showing that users perceived the data-driven model to be better at listening as compared to rule-based and random baselines.' volume: 164 URL: https://proceedings.mlr.press/v164/murray22a.html PDF: https://proceedings.mlr.press/v164/murray22a/murray22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-murray22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Michael family: Murray - given: Nick family: Walker - given: Amal family: Nanavati - given: Patricia family: Alves-Oliveira - given: Nikita family: Filippov - given: Allison family: Sauppe - given: Bilge family: Mutlu - given: Maya family: Cakmak editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 513-525 id: murray22a issued: date-parts: - 2022 - 1 - 11 firstpage: 513 lastpage: 525 published: 2022-01-11 00:00:00 +0000 - title: 'Dex-NeRF: Using a Neural Radiance Field to Grasp Transparent Objects' abstract: 'The ability to grasp and manipulate transparent objects is a major challenge for robots. Existing depth cameras have difficulty detecting, localizing, and inferring the geometry of such objects. We propose using neural radiance fields (NeRF) to detect, localize, and infer the geometry of transparent objects with sufficient accuracy to find and grasp them securely. 
We leverage NeRF’s view-independent learned density, place lights to increase specular reflections, and perform a transparency-aware depth-rendering that we feed into the Dex-Net grasp planner. We show how additional lights create specular reflections that improve the quality of the depth map, and test a setup for a robot workcell equipped with an array of cameras to perform transparent object manipulation. We also create synthetic and real datasets of transparent objects in real-world settings, including singulated objects, cluttered tables, and the top rack of a dishwasher. In each setting we show that NeRF and Dex-Net are able to reliably compute robust grasps on transparent objects, achieving 90% and 100% grasp-success rates in physical experiments on an ABB YuMi, on objects where baseline methods fail.' volume: 164 URL: https://proceedings.mlr.press/v164/ichnowski22a.html PDF: https://proceedings.mlr.press/v164/ichnowski22a/ichnowski22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-ichnowski22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Jeffrey family: Ichnowski - given: Yahav family: Avigal - given: Justin family: Kerr - given: Ken family: Goldberg editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 526-536 id: ichnowski22a issued: date-parts: - 2022 - 1 - 11 firstpage: 526 lastpage: 536 published: 2022-01-11 00:00:00 +0000 - title: 'XIRL: Cross-embodiment Inverse Reinforcement Learning' abstract: 'We investigate the visual cross-embodiment imitation setting, in which agents learn policies from videos of other agents (such as humans) demonstrating the same task, but with stark differences in their embodiments – shape, actions, end-effector dynamics, etc. 
In this work, we demonstrate that it is possible to automatically discover and learn vision-based reward functions from cross-embodiment demonstration videos that are robust to these differences. Specifically, we present a self-supervised method for Cross-embodiment Inverse Reinforcement Learning (XIRL) that leverages temporal cycle-consistency constraints to learn deep visual embeddings that capture task progression from offline videos of demonstrations across multiple expert agents, each performing the same task differently due to embodiment differences. Prior to our work, producing rewards from self-supervised embeddings typically required alignment with a reference trajectory, which may be difficult to acquire under stark embodiment differences. We show empirically that if the embeddings are aware of task-progress, simply taking the negative distance between the current state and goal state in the learned embedding space is useful as a reward for training policies with reinforcement learning. We find our learned reward function not only works for embodiments seen during training, but also generalizes to entirely new embodiments. 
Additionally, when transferring real-world human demonstrations to a simulated robot, we find that XIRL is more sample efficient than current best methods. Qualitative results, code, and datasets are available at https://x-irl.github.io ' volume: 164 URL: https://proceedings.mlr.press/v164/zakka22a.html PDF: https://proceedings.mlr.press/v164/zakka22a/zakka22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-zakka22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Kevin family: Zakka - given: Andy family: Zeng - given: Pete family: Florence - given: Jonathan family: Tompson - given: Jeannette family: Bohg - given: Debidatta family: Dwibedi editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 537-546 id: zakka22a issued: date-parts: - 2022 - 1 - 11 firstpage: 537 lastpage: 546 published: 2022-01-11 00:00:00 +0000 - title: 'Evaluations of the Gap between Supervised and Reinforcement Lifelong Learning on Robotic Manipulation Tasks' abstract: 'Overcoming catastrophic forgetting is of great importance for deep learning and robotics. Recent lifelong learning research has made great advances in supervised learning. However, little work focuses on reinforcement learning (RL). We focus on evaluating the performance of state-of-the-art lifelong learning algorithms on robotic reinforcement learning tasks. We mainly focus on how well these algorithms overcome catastrophic forgetting. We summarize the pros and cons for each category of lifelong learning algorithms when applied in RL scenarios. We propose a framework to modify supervised lifelong learning algorithms to be compatible with RL. We also develop a manipulation benchmark task set for our evaluations.' 
volume: 164 URL: https://proceedings.mlr.press/v164/yang22a.html PDF: https://proceedings.mlr.press/v164/yang22a/yang22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-yang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Fan family: Yang - given: Chao family: Yang - given: Huaping family: Liu - given: Fuchun family: Sun editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 547-556 id: yang22a issued: date-parts: - 2022 - 1 - 11 firstpage: 547 lastpage: 556 published: 2022-01-11 00:00:00 +0000 - title: 'Scaling Up Multi-Task Robotic Reinforcement Learning' abstract: 'General-purpose robotic systems must master a large repertoire of diverse skills. While reinforcement learning provides a powerful framework for acquiring individual behaviors, the time needed to acquire each skill makes the prospect of a generalist robot trained with RL daunting. In this paper, we study how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously, sharing exploration, experience, and representations across tasks. In this framework, new tasks can be continuously instantiated from previously learned tasks improving overall performance and capabilities of the system. To instantiate this system, we develop a scalable and intuitive framework for specifying new tasks through user-provided examples of desired outcomes, devise a multi-robot collective learning system for data collection that simultaneously collects experience for multiple tasks, and develop a scalable and generalizable multi-task deep reinforcement learning method, which we call MT-Opt. 
We demonstrate how MT-Opt can learn a wide range of skills, including semantic picking (i.e., picking an object from a particular category), placing into various fixtures (e.g., placing a food item onto a plate), covering, aligning, and rearranging. We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots, and demonstrate the performance of our system both in terms of its ability to generalize to structurally similar new tasks, and acquire distinct new tasks more quickly by leveraging past experience. We recommend viewing the videos at https://karolhausman.github.io/mt-opt/.' volume: 164 URL: https://proceedings.mlr.press/v164/kalashnikov22a.html PDF: https://proceedings.mlr.press/v164/kalashnikov22a/kalashnikov22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-kalashnikov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Dmitry family: Kalashnikov - given: Jake family: Varley - given: Yevgen family: Chebotar - given: Benjamin family: Swanson - given: Rico family: Jonschkowski - given: Chelsea family: Finn - given: Sergey family: Levine - given: Karol family: Hausman editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 557-575 id: kalashnikov22a issued: date-parts: - 2022 - 1 - 11 firstpage: 557 lastpage: 575 published: 2022-01-11 00:00:00 +0000 - title: 'Decentralized Control of Quadrotor Swarms with End-to-end Deep Reinforcement Learning' abstract: 'We demonstrate the possibility of learning drone swarm controllers that are zero-shot transferable to real quadrotors via large-scale multi-agent end-to-end reinforcement learning. We train policies parameterized by neural networks that are capable of controlling individual drones in a swarm in a fully decentralized manner. 
Our policies, trained in simulated environments with realistic quadrotor physics, demonstrate advanced flocking behaviors, perform aggressive maneuvers in tight formations while avoiding collisions with each other, break and re-establish formations to avoid collisions with moving obstacles, and efficiently coordinate in pursuit-evasion tasks. We analyze, in simulation, how different model architectures and parameters of the training regime influence the final performance of neural swarms. We demonstrate the successful deployment of the model learned in simulation to highly resource-constrained physical quadrotors performing station keeping and goal swapping behaviors. Video demonstrations and source code are available at the project website https://sites.google.com/view/swarm-rl.' volume: 164 URL: https://proceedings.mlr.press/v164/batra22a.html PDF: https://proceedings.mlr.press/v164/batra22a/batra22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-batra22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sumeet family: Batra - given: Zhehui family: Huang - given: Aleksei family: Petrenko - given: Tushar family: Kumar - given: Artem family: Molchanov - given: Gaurav S. family: Sukhatme editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 576-586 id: batra22a issued: date-parts: - 2022 - 1 - 11 firstpage: 576 lastpage: 586 published: 2022-01-11 00:00:00 +0000 - title: 'ReSkin: versatile, replaceable, lasting tactile skins' abstract: 'Soft sensors have attracted growing interest in robotics, due to their ability to enable both passive conformal contact from the material properties and active contact data from the sensor properties. 
However, the same properties of conformal contact result in faster deterioration of soft sensors and larger variations in their response characteristics over time and across samples, inhibiting their ability to be long-lasting and replaceable. ReSkin is a tactile soft sensor that leverages machine learning and magnetic sensing to offer a low-cost, diverse and compact solution for long-term use. Magnetic sensing separates the electronic circuitry from the passive interface, making it easier to replace interfaces as they wear out while allowing for a wide variety of form factors. Machine learning allows us to learn sensor response models that are robust to variations across fabrication and time, and our self-supervised learning algorithm enables finer performance enhancement with small, inexpensive data collection procedures. We believe that ReSkin opens the door to more versatile, scalable and inexpensive tactile sensation modules than existing alternatives. ' volume: 164 URL: https://proceedings.mlr.press/v164/bhirangi22a.html PDF: https://proceedings.mlr.press/v164/bhirangi22a/bhirangi22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-bhirangi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Raunaq family: Bhirangi - given: Tess family: Hellebrekers - given: Carmel family: Majidi - given: Abhinav family: Gupta editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 587-597 id: bhirangi22a issued: date-parts: - 2022 - 1 - 11 firstpage: 587 lastpage: 597 published: 2022-01-11 00:00:00 +0000 - title: 'ThriftyDAgger: Budget-Aware Novelty and Risk Gating for Interactive Imitation Learning' abstract: 'Effective robot learning often requires online human feedback and interventions that can cost significant human time, giving rise to the central challenge in interactive 
imitation learning: is it possible to control the timing and length of interventions to both facilitate learning and limit burden on the human supervisor? This paper presents ThriftyDAgger, an algorithm for actively querying a human supervisor given a desired budget of human interventions. ThriftyDAgger uses a learned switching policy to solicit interventions only at states that are sufficiently (1) novel, where the robot policy has no reference behavior to imitate, or (2) risky, where the robot has low confidence in task completion. To detect the latter, we introduce a novel metric for estimating risk under the current robot policy. Experiments in simulation and on a physical cable routing experiment suggest that ThriftyDAgger’s intervention criteria balance task performance and supervisor burden more effectively than prior algorithms. ThriftyDAgger can also be applied at execution time, where it achieves a 100% success rate on both the simulation and physical tasks. A user study (N=10) in which users control a three-robot fleet while also performing a concentration task suggests that ThriftyDAgger increases human and robot performance by 58% and 80% respectively compared to the next best algorithm while reducing supervisor burden. See https://tinyurl.com/thrifty-dagger for supplementary material.' volume: 164 URL: https://proceedings.mlr.press/v164/hoque22a.html PDF: https://proceedings.mlr.press/v164/hoque22a/hoque22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-hoque22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Ryan family: Hoque - given: Ashwin family: Balakrishna - given: Ellen family: Novoseller - given: Albert family: Wilcox - given: Daniel S. 
family: Brown - given: Ken family: Goldberg editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 598-608 id: hoque22a issued: date-parts: - 2022 - 1 - 11 firstpage: 598 lastpage: 608 published: 2022-01-11 00:00:00 +0000 - title: 'Anytime Depth Estimation with Limited Sensing and Computation Capabilities on Mobile Devices' abstract: 'Depth estimation is a safety critical and energy sensitive method for environment sensing. However, in real applications, depth estimation may be halted at any time, due to random interruptions or the low battery capacity available when using powerful sensors like 3D LiDAR. To address this problem, we propose a depth estimation method that is robust to random halts and relies on energy-saving 2D LiDAR and a monocular camera. To this end, we formulate depth estimation as an anytime problem and propose a new metric to evaluate its robustness under random interruptions. Our final model has only 2M parameters with a marginal accuracy loss compared to state-of-the-art baselines. Indeed, our experiments on the NYU Depth v2 dataset show that our model is capable of processing 224$\times$224 resolution images and 2D point clouds with any computation budget larger than 6.37ms (157 FPS) and 0.2J on an NVIDIA Jetson TX2 system. Evaluations on the KITTI dataset under supervised and self-supervised training show similar results.' 
volume: 164 URL: https://proceedings.mlr.press/v164/yang22b.html PDF: https://proceedings.mlr.press/v164/yang22b/yang22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-yang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Yuedong family: Yang - given: Zihui family: Xue - given: Radu family: Marculescu editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 609-618 id: yang22b issued: date-parts: - 2022 - 1 - 11 firstpage: 609 lastpage: 618 published: 2022-01-11 00:00:00 +0000 - title: 'Semantic Terrain Classification for Off-Road Autonomous Driving' abstract: 'Producing dense and accurate traversability maps is crucial for autonomous off-road navigation. In this paper, we focus on the problem of classifying terrains into 4 cost classes (free, low-cost, medium-cost, obstacle) for traversability assessment. This requires a robot to reason about both semantics (what objects are present?) and geometric properties (where are the objects located?) of the environment. To achieve this goal, we develop a novel Bird’s Eye View Network (BEVNet), a deep neural network that directly predicts a local map encoding terrain classes from sparse LiDAR inputs. BEVNet processes both geometric and semantic information in a temporally consistent fashion. More importantly, it uses learned prior and history to predict terrain classes in unseen space and into the future, allowing a robot to better appraise its situation. We quantitatively evaluate BEVNet on both on-road and off-road scenarios and show that it outperforms a variety of strong baselines.' 
volume: 164 URL: https://proceedings.mlr.press/v164/shaban22a.html PDF: https://proceedings.mlr.press/v164/shaban22a/shaban22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-shaban22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Amirreza family: Shaban - given: Xiangyun family: Meng - given: JoonHo family: Lee - given: Byron family: Boots - given: Dieter family: Fox editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 619-629 id: shaban22a issued: date-parts: - 2022 - 1 - 11 firstpage: 619 lastpage: 629 published: 2022-01-11 00:00:00 +0000 - title: 'Guided Imitation of Task and Motion Planning' abstract: 'While modern policy optimization methods can do complex manipulation from sensory data, they struggle on problems with extended time horizons and multiple sub-goals. On the other hand, task and motion planning (TAMP) methods scale to long horizons but they are computationally expensive and need to precisely track world state. We propose a method that draws on the strength of both methods: we train a policy to imitate a TAMP solver’s output. This produces a feed-forward policy that can accomplish multi-step tasks from sensory data. First, we build an asynchronous distributed TAMP solver that can produce supervision data fast enough for imitation learning. Then, we propose a hierarchical policy architecture that lets us use partially trained control policies to speed up the TAMP solver. In robotic manipulation tasks with 7-DoF joint control, the partially trained policies reduce the time needed for planning by a factor of up to 2.6. 
Among these tasks, we can learn a policy that solves the RoboSuite 4-object pick-place task 88% of the time from object pose observations and a policy that solves the RoboDesk 9-goal benchmark 79% of the time from RGB images (averaged across the 9 disparate tasks).' volume: 164 URL: https://proceedings.mlr.press/v164/mcdonald22a.html PDF: https://proceedings.mlr.press/v164/mcdonald22a/mcdonald22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-mcdonald22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Michael James family: McDonald - given: Dylan family: Hadfield-Menell editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 630-640 id: mcdonald22a issued: date-parts: - 2022 - 1 - 11 firstpage: 630 lastpage: 640 published: 2022-01-11 00:00:00 +0000 - title: 'Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation' abstract: 'Learning complex manipulation tasks in realistic, obstructed environments is a challenging problem due to hard exploration in the presence of obstacles and high-dimensional visual observations. Prior work tackles the exploration problem by integrating motion planning and reinforcement learning. However, the motion planner augmented policy requires access to state information, which is often not available in real-world settings. To this end, we propose to distill a state-based motion planner augmented policy to a visual control policy via (1) visual behavioral cloning to remove the motion planner dependency along with its jittery motion, and (2) vision-based reinforcement learning with the guidance of the smoothed trajectories from the behavioral cloning agent. 
We evaluate our method on three manipulation tasks in obstructed environments and compare it against various reinforcement learning and imitation learning baselines. The results demonstrate that our framework is highly sample-efficient and outperforms the state-of-the-art algorithms. Moreover, coupled with domain randomization, our policy is capable of zero-shot transfer to unseen environment settings with distractors. Code and videos are available at https://clvrai.com/mopa-pd.' volume: 164 URL: https://proceedings.mlr.press/v164/liu22b.html PDF: https://proceedings.mlr.press/v164/liu22b/liu22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-liu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: I-Chun Arthur family: Liu - given: Shagun family: Uppal - given: Gaurav S. family: Sukhatme - given: Joseph J family: Lim - given: Peter family: Englert - given: Youngwoon family: Lee editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 641-650 id: liu22b issued: date-parts: - 2022 - 1 - 11 firstpage: 641 lastpage: 650 published: 2022-01-11 00:00:00 +0000 - title: 'DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video' abstract: 'Dexterous multi-fingered robotic hands have a formidable action space, yet their morphological similarity to the human hand holds immense potential to accelerate robot learning. We propose DexVIP, an approach to learn dexterous robotic grasping from human-object interactions present in in-the-wild YouTube videos. We do this by curating grasp images from human-object interaction videos and imposing a prior over the agent’s hand pose when learning to grasp with deep reinforcement learning. A key advantage of our method is that the learned policy is able to leverage free-form in-the-wild visual data. 
As a result, it can easily scale to new objects, and it sidesteps the standard practice of collecting human demonstrations in a lab—a much more expensive and indirect way to capture human expertise. Through experiments on 27 objects with a 30-DoF simulated robot hand, we demonstrate that DexVIP compares favorably to existing approaches that lack a hand pose prior or rely on specialized tele-operation equipment to obtain human demonstrations, while also being faster to train.' volume: 164 URL: https://proceedings.mlr.press/v164/mandikal22a.html PDF: https://proceedings.mlr.press/v164/mandikal22a/mandikal22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-mandikal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Priyanka family: Mandikal - given: Kristen family: Grauman editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 651-661 id: mandikal22a issued: date-parts: - 2022 - 1 - 11 firstpage: 651 lastpage: 661 published: 2022-01-11 00:00:00 +0000 - title: 'DiffImpact: Differentiable Rendering and Identification of Impact Sounds' abstract: 'Rigid objects make distinctive sounds during manipulation. These sounds are a function of object features, such as shape and material, and of contact forces during manipulation. Being able to infer from sound an object’s acoustic properties, how it is being manipulated, and what events it is participating in could augment and complement what robots can perceive from vision, especially in case of occlusion, low visual resolution, poor lighting, or blurred focus. Annotations on sound data are rare. Therefore, existing inference systems mostly include a sound renderer in the loop, and use analysis-by-synthesis to optimize for object acoustic properties. 
Optimizing parameters with respect to a non-differentiable renderer is slow and hard to scale to complex scenes. We present DiffImpact, a fully differentiable model for sounds rigid objects make during impacts, based on physical principles of impact forces, rigid object vibration, and other acoustic effects. Its differentiability enables gradient-based, efficient joint inference of acoustic properties of the objects and characteristics and timings of each individual impact. DiffImpact can also be plugged in as the decoder of an autoencoder, and trained end-to-end on real audio data, so that the encoder can learn to solve the inverse problem in a self-supervised way. Experiments demonstrate that our model’s physics-based inductive biases make it more resource efficient and expressive than state-of-the-art pure learning-based alternatives, on both forward rendering of impact sounds and inverse tasks such as acoustic property inference and blind source separation of impact sounds. Code and videos are at https://sites.google.com/view/diffimpact.' 
volume: 164 URL: https://proceedings.mlr.press/v164/clarke22a.html PDF: https://proceedings.mlr.press/v164/clarke22a/clarke22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-clarke22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Samuel family: Clarke - given: Negin family: Heravi - given: Mark family: Rau - given: Ruohan family: Gao - given: Jiajun family: Wu - given: Doug family: James - given: Jeannette family: Bohg editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 662-673 id: clarke22a issued: date-parts: - 2022 - 1 - 11 firstpage: 662 lastpage: 673 published: 2022-01-11 00:00:00 +0000 - title: 'Rapid Exploration for Open-World Navigation with Latent Goal Models' abstract: 'We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments. At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images. We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration. Trained on a large offline dataset of prior experience, the model acquires a representation of visual goals that is robust to task-irrelevant distractors. We demonstrate our method on a mobile ground robot in open-world exploration scenarios. Given an image of a goal that is up to 80 meters away, our method leverages its representation to explore and discover the goal in under 20 minutes, even amidst previously-unseen obstacles and weather conditions.' 
volume: 164 URL: https://proceedings.mlr.press/v164/shah22a.html PDF: https://proceedings.mlr.press/v164/shah22a/shah22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-shah22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Dhruv family: Shah - given: Benjamin family: Eysenbach - given: Nicholas family: Rhinehart - given: Sergey family: Levine editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 674-684 id: shah22a issued: date-parts: - 2022 - 1 - 11 firstpage: 674 lastpage: 684 published: 2022-01-11 00:00:00 +0000 - title: 'Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR' abstract: 'Self-supervised monocular depth prediction provides a cost-effective solution to obtain the 3D location of each pixel. However, the existing approaches usually lead to unsatisfactory accuracy, which is critical for autonomous robots. In this paper, we propose FusionDepth, a novel two-stage network to advance the self-supervised monocular dense depth learning by leveraging low-cost sparse (e.g. 4-beam) LiDAR. Unlike the existing methods that use sparse LiDAR mainly in a manner of time-consuming iterative post-processing, our model fuses monocular image features and sparse LiDAR features to predict initial depth maps. Then, an efficient feed-forward refine network is further designed to correct the errors in these initial depth maps in pseudo-3D space with real-time performance. Extensive experiments show that our proposed model significantly outperforms all the state-of-the-art self-supervised methods, as well as the sparse-LiDAR-based methods on both self-supervised monocular depth prediction and completion tasks. 
With accurate dense depth prediction, our model outperforms the state-of-the-art sparse-LiDAR-based method (Pseudo-LiDAR++) by more than 68% for the downstream task of monocular 3D object detection on the KITTI Leaderboard. Code is available at https://github.com/AutoAILab/FusionDepth' volume: 164 URL: https://proceedings.mlr.press/v164/feng22a.html PDF: https://proceedings.mlr.press/v164/feng22a/feng22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-feng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Ziyue family: Feng - given: Longlong family: Jing - given: Peng family: Yin - given: Yingli family: Tian - given: Bing family: Li editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 685-694 id: feng22a issued: date-parts: - 2022 - 1 - 11 firstpage: 685 lastpage: 694 published: 2022-01-11 00:00:00 +0000 - title: 'Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views' abstract: 'We propose a visually-grounded library of behaviors approach for learning to manipulate diverse objects across varying initial and goal configurations and camera placements. Our key innovation is to disentangle the standard image-to-action mapping into two separate modules that use different types of perceptual input: (1) a behavior selector which conditions on intrinsic and semantically-rich object appearance features to select the behaviors that can successfully perform the desired tasks on the object in hand, and (2) a library of behaviors each of which conditions on extrinsic and abstract object properties, such as object location and pose, to predict actions to execute over time. The selector uses a semantically-rich 3D object feature representation extracted from images in a differentiable end-to-end manner. 
This representation is trained to be view-invariant and affordance-aware using self-supervision, by predicting varying views and successful object manipulations. We test our framework on pushing and grasping diverse objects in simulation as well as transporting rigid, granular, and liquid food ingredients in a real robot setup. Our model outperforms image-to-action mappings that do not factorize static and dynamic object properties. We further ablate the contribution of the selector’s input and show the benefits of the proposed view-predictive, affordance-aware 3D visual object representations.' volume: 164 URL: https://proceedings.mlr.press/v164/yang22c.html PDF: https://proceedings.mlr.press/v164/yang22c/yang22c.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-yang22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Jingyun family: Yang - given: Hsiao-Yu family: Tung - given: Yunchu family: Zhang - given: Gaurav family: Pathak - given: Ashwini family: Pokle - given: Christopher G family: Atkeson - given: Katerina family: Fragkiadaki editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 695-705 id: yang22c issued: date-parts: - 2022 - 1 - 11 firstpage: 695 lastpage: 705 published: 2022-01-11 00:00:00 +0000 - title: 'A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution' abstract: 'Natural language provides an accessible and expressive interface to specify long-term tasks for robotic agents. However, non-experts are likely to specify such tasks with high-level instructions, which abstract over specific robot actions through several layers of abstraction. We propose that key to bridging this gap between language and robot actions over long execution horizons are persistent representations. 
We propose a persistent spatial semantic representation method, and show how it enables building an agent that performs hierarchical reasoning to effectively execute long-term tasks. We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions. https://hlsm-alfred.github.io/' volume: 164 URL: https://proceedings.mlr.press/v164/blukis22a.html PDF: https://proceedings.mlr.press/v164/blukis22a/blukis22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-blukis22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Valts family: Blukis - given: Chris family: Paxton - given: Dieter family: Fox - given: Animesh family: Garg - given: Yoav family: Artzi editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 706-717 id: blukis22a issued: date-parts: - 2022 - 1 - 11 firstpage: 706 lastpage: 717 published: 2022-01-11 00:00:00 +0000 - title: 'Urban Driver: Learning to Drive from Real-world Demonstrations Using Policy Gradients' abstract: 'In this work we are the first to present an offline policy gradient method for learning imitative policies for complex urban driving from a large corpus of real-world demonstrations. This is achieved by building a differentiable data-driven simulator on top of perception outputs and high-fidelity HD maps of the area. It allows us to synthesize new driving experiences from existing demonstrations using mid-level representations. Using this simulator we then train a policy network in closed-loop employing policy gradients. We train our proposed method on 100 hours of expert demonstrations on urban roads and show that it learns complex driving policies that generalize well and can perform a variety of driving maneuvers. 
We demonstrate this in simulation and also deploy our model to self-driving vehicles in the real world. Our method outperforms the previously demonstrated state of the art for urban driving scenarios, all without the need for complex state perturbations or collecting additional on-policy data during training. We make code and data publicly available.' volume: 164 URL: https://proceedings.mlr.press/v164/scheel22a.html PDF: https://proceedings.mlr.press/v164/scheel22a/scheel22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-scheel22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Oliver family: Scheel - given: Luca family: Bergamini - given: Maciej family: Wolczyk - given: Błażej family: Osiński - given: Peter family: Ondruska editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 718-728 id: scheel22a issued: date-parts: - 2022 - 1 - 11 firstpage: 718 lastpage: 728 published: 2022-01-11 00:00:00 +0000 - title: 'Demonstration-Guided Reinforcement Learning with Learned Skills' abstract: 'Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every new task as an independent learning problem and attempt to follow the provided demonstrations step-by-step, akin to a human trying to imitate a completely unseen behavior by following the demonstrator’s exact muscle movements. Naturally, such learning will be slow, but often new behaviors are not completely unseen: they share subtasks with behaviors we have previously learned. In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL. 
We first learn a set of reusable skills from large offline datasets of prior experience collected across many tasks. We then propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations by following the demonstrated skills instead of the primitive actions, resulting in substantial performance improvements over prior demonstration-guided RL approaches. We validate the effectiveness of our approach on long-horizon maze navigation and complex robot manipulation tasks.' volume: 164 URL: https://proceedings.mlr.press/v164/pertsch22a.html PDF: https://proceedings.mlr.press/v164/pertsch22a/pertsch22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-pertsch22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Karl family: Pertsch - given: Youngwoon family: Lee - given: Yue family: Wu - given: Joseph J family: Lim editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 729-739 id: pertsch22a issued: date-parts: - 2022 - 1 - 11 firstpage: 729 lastpage: 739 published: 2022-01-11 00:00:00 +0000 - title: 'My House, My Rules: Learning Tidying Preferences with Graph Neural Networks' abstract: 'Robots that arrange household objects should do so according to the user’s preferences, which are inherently subjective and difficult to model. We present NeatNet: a novel Variational Autoencoder architecture using Graph Neural Network layers, which can extract a low-dimensional latent preference vector from a user by observing how they arrange scenes. Given any set of objects, this vector can then be used to generate an arrangement which is tailored to that user’s spatial preferences, with word embeddings used for generalisation to new objects. 
We develop a tidying simulator to gather rearrangement examples from 75 users, and demonstrate empirically that our method consistently produces neat and personalised arrangements across a variety of rearrangement scenarios.' volume: 164 URL: https://proceedings.mlr.press/v164/kapelyukh22a.html PDF: https://proceedings.mlr.press/v164/kapelyukh22a/kapelyukh22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-kapelyukh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Ivan family: Kapelyukh - given: Edward family: Johns editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 740-749 id: kapelyukh22a issued: date-parts: - 2022 - 1 - 11 firstpage: 740 lastpage: 749 published: 2022-01-11 00:00:00 +0000 - title: 'STORM: An Integrated Framework for Fast Joint-Space Model-Predictive Control for Reactive Manipulation' abstract: 'Sampling-based model-predictive control (MPC) is a promising tool for feedback control of robots with complex, non-smooth dynamics, and cost functions. However, the computationally demanding nature of sampling-based MPC algorithms has been a key bottleneck in their application to high-dimensional robotic manipulation problems in the real world. Previous methods have addressed this issue by running MPC in the task space while relying on a low-level operational space controller for joint control. However, by not using the joint space of the robot in the MPC formulation, existing methods cannot directly account for non-task space related constraints such as avoiding joint limits, singular configurations, and link collisions. In this paper, we develop a system for fast, joint space sampling-based MPC for manipulators that is efficiently parallelized using GPUs. 
Our approach can handle task and joint space constraints while taking less than 8ms (125Hz) to compute the next control command. Further, our method can tightly integrate perception into the control problem by utilizing learned cost functions from raw sensor data. We validate our approach by deploying it on a Franka Panda robot for a variety of dynamic manipulation tasks. We study the effect of different cost formulations and MPC parameters on the synthesized behavior and provide key insights that pave the way for the application of sampling-based MPC for manipulators in a principled manner. We also provide highly optimized, open-source code to be used by the wider robot learning and control community. Videos of experiments can be found at: https://sites.google.com/view/manipulation-mpc' volume: 164 URL: https://proceedings.mlr.press/v164/bhardwaj22a.html PDF: https://proceedings.mlr.press/v164/bhardwaj22a/bhardwaj22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-bhardwaj22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Mohak family: Bhardwaj - given: Balakumar family: Sundaralingam - given: Arsalan family: Mousavian - given: Nathan D. family: Ratliff - given: Dieter family: Fox - given: Fabio family: Ramos - given: Byron family: Boots editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 750-759 id: bhardwaj22a issued: date-parts: - 2022 - 1 - 11 firstpage: 750 lastpage: 759 published: 2022-01-11 00:00:00 +0000 - title: 'Structure from Silence: Learning Scene Structure from Ambient Sound' abstract: 'From whirling ceiling fans to ticking clocks, the sounds that we hear subtly vary as we move through a scene. We ask whether these ambient sounds convey information about 3D scene structure and, if so, whether they provide a useful learning signal for multimodal models. 
To study this, we collect a dataset of paired audio and RGB-D recordings from a variety of quiet indoor scenes. We then train models that estimate the distance to nearby walls, given only audio as input. We also use these recordings to learn multimodal representations through self-supervision, by training a network to associate images with their corresponding sounds. These results suggest that ambient sound conveys a surprising amount of information about scene structure, and that it is a useful signal for learning multimodal features.' volume: 164 URL: https://proceedings.mlr.press/v164/chen22b.html PDF: https://proceedings.mlr.press/v164/chen22b/chen22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-chen22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Ziyang family: Chen - given: Xixi family: Hu - given: Andrew family: Owens editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 760-772 id: chen22b issued: date-parts: - 2022 - 1 - 11 firstpage: 760 lastpage: 772 published: 2022-01-11 00:00:00 +0000 - title: 'Fast and Efficient Locomotion via Learned Gait Transitions' abstract: 'We focus on the problem of developing energy efficient controllers for quadrupedal robots. Animals can actively switch gaits at different speeds to lower their energy consumption. In this paper, we devise a hierarchical learning framework, in which distinctive locomotion gaits and natural gait transitions emerge automatically with a simple reward of energy minimization. We use evolutionary strategies (ES) to train a high-level gait policy that specifies gait patterns of each foot, while the low-level convex MPC controller optimizes the motor commands so that the robot can walk at a desired velocity using that gait pattern. 
We test our learning framework on a quadruped robot and demonstrate automatic gait transitions, from walking to trotting and to fly-trotting, as the robot increases its speed. We show that the learned hierarchical controller consumes much less energy across a wide range of locomotion speed than baseline controllers.' volume: 164 URL: https://proceedings.mlr.press/v164/yang22d.html PDF: https://proceedings.mlr.press/v164/yang22d/yang22d.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-yang22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Yuxiang family: Yang - given: Tingnan family: Zhang - given: Erwin family: Coumans - given: Jie family: Tan - given: Byron family: Boots editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 773-783 id: yang22d issued: date-parts: - 2022 - 1 - 11 firstpage: 773 lastpage: 783 published: 2022-01-11 00:00:00 +0000 - title: 'Model-free Safe Control for Zero-Violation Reinforcement Learning' abstract: 'While deep reinforcement learning (DRL) has impressive performance in a variety of continuous control tasks, one critical hurdle that limits the application of DRL to the physical world is the lack of safety guarantees. It is challenging for DRL agents to persistently satisfy a hard state constraint (known as the safety specification) during training. On the other hand, safe control methods with safety guarantees have been extensively studied. However, to synthesize safe control, these methods require explicit analytical models of the dynamic system; but these models are usually not available in DRL. This paper presents a model-free safe control strategy to synthesize safeguards for DRL agents, which will ensure zero safety violation during training. 
In particular, we present an implicit safe set algorithm, which synthesizes the safety index (also called the barrier certificate) and the subsequent safe control law only by querying a black-box dynamic function (e.g., a digital twin simulator). The theoretical results indicate the implicit safe set algorithm guarantees forward invariance and finite-time convergence to the safe set. We validate the proposed method on the state-of-the-art safety benchmark Safety Gym. Results show that the proposed method achieves zero safety violation and gains $95\% \pm 9\%$ cumulative reward compared to state-of-the-art safe DRL methods. Moreover, it can easily scale to high-dimensional systems.' volume: 164 URL: https://proceedings.mlr.press/v164/zhao22a.html PDF: https://proceedings.mlr.press/v164/zhao22a/zhao22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-zhao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Weiye family: Zhao - given: Tairan family: He - given: Changliu family: Liu editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 784-793 id: zhao22a issued: date-parts: - 2022 - 1 - 11 firstpage: 784 lastpage: 793 published: 2022-01-11 00:00:00 +0000 - title: 'Geometry-aware Bayesian Optimization in Robotics using Riemannian Matérn Kernels' abstract: 'Bayesian optimization is a data-efficient technique which can be used for control parameter tuning, parametric policy adaptation, and structure design in robotics. Many of these problems require optimization of functions defined on non-Euclidean domains like spheres, rotation groups, or spaces of positive-definite matrices. To do so, one must place a Gaussian process prior, or equivalently define a kernel, on the space of interest. 
Effective kernels typically reflect the geometry of the spaces they are defined on, but designing them is generally non-trivial. Recent work on the Riemannian Matérn kernels, based on stochastic partial differential equations and spectral theory of the Laplace–Beltrami operator, offers promising avenues towards constructing such geometry-aware kernels. In this paper, we study techniques for implementing these kernels on manifolds of interest in robotics, demonstrate their performance on a set of artificial benchmark functions, and illustrate geometry-aware Bayesian optimization for a variety of robotic applications, covering orientation control, manipulability optimization, and motion planning, while showing its improved performance.' volume: 164 URL: https://proceedings.mlr.press/v164/jaquier22a.html PDF: https://proceedings.mlr.press/v164/jaquier22a/jaquier22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-jaquier22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Noémie family: Jaquier - given: Viacheslav family: Borovitskiy - given: Andrei family: Smolensky - given: Alexander family: Terenin - given: Tamim family: Asfour - given: Leonel family: Rozo editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 794-805 id: jaquier22a issued: date-parts: - 2022 - 1 - 11 firstpage: 794 lastpage: 805 published: 2022-01-11 00:00:00 +0000 - title: 'Predicting Stable Configurations for Semantic Placement of Novel Objects' abstract: 'Human environments contain numerous objects configured in a variety of arrangements. Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments. 
We break this problem down into two parts: (1) finding physically valid locations for the objects and (2) determining if those poses satisfy learned, high-level semantic relationships. We build our models and training from the ground up to be tightly integrated with our proposed planning algorithm for semantic placement of unknown objects. We train our models purely in simulation, with no fine-tuning needed for use in the real world. Our approach enables motion planning for semantic rearrangement of unknown objects in scenes with varying geometry from only RGB-D sensing. Our experiments through a set of simulated ablations demonstrate that using a relational classifier alone is not sufficient for reliable planning. We further demonstrate the ability of our planner to generate and execute diverse manipulation plans through a set of real-world experiments with a variety of objects.' volume: 164 URL: https://proceedings.mlr.press/v164/paxton22a.html PDF: https://proceedings.mlr.press/v164/paxton22a/paxton22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-paxton22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Chris family: Paxton - given: Chris family: Xie - given: Tucker family: Hermans - given: Dieter family: Fox editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 806-815 id: paxton22a issued: date-parts: - 2022 - 1 - 11 firstpage: 806 lastpage: 815 published: 2022-01-11 00:00:00 +0000 - title: 'Just Label What You Need: Fine-Grained Active Selection for P&P through Partially Labeled Scenes' abstract: 'Self-driving vehicles must perceive and predict the future positions of nearby actors to avoid collisions and drive safely. A deep learning module is often responsible for this task, requiring large-scale, high-quality training datasets. 
Due to high labeling costs, active learning approaches are an appealing solution to maximizing model performance for a given labeling budget. However, despite its appeal, there has been little scientific analysis of active learning approaches for the perception and prediction (P&P) problem. In this work, we study active learning techniques for P&P and find that the traditional active learning formulation is ill-suited. We thus introduce generalizations that ensure that our approach is both cost-aware and allows for fine-grained selection of examples through partially labeled scenes. Extensive experiments on a real-world dataset suggest significant improvements across perception, prediction, and downstream planning tasks.' volume: 164 URL: https://proceedings.mlr.press/v164/segal22a.html PDF: https://proceedings.mlr.press/v164/segal22a/segal22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-segal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sean family: Segal - given: Nishanth family: Kumar - given: Sergio family: Casas - given: Wenyuan family: Zeng - given: Mengye family: Ren - given: Jingkang family: Wang - given: Raquel family: Urtasun editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 816-826 id: segal22a issued: date-parts: - 2022 - 1 - 11 firstpage: 816 lastpage: 826 published: 2022-01-11 00:00:00 +0000 - title: 'Seeing Glass: Joint Point-Cloud and Depth Completion for Transparent Objects' abstract: 'The basis of many object manipulation algorithms is RGB-D input. Yet, commodity RGB-D sensors can only provide distorted depth maps for a wide range of transparent objects due to light refraction and absorption. 
To tackle the perception challenges posed by transparent objects, we propose TranspareNet, a joint point cloud and depth completion method, with the ability to complete the depth of transparent objects in cluttered and complex scenes, even with partially filled fluid contents within the vessels. To address the shortcomings of existing transparent object data collection schemes in literature, we also propose an automated dataset creation workflow that consists of robot-controlled image collection and vision-based automatic annotation. Through this automated workflow, we created Transparent Object Depth Dataset (TODD), which consists of nearly 15000 RGB-D images. Our experimental evaluation demonstrates that TranspareNet outperforms existing state-of-the-art depth completion methods on multiple datasets, including ClearGrasp, and that it also handles cluttered scenes when trained on TODD. Code and dataset will be released at https://www.pair.toronto.edu/TranspareNet/' volume: 164 URL: https://proceedings.mlr.press/v164/xu22b.html PDF: https://proceedings.mlr.press/v164/xu22b/xu22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-xu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Haoping family: Xu - given: Yi Ru family: Wang - given: Sagi family: Eppel - given: Alan family: Aspuru-Guzik - given: Florian family: Shkurti - given: Animesh family: Garg editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 827-838 id: xu22b issued: date-parts: - 2022 - 1 - 11 firstpage: 827 lastpage: 838 published: 2022-01-11 00:00:00 +0000 - title: 'Motivating Physical Activity via Competitive Human-Robot Interaction' abstract: 'This project aims to motivate research in competitive human-robot interaction by creating a robot competitor that can challenge human users in certain scenarios such 
as physical exercise and games. With this goal in mind, we introduce the Fencing Game, a human-robot competition used to evaluate both the capabilities of the robot competitor and user experience. We develop the robot competitor through iterative multi-agent reinforcement learning and show that it can perform well against human competitors. Our user study additionally found that our system was able to continuously create challenging and enjoyable interactions that significantly increased human subjects’ heart rates. The majority of human subjects considered the system to be entertaining and desirable for improving the quality of their exercise.' volume: 164 URL: https://proceedings.mlr.press/v164/yang22e.html PDF: https://proceedings.mlr.press/v164/yang22e/yang22e.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-yang22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Boling family: Yang - given: Golnaz family: Habibi - given: Patrick family: Lancaster - given: Byron family: Boots - given: Joshua family: Smith editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 839-849 id: yang22e issued: date-parts: - 2022 - 1 - 11 firstpage: 839 lastpage: 849 published: 2022-01-11 00:00:00 +0000 - title: 'RoCUS: Robot Controller Understanding via Sampling' abstract: 'As robots are deployed in complex situations, engineers and end users must develop a holistic understanding of their behaviors, capabilities, and limitations. Some behaviors are directly optimized by the objective function. They often include success rate, completion time or energy consumption. Other behaviors – e.g., collision avoidance, trajectory smoothness or motion legibility – are typically emergent but equally important for safe and trustworthy deployment. 
Designing an objective which optimizes every aspect of robot behavior is hard. In this paper, we advocate for systematic analysis of a wide array of behaviors for holistic understanding of robot controllers and, to this end, propose a framework, RoCUS, which uses Bayesian posterior sampling to find situations where the robot controller exhibits user-specified behaviors, such as highly jerky motions. We use RoCUS to analyze three controller classes (deep learning models, rapidly exploring random trees and dynamical system formulations) on two domains (2D navigation and a 7 degree-of-freedom arm reaching), and uncover insights to further our understanding of these controllers and ultimately improve their designs. ' volume: 164 URL: https://proceedings.mlr.press/v164/zhou22a.html PDF: https://proceedings.mlr.press/v164/zhou22a/zhou22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-zhou22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Yilun family: Zhou - given: Serena family: Booth - given: Nadia family: Figueroa - given: Julie family: Shah editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 850-860 id: zhou22a issued: date-parts: - 2022 - 1 - 11 firstpage: 850 lastpage: 860 published: 2022-01-11 00:00:00 +0000 - title: 'IMU-Assisted Learning of Single-View Rolling Shutter Correction' abstract: 'Rolling shutter distortion is highly undesirable for photography and computer vision algorithms (e.g., visual SLAM) because pixels can be potentially captured at different times and poses. In this paper, we propose a deep neural network to predict depth and row-wise pose from a single image for rolling shutter correction. 
Our contribution in this work is to incorporate inertial measurement unit (IMU) data into the pose refinement process, which, compared to the state-of-the-art, greatly enhances the pose prediction. The improved accuracy and robustness make it possible for numerous vision algorithms to use imagery captured by rolling shutter cameras and produce highly accurate results. We also extend a dataset to have real rolling shutter images, IMU data, depth maps, camera poses, and corresponding global shutter images for rolling shutter correction training. We demonstrate the efficacy of the proposed method by evaluating the performance of Direct Sparse Odometry (DSO) algorithm on rolling shutter imagery corrected using the proposed approach. Results show marked improvements of the DSO algorithm over using uncorrected imagery, validating the proposed approach.' volume: 164 URL: https://proceedings.mlr.press/v164/mo22a.html PDF: https://proceedings.mlr.press/v164/mo22a/mo22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-mo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Jiawei family: Mo - given: Md Jahidul family: Islam - given: Junaed family: Sattar editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 861-870 id: mo22a issued: date-parts: - 2022 - 1 - 11 firstpage: 861 lastpage: 870 published: 2022-01-11 00:00:00 +0000 - title: 'Group-based Motion Prediction for Navigation in Crowded Environments' abstract: 'We focus on the problem of planning the motion of a robot in a dynamic multiagent environment such as a pedestrian scene. Enabling the robot to navigate safely and in a socially compliant fashion in such scenes requires a representation that accounts for the unfolding multiagent dynamics. 
Existing approaches to this problem tend to employ microscopic models of motion prediction that reason about the individual behavior of other agents. While such models may achieve high tracking accuracy in trajectory prediction benchmarks, they often lack an understanding of the group structures unfolding in crowded scenes. Inspired by the Gestalt theory from psychology, we build a Model Predictive Control framework (G-MPC) that leverages group-based prediction for robot motion planning. We conduct an extensive simulation study involving a series of challenging navigation tasks in scenes extracted from two real-world pedestrian datasets. We illustrate that G-MPC enables a robot to achieve statistically significantly higher safety and lower number of group intrusions than a series of baselines featuring individual pedestrian motion prediction models. Finally, we show that G-MPC can handle noisy lidar-scan estimates without significant performance losses.' volume: 164 URL: https://proceedings.mlr.press/v164/wang22e.html PDF: https://proceedings.mlr.press/v164/wang22e/wang22e.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Allan family: Wang - given: Christoforos family: Mavrogiannis - given: Aaron family: Steinfeld editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 871-882 id: wang22e issued: date-parts: - 2022 - 1 - 11 firstpage: 871 lastpage: 882 published: 2022-01-11 00:00:00 +0000 - title: 'A Constrained Multi-Objective Reinforcement Learning Framework' abstract: 'Many real-world problems, especially in robotics, require that reinforcement learning (RL) agents learn policies that not only maximize an environment reward, but also satisfy constraints. 
We propose a high-level framework for solving such problems, that treats the environment reward and costs as separate objectives, and learns what preference over objectives the policy should optimize for in order to meet the constraints. We call this Learning Preferences and Policies in Parallel (LP3). By making different choices for how to learn the preference and how to optimize for the policy given the preference, we can obtain existing approaches (e.g., Lagrangian relaxation) and derive novel approaches that lead to better performance. One of these is an algorithm that learns a set of constraint-satisfying policies, useful for when we do not know the exact constraint a priori.' volume: 164 URL: https://proceedings.mlr.press/v164/huang22a.html PDF: https://proceedings.mlr.press/v164/huang22a/huang22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-huang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sandy family: Huang - given: Abbas family: Abdolmaleki - given: Giulia family: Vezzani - given: Philemon family: Brakel - given: Daniel J. family: Mankowitz - given: Michael family: Neunert - given: Steven family: Bohez - given: Yuval family: Tassa - given: Nicolas family: Heess - given: Martin family: Riedmiller - given: Raia family: Hadsell editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 883-893 id: huang22a issued: date-parts: - 2022 - 1 - 11 firstpage: 883 lastpage: 893 published: 2022-01-11 00:00:00 +0000 - title: 'CLIPort: What and Where Pathways for Robotic Manipulation' abstract: 'How can we imbue robots with the ability to manipulate objects precisely but also to reason about them in terms of abstract concepts? 
Recent works in manipulation have shown that end-to-end networks can learn dexterous skills that require precise spatial reasoning, but these methods often fail to generalize to new goals or quickly learn transferable concepts across tasks. In parallel, there has been great progress in learning generalizable semantic representations for vision and language by training on large-scale internet data; however, these representations lack the spatial understanding necessary for fine-grained manipulation. To this end, we propose a framework that combines the best of both worlds: a two-stream architecture with semantic and spatial pathways for vision-based manipulation. Specifically, we present CLIPort, a language-conditioned imitation-learning agent that combines the broad semantic understanding (what) of CLIP [1] with the spatial precision (where) of Transporter [2]. Our end-to-end framework is capable of solving a variety of language-specified tabletop tasks from packing unseen objects to folding cloths, all without any explicit representations of object poses, instance segmentations, memory, symbolic states, or syntactic structures. Experiments in simulated and real-world settings show that our approach is data efficient in few-shot settings and generalizes effectively to seen and unseen semantic concepts. We even learn one multi-task policy for 10 simulated and 9 real-world tasks that is better or comparable to single-task policies.' 
volume: 164 URL: https://proceedings.mlr.press/v164/shridhar22a.html PDF: https://proceedings.mlr.press/v164/shridhar22a/shridhar22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-shridhar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Mohit family: Shridhar - given: Lucas family: Manuelli - given: Dieter family: Fox editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 894-906 id: shridhar22a issued: date-parts: - 2022 - 1 - 11 firstpage: 894 lastpage: 906 published: 2022-01-11 00:00:00 +0000 - title: 'S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning in Robotics' abstract: 'Offline reinforcement learning proposes to learn policies from large collected datasets without interacting with the physical environment. These algorithms have made it possible to learn useful skills from data that can then be deployed in the environment in real-world settings where interactions may be costly or dangerous, such as autonomous driving or factories. However, offline agents are unable to access the environment to collect new data, and therefore are trained on a static dataset. In this paper, we study the effectiveness of performing data augmentations on the state space, and study 7 different augmentation schemes and how they behave with existing offline RL algorithms. We then combine the best-performing data augmentation scheme with a state-of-the-art Q-learning technique, and improve the function approximation of the Q-networks by smoothening out the learned state-action space. 
We experimentally show that using this Surprisingly Simple Self-Supervision technique in RL (S4RL), we significantly improve over the current state-of-the-art algorithms on offline robot learning environments such as MetaWorld [1] and RoboSuite [2,3], and benchmark datasets such as D4RL [4].' volume: 164 URL: https://proceedings.mlr.press/v164/sinha22a.html PDF: https://proceedings.mlr.press/v164/sinha22a/sinha22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-sinha22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Samarth family: Sinha - given: Ajay family: Mandlekar - given: Animesh family: Garg editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 907-917 id: sinha22a issued: date-parts: - 2022 - 1 - 11 firstpage: 907 lastpage: 917 published: 2022-01-11 00:00:00 +0000 - title: 'Aligning an optical interferometer with beam divergence control and continuous action space' abstract: 'Reinforcement learning is finding its way to real-world problem application, transferring from simulated environments to physical setups. In this work, we implement vision-based alignment of an optical Mach-Zehnder interferometer with a confocal telescope in one arm, which controls the diameter and divergence of the corresponding beam. We use a continuous action space; exponential scaling enables us to handle actions within a range of over two orders of magnitude. Our agent trains only in a simulated environment with domain randomizations. In an experimental evaluation, the agent significantly outperforms an existing solution and a human expert.' 
volume: 164 URL: https://proceedings.mlr.press/v164/makarenko22a.html PDF: https://proceedings.mlr.press/v164/makarenko22a/makarenko22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-makarenko22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Stepan family: Makarenko - given: Dmitry Igorevich family: Sorokin - given: Alexander family: Ulanov - given: Alexander family: Lvovsky editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 918-927 id: makarenko22a issued: date-parts: - 2022 - 1 - 11 firstpage: 918 lastpage: 927 published: 2022-01-11 00:00:00 +0000 - title: 'Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots' abstract: 'Legged locomotion is commonly studied and expressed as a discrete set of gait patterns, like walk, trot, gallop, which are usually treated as given and pre-programmed in legged robots for efficient locomotion at different speeds. However, fixing a set of pre-programmed gaits limits the generality of locomotion. Recent animal motor studies show that these conventional gaits are only prevalent in ideal flat terrain conditions while real-world locomotion is unstructured and more like bouts of intermittent steps. What principles could lead to both structured and unstructured patterns across mammals and how to synthesize them in robots? In this work, we take an analysis-by-synthesis approach and learn to move by minimizing mechanical energy. We demonstrate that learning to minimize energy consumption plays a key role in the emergence of natural locomotion gaits at different speeds in real quadruped robots. The emergent gaits are structured in ideal terrains and look similar to that of horses and sheep. The same approach leads to unstructured gaits in rough terrains which is consistent with the findings in animal motor control. 
We validate our hypothesis in both simulation and real hardware across natural terrains. Videos at https://energy-locomotion.github.io' volume: 164 URL: https://proceedings.mlr.press/v164/fu22a.html PDF: https://proceedings.mlr.press/v164/fu22a/fu22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-fu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Zipeng family: Fu - given: Ashish family: Kumar - given: Jitendra family: Malik - given: Deepak family: Pathak editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 928-937 id: fu22a issued: date-parts: - 2022 - 1 - 11 firstpage: 928 lastpage: 937 published: 2022-01-11 00:00:00 +0000 - title: 'SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo' abstract: 'Robot manipulation of unknown objects in unstructured environments is a challenging problem due to the variety of shapes, materials, arrangements and lighting conditions. Even with large-scale real-world data collection, robust perception and manipulation of transparent and reflective objects across various lighting conditions remains challenging. To address these challenges we propose an approach to performing sim-to-real transfer of robotic perception. The underlying model, SimNet, is trained as a single multi-headed neural network using simulated stereo data as input and simulated object segmentation masks, 3D oriented bounding boxes (OBBs), object keypoints and disparity as output. A key component of SimNet is the incorporation of a learned stereo sub-network that predicts disparity. SimNet is evaluated on unknown object detection and deformable object keypoint detection and significantly outperforms a baseline that uses a structured light RGB-D sensor. 
By inferring grasp positions using the OBB and keypoint predictions, SimNet can be used to perform end-to-end manipulation of unknown objects across our fleet of Toyota HSR robots. In object grasping experiments, SimNet significantly outperforms the RGB-D baseline on optically challenging objects, suggesting that SimNet can enable robust manipulation of unknown objects, including transparent objects, in novel environments. Additional visualizations and materials are located at https://tinyurl.com/simnet-corl.' volume: 164 URL: https://proceedings.mlr.press/v164/kollar22a.html PDF: https://proceedings.mlr.press/v164/kollar22a/kollar22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-kollar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Thomas family: Kollar - given: Michael family: Laskey - given: Kevin family: Stone - given: Brijen family: Thananjeyan - given: Mark family: Tjersland editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 938-948 id: kollar22a issued: date-parts: - 2022 - 1 - 11 firstpage: 938 lastpage: 948 published: 2022-01-11 00:00:00 +0000 - title: 'Social Interactions as Recursive MDPs' abstract: 'While machines and robots must interact with humans, providing them with social skills has been a largely overlooked topic. This is mostly a consequence of the fact that tasks such as navigation, command following, and even game playing are well-defined, while social reasoning still mostly remains a pre-theoretic problem. We demonstrate how social interactions can be effectively incorporated into MDPs by reasoning recursively about the goals of other agents. 
In essence, our method extends the reward function to include a combination of physical goals (something agents want to accomplish in the configuration space, a traditional MDP) and social goals (something agents want to accomplish relative to the goals of other agents). Our Social MDPs allow specifying reward functions in terms of the estimated reward functions of other agents, modeling interactions such as helping or hindering another agent (by maximizing or minimizing the other agent’s reward) while balancing this with the actual physical goals of each agent. Our formulation allows for an arbitrary function of another agent’s estimated reward structure and physical goals, enabling more complex behaviors such as politely hindering another agent or aggressively helping them. Extending Social MDPs in the same manner as I-POMDPs extension would enable interactions such as convincing another agent that something is true. To what extent the Social MDPs presented here and their potential Social POMDPs variant account for all possible social interactions is unknown, but having a precise mathematical model to guide questions about social interactions has both practical value (we demonstrate how to make zero-shot social inferences and one could imagine chatbots and robots guided by Social MDPs) and theoretical value by bringing the tools of MDP that have so successfully organized research around navigation to hopefully shed light on what social interactions really are given their extreme importance to human well-being and human civilization.' 
volume: 164 URL: https://proceedings.mlr.press/v164/tejwani22a.html PDF: https://proceedings.mlr.press/v164/tejwani22a/tejwani22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-tejwani22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Ravi family: Tejwani - given: Yen-Ling family: Kuo - given: Tianmin family: Shu - given: Boris family: Katz - given: Andrei family: Barbu editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 949-958 id: tejwani22a issued: date-parts: - 2022 - 1 - 11 firstpage: 949 lastpage: 958 published: 2022-01-11 00:00:00 +0000 - title: 'LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Sparse Reward Iterative Tasks' abstract: 'Reinforcement learning (RL) has shown impressive success in exploring high-dimensional environments to learn complex tasks, but can often exhibit unsafe behaviors and require extensive environment interaction when exploration is unconstrained. A promising strategy for learning in dynamically uncertain environments is requiring that the agent can robustly return to learned Safe Sets, where task success (and therefore safety) can be guaranteed. While this approach has been successful in low-dimensions, enforcing this constraint in environments with visual observation spaces is exceedingly challenging. We present a novel continuous representation for Safe Sets framed as a binary classification problem in a learned latent space, which flexibly scales to high-dimensional image observations. We then present a new algorithm, Latent Space Safe Sets (LS3), which uses this representation for long-horizon control. We evaluate LS3 on 4 domains, including a challenging sequential pushing task in simulation and a physical cable routing task. 
We find that LS3 can use prior task successes to restrict exploration and learn more efficiently than prior algorithms while satisfying constraints. See https://tinyurl.com/latent-safe-sets for supplementary material.' volume: 164 URL: https://proceedings.mlr.press/v164/wilcox22a.html PDF: https://proceedings.mlr.press/v164/wilcox22a/wilcox22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wilcox22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Albert family: Wilcox - given: Ashwin family: Balakrishna - given: Brijen family: Thananjeyan - given: Joseph E. family: Gonzalez - given: Ken family: Goldberg editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 959-969 id: wilcox22a issued: date-parts: - 2022 - 1 - 11 firstpage: 959 lastpage: 969 published: 2022-01-11 00:00:00 +0000 - title: 'Task-Driven Out-of-Distribution Detection with Statistical Guarantees for Robot Learning' abstract: 'Our goal is to perform out-of-distribution (OOD) detection, i.e., to detect when a robot is operating in environments that are drawn from a different distribution than the environments used to train the robot. We leverage Probably Approximately Correct (PAC)-Bayes theory in order to train a policy with a guaranteed bound on performance on the training distribution. Our key idea for OOD detection then relies on the following intuition: violation of the performance bound on test environments provides evidence that the robot is operating OOD. We formalize this via statistical techniques based on p-values and concentration inequalities. The resulting approach (i) provides guaranteed confidence bounds on OOD detection, and (ii) is task-driven and sensitive only to changes that impact the robot’s performance. 
We demonstrate our approach on a simulated example of grasping objects with unfamiliar poses or shapes. We also present both simulation and hardware experiments for a drone performing vision-based obstacle avoidance in unfamiliar environments (including wind disturbances and different obstacle densities). Our examples demonstrate that we can perform task-driven OOD detection within just a handful of trials. Comparisons with baselines also demonstrate the advantages of our approach in terms of providing statistical guarantees and being insensitive to task-irrelevant distribution shifts.' volume: 164 URL: https://proceedings.mlr.press/v164/farid22a.html PDF: https://proceedings.mlr.press/v164/farid22a/farid22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-farid22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Alec family: Farid - given: Sushant family: Veer - given: Anirudha family: Majumdar editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 970-980 id: farid22a issued: date-parts: - 2022 - 1 - 11 firstpage: 970 lastpage: 980 published: 2022-01-11 00:00:00 +0000 - title: 'Redundancy Resolution as Action Bias in Policy Search for Robotic Manipulation' abstract: 'We propose a novel approach that biases actions during policy search by lifting the concept of redundancy resolution from multi-DoF robot kinematics to the level of the reward in deep reinforcement learning and evolution strategies. The key idea is to bias the distribution of executed actions in the sense that the immediate reward remains unchanged. The resulting biased actions favor secondary objectives yielding policies that are safer to apply on the real robot. 
We demonstrate the feasibility of our method, considered as policy search with redundant action bias (PSRAB), in a reaching and a pick-and-lift task with a 7-DoF Franka robot arm trained in RLBench - a recently introduced benchmark for robotic manipulation - using state-of-the-art TD3 deep reinforcement learning and OpenAI’s evolutionary strategy. We show that it is a flexible approach without the need of significant fine-tuning and interference with the main objective even across different policy search methods and tasks of different complexity. We evaluate our approach in simulation and on the real robot. Our project website with videos and further results can be found at: https://sites.google.com/view/redundant-action-bias' volume: 164 URL: https://proceedings.mlr.press/v164/al-hafez22a.html PDF: https://proceedings.mlr.press/v164/al-hafez22a/al-hafez22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-al-hafez22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Firas family: Al-Hafez - given: Jochen J. family: Steil editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 981-990 id: al-hafez22a issued: date-parts: - 2022 - 1 - 11 firstpage: 981 lastpage: 990 published: 2022-01-11 00:00:00 +0000 - title: 'BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning' abstract: 'In this paper, we study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks, a long-standing challenge in robot learning. We approach the challenge from an imitation learning perspective, aiming to study how scaling and broadening the data collected can facilitate such generalization. 
To that end, we develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions and can be conditioned on different forms of information that convey the task, including pre-trained embeddings of natural language or videos of humans performing the task. When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44%, without any robot demonstrations for those tasks.' volume: 164 URL: https://proceedings.mlr.press/v164/jang22a.html PDF: https://proceedings.mlr.press/v164/jang22a/jang22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-jang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Eric family: Jang - given: Alex family: Irpan - given: Mohi family: Khansari - given: Daniel family: Kappler - given: Frederik family: Ebert - given: Corey family: Lynch - given: Sergey family: Levine - given: Chelsea family: Finn editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 991-1002 id: jang22a issued: date-parts: - 2022 - 1 - 11 firstpage: 991 lastpage: 1002 published: 2022-01-11 00:00:00 +0000 - title: 'Motion Forecasting with Unlikelihood Training in Continuous Space' abstract: 'Motion forecasting is essential for making safe and intelligent decisions in robotic applications such as autonomous driving. Existing methods often formulate it as a sequence-to-sequence prediction problem, solved in an encoder-decoder framework with a maximum likelihood estimation objective. State-of-the-art models leverage contextual information including the map and states of surrounding agents. 
However, we observe that they still assign a high probability to unlikely trajectories resulting in unsafe behaviors including road boundary violations. Orthogonally, we propose a new objective, unlikelihood training, which forces predicted trajectories that conflict with contextual information to be assigned a lower probability. We demonstrate that our method can improve state-of-the-art models’ performance on challenging real-world trajectory forecasting datasets (nuScenes and Argoverse) by avoiding up to 56% of context-violating predictions and improving prediction accuracy by up to 9%. Code is available at https://github.com/Vision-CAIR/UnlikelihoodMotionForecasting' volume: 164 URL: https://proceedings.mlr.press/v164/zhu22a.html PDF: https://proceedings.mlr.press/v164/zhu22a/zhu22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-zhu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Deyao family: Zhu - given: Mohamed family: Zahran - given: Li Erran family: Li - given: Mohamed family: Elhoseiny editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1003-1012 id: zhu22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1003 lastpage: 1012 published: 2022-01-11 00:00:00 +0000 - title: 'Exploring Adversarial Robustness of Multi-sensor Perception Systems in Self Driving' abstract: 'Modern self-driving perception systems have been shown to improve upon processing complementary inputs such as LiDAR with images. In isolation, 2D images have been found to be extremely vulnerable to adversarial attacks. Yet, there are limited studies on the adversarial robustness of multi-modal models that fuse LiDAR and image features. Furthermore, existing works do not consider physically realizable perturbations that are consistent across the input modalities. 
In this paper, we showcase practical susceptibilities of multi-sensor detection by inserting an adversarial object on a host vehicle. We focus on physically realizable and input-agnostic attacks that are feasible to execute in practice, and show that a single universal adversary can hide different host vehicles from state-of-the-art multi-modal detectors. Our experiments demonstrate that successful attacks are primarily caused by easily corrupted image features. Furthermore, in modern sensor fusion methods which project image features into 3D, adversarial attacks can exploit the projection process to generate false positives in distant regions in 3D. Towards more robust multi-modal perception systems, we show that adversarial training with feature denoising can boost robustness to such attacks significantly.' volume: 164 URL: https://proceedings.mlr.press/v164/tu22a.html PDF: https://proceedings.mlr.press/v164/tu22a/tu22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-tu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: James family: Tu - given: Huichen family: Li - given: Xinchen family: Yan - given: Mengye family: Ren - given: Yun family: Chen - given: Ming family: Liang - given: Eilyan family: Bitar - given: Ersin family: Yumer - given: Raquel family: Urtasun editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1013-1024 id: tu22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1013 lastpage: 1024 published: 2022-01-11 00:00:00 +0000 - title: 'Learning to Jump from Pixels' abstract: 'Today’s robotic quadruped systems can robustly walk over a diverse range of rough but continuous terrains, where the terrain elevation varies gradually. Locomotion on discontinuous terrains, such as those with gaps or obstacles, presents a complementary set of challenges. 
In discontinuous settings, it becomes necessary to plan ahead using visual inputs and to execute agile behaviors beyond robust walking, such as jumps. Such dynamic motion results in significant motion of onboard sensors, which introduces a new set of challenges for real-time visual processing. The requirements of agility and terrain awareness in this setting reinforce the need for robust control. We present Depth-based Impulse Control (DIC), a method for synthesizing highly agile visually-guided locomotion behaviors. DIC affords the flexibility of model-free learning but regularizes behavior through explicit model-based optimization of ground reaction forces. We evaluate performance both in simulation and in the real world.' volume: 164 URL: https://proceedings.mlr.press/v164/margolis22a.html PDF: https://proceedings.mlr.press/v164/margolis22a/margolis22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-margolis22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Gabriel B family: Margolis - given: Tao family: Chen - given: Kartik family: Paigwar - given: Xiang family: Fu - given: Donghyun family: Kim - given: Sang bae family: Kim - given: Pulkit family: Agrawal editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1025-1034 id: margolis22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1025 lastpage: 1034 published: 2022-01-11 00:00:00 +0000 - title: 'Learning to Predict Vehicle Trajectories with Model-based Planning' abstract: 'Predicting the future trajectories of on-road vehicles is critical for autonomous driving. In this paper, we introduce a novel prediction framework called PRIME, which stands for Prediction with Model-based Planning. 
Unlike recent prediction works that utilize neural networks to model scene context and produce unconstrained trajectories, PRIME is designed to generate accurate and feasibility-guaranteed future trajectory predictions. PRIME guarantees the trajectory feasibility by exploiting a model-based generator to produce future trajectories under explicit constraints and enables accurate multimodal prediction by utilizing a learning-based evaluator to select future trajectories. We conduct experiments on the large-scale Argoverse Motion Forecasting Benchmark, where PRIME outperforms the state-of-the-art methods in prediction accuracy, feasibility, and robustness under imperfect tracking.' volume: 164 URL: https://proceedings.mlr.press/v164/song22a.html PDF: https://proceedings.mlr.press/v164/song22a/song22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-song22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Haoran family: Song - given: Di family: Luan - given: Wenchao family: Ding - given: Michael Y family: Wang - given: Qifeng family: Chen editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1035-1045 id: song22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1035 lastpage: 1045 published: 2022-01-11 00:00:00 +0000 - title: 'LanguageRefer: Spatial-Language Model for 3D Visual Grounding' abstract: 'For robots to understand human instructions and perform meaningful tasks in the near future, it is important to develop learned models that comprehend referential language to identify common objects in real-world 3D scenes. In this paper, we introduce a spatial-language model for a 3D visual grounding problem. 
Specifically, given a reconstructed 3D scene in the form of point clouds with 3D bounding boxes of potential object candidates, and a language utterance referring to a target object in the scene, our model successfully identifies the target object from a set of potential candidates. Specifically, LanguageRefer uses a transformer-based architecture that combines spatial embedding from bounding boxes with fine-tuned language embeddings from DistilBert to predict the target object. We show that it performs competitively on visio-linguistic datasets proposed by ReferIt3D. Further, we analyze its spatial reasoning task performance decoupled from perception noise, the accuracy of view-dependent utterances, and viewpoint annotations for potential robotics applications. Project website: https://sites.google.com/view/language-refer.' volume: 164 URL: https://proceedings.mlr.press/v164/roh22a.html PDF: https://proceedings.mlr.press/v164/roh22a/roh22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-roh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Junha family: Roh - given: Karthik family: Desingh - given: Ali family: Farhadi - given: Dieter family: Fox editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1046-1056 id: roh22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1046 lastpage: 1056 published: 2022-01-11 00:00:00 +0000 - title: 'Legged Robot State Estimation using Invariant Kalman Filtering and Learned Contact Events' abstract: 'This work develops a learning-based contact estimator for legged robots that bypasses the need for physical sensors and takes multi-modal proprioceptive sensory data as input. Unlike vision-based state estimators, proprioceptive state estimators are agnostic to perceptually degraded situations such as dark or foggy scenes. 
While some robots are equipped with dedicated physical sensors to detect necessary contact data for state estimation, some robots do not have dedicated contact sensors, and the addition of such sensors is non-trivial without redesigning the hardware. The trained network can estimate contact events on different terrains. The experiments show that a contact-aided invariant extended Kalman filter can generate accurate odometry trajectories compared to a state-of-the-art visual SLAM system, enabling robust proprioceptive odometry.' volume: 164 URL: https://proceedings.mlr.press/v164/lin22b.html PDF: https://proceedings.mlr.press/v164/lin22b/lin22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lin22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Tzu-Yuan family: Lin - given: Ray family: Zhang - given: Justin family: Yu - given: Maani family: Ghaffari editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1057-1066 id: lin22b issued: date-parts: - 2022 - 1 - 11 firstpage: 1057 lastpage: 1066 published: 2022-01-11 00:00:00 +0000 - title: 'The Boombox: Visual Reconstruction from Acoustic Vibrations' abstract: 'Interacting with bins and containers is a fundamental task in robotics, making state estimation of the objects inside the bin critical. While robots often use cameras for state estimation, the visual modality is not always ideal due to occlusions and poor illumination. We introduce The Boombox, a container that uses sound to estimate the state of the contents inside a box. Based on the observation that the collision between objects and their containers will cause an acoustic vibration, we present a convolutional network for learning to reconstruct visual scenes. 
Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multimodal data enables state estimation from affordable audio sensors. Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics.' volume: 164 URL: https://proceedings.mlr.press/v164/chen22c.html PDF: https://proceedings.mlr.press/v164/chen22c/chen22c.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-chen22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Boyuan family: Chen - given: Mia family: Chiquier - given: Hod family: Lipson - given: Carl family: Vondrick editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1067-1077 id: chen22c issued: date-parts: - 2022 - 1 - 11 firstpage: 1067 lastpage: 1077 published: 2022-01-11 00:00:00 +0000 - title: 'AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale' abstract: 'Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address this question, proposing a variety of techniques that integrate elements of IL and RL. However, scaling up such methods to complex robotic skills that integrate diverse offline data and generalize meaningfully to real-world scenarios still presents a major challenge. 
In this paper, our aim is to test the scalability of prior IL + RL algorithms and devise a system based on detailed empirical experimentation that combines existing components in the most effective and scalable way. To that end, we present a series of experiments aimed at understanding the implications of each design decision, so as to develop a combined approach that can utilize demonstrations and heterogeneous prior data to attain the best performance on a range of real-world and realistic simulated robotic problems. Our complete method, which we call AW-Opt, combines elements of advantage-weighted regression and QT-Opt, providing a unified approach for integrating demonstrations and offline data for robotic manipulation. Please see https://awopt.github.io for more details.' volume: 164 URL: https://proceedings.mlr.press/v164/lu22a.html PDF: https://proceedings.mlr.press/v164/lu22a/lu22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Yao family: Lu - given: Karol family: Hausman - given: Yevgen family: Chebotar - given: Mengyuan family: Yan - given: Eric family: Jang - given: Alexander family: Herzog - given: Ted family: Xiao - given: Alex family: Irpan - given: Mohi family: Khansari - given: Dmitry family: Kalashnikov - given: Sergey family: Levine editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1078-1088 id: lu22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1078 lastpage: 1088 published: 2022-01-11 00:00:00 +0000 - title: 'Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes' abstract: 'We study the problem of robotic stacking with objects of complex geometry. 
We propose a challenging and diverse set of such objects that was carefully designed to require strategies beyond a simple “pick-and-place” solution. Our method is a reinforcement learning (RL) approach combined with vision-based interactive policy distillation and simulation-to-reality transfer. Our learned policies can efficiently handle multiple object combinations in the real world and exhibit a large variety of stacking skills. In a large experimental study, we investigate what choices matter for learning such general vision-based agents in simulation, and what affects optimal transfer to the real robot. We then leverage data collected by such policies and improve upon them with offline RL. A video and a blog post of our work are provided as supplementary material.' volume: 164 URL: https://proceedings.mlr.press/v164/lee22b.html PDF: https://proceedings.mlr.press/v164/lee22b/lee22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lee22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Alex X. 
family: Lee - given: Coline Manon family: Devin - given: Yuxiang family: Zhou - given: Thomas family: Lampe - given: Konstantinos family: Bousmalis - given: Jost Tobias family: Springenberg - given: Arunkumar family: Byravan - given: Abbas family: Abdolmaleki - given: Nimrod family: Gileadi - given: David family: Khosid - given: Claudio family: Fantacci - given: Jose Enrique family: Chen - given: Akhil family: Raju - given: Rae family: Jeong - given: Michael family: Neunert - given: Antoine family: Laurens - given: Stefano family: Saliceti - given: Federico family: Casarini - given: Martin family: Riedmiller - given: Raia family: Hadsell - given: Francesco family: Nori editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1089-1131 id: lee22b issued: date-parts: - 2022 - 1 - 11 firstpage: 1089 lastpage: 1131 published: 2022-01-11 00:00:00 +0000 - title: 'Influencing Towards Stable Multi-Agent Interactions' abstract: 'Learning in multi-agent environments is difficult due to the non-stationarity introduced by an opponent’s or partner’s changing behaviors. Instead of reactively adapting to the other agent’s (opponent or partner) behavior, we propose an algorithm to proactively influence the other agent’s strategy to stabilize – which can restrain the non-stationarity caused by the other agent. We learn a low-dimensional latent representation of the other agent’s strategy and the dynamics of how the latent strategy evolves with respect to our robot’s behavior. With this learned dynamics model, we can define an unsupervised stability reward to train our robot to deliberately influence the other agent to stabilize towards a single strategy. We demonstrate the effectiveness of stabilizing in improving efficiency of maximizing the task reward in a variety of simulated environments, including autonomous driving, emergent communication, and robotic manipulation. We show qualitative results on our website.' 
volume: 164 URL: https://proceedings.mlr.press/v164/wang22f.html PDF: https://proceedings.mlr.press/v164/wang22f/wang22f.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Woodrow Zhouyuan family: Wang - given: Andy family: Shih - given: Annie family: Xie - given: Dorsa family: Sadigh editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1132-1143 id: wang22f issued: date-parts: - 2022 - 1 - 11 firstpage: 1132 lastpage: 1143 published: 2022-01-11 00:00:00 +0000 - title: 'Strength Through Diversity: Robust Behavior Learning via Mixture Policies' abstract: 'Efficiency in robot learning is highly dependent on hyperparameters. Robot morphology and task structure differ widely and finding the optimal setting typically requires sequential or parallel repetition of experiments, strongly increasing the interaction count. We propose a training method that only relies on a single trial by enabling agents to select and combine controller designs conditioned on the task. Our Hyperparameter Mixture Policies (HMPs) feature diverse sub-policies that vary in distribution types and parameterization, reducing the impact of design choices and unlocking synergies between low-level components. We demonstrate strong performance on continuous control tasks, including a simulated ANYmal robot, showing that HMPs yield robust, data-efficient learning.' 
volume: 164 URL: https://proceedings.mlr.press/v164/seyde22a.html PDF: https://proceedings.mlr.press/v164/seyde22a/seyde22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-seyde22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Tim family: Seyde - given: Wilko family: Schwarting - given: Igor family: Gilitschenski - given: Markus family: Wulfmeier - given: Daniela family: Rus editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1144-1155 id: seyde22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1144 lastpage: 1155 published: 2022-01-11 00:00:00 +0000 - title: 'Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles' abstract: 'Learning complex robot behaviors through interaction requires structured exploration. Planning should target interactions with the potential to optimize long-term performance, while only reducing uncertainty where conducive to this objective. This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards. We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling. The policy is then trained on an upper confidence bound (UCB) objective to identify and select the interactions most promising to improve long-term performance. We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives. In sparse and hard to explore environments we achieve an average improvement of over 30%.' 
volume: 164 URL: https://proceedings.mlr.press/v164/seyde22b.html PDF: https://proceedings.mlr.press/v164/seyde22b/seyde22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-seyde22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Tim family: Seyde - given: Wilko family: Schwarting - given: Sertac family: Karaman - given: Daniela family: Rus editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1156-1167 id: seyde22b issued: date-parts: - 2022 - 1 - 11 firstpage: 1156 lastpage: 1167 published: 2022-01-11 00:00:00 +0000 - title: 'Trust Your Robots! Predictive Uncertainty Estimation of Neural Networks with Sparse Gaussian Processes' abstract: 'This paper presents a probabilistic framework to obtain both reliable and fast uncertainty estimates for predictions with Deep Neural Networks (DNNs). Our main contribution is a practical and principled combination of DNNs with sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be seen as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP), and we devise a learning algorithm that brings the derived theory into practice. In experiments from two different robotic tasks – inverse dynamics of a manipulator and object detection on a micro-aerial vehicle (MAV) – we show the effectiveness of our approach in terms of predictive uncertainty, improved scalability, and run-time efficiency on a Jetson TX2. We thus argue that our approach can pave the way towards reliable and fast robot learning systems with uncertainty awareness.' 
volume: 164 URL: https://proceedings.mlr.press/v164/lee22c.html PDF: https://proceedings.mlr.press/v164/lee22c/lee22c.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lee22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Jongseok family: Lee - given: Jianxiang family: Feng - given: Matthias family: Humt - given: Marcus Gerhard family: Müller - given: Rudolph family: Triebel editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1168-1179 id: lee22c issued: date-parts: - 2022 - 1 - 11 firstpage: 1168 lastpage: 1179 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Multi-Stage Tasks with One Demonstration via Self-Replay' abstract: 'In this work, we introduce a novel method to learn everyday-like multi-stage tasks from a single human demonstration, without requiring any prior object knowledge. Inspired by the recent Coarse-to-Fine Imitation Learning, we model imitation learning as a learned object reaching phase followed by an open-loop replay of the operator’s actions. We build upon this for multi-stage tasks where, following the human demonstration, the robot can autonomously collect image data for the entire multi-stage task, by reaching the next object in the sequence and then replaying the demonstration, repeating in a loop for all stages of the task. We evaluate with real-world experiments on a set of everyday multi-stage tasks, which we show that our method can solve from a single demonstration. Videos and supplementary material can be found at this webpage.' 
volume: 164 URL: https://proceedings.mlr.press/v164/palo22a.html PDF: https://proceedings.mlr.press/v164/palo22a/palo22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-palo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Norman Di family: Palo - given: Edward family: Johns editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1180-1189 id: palo22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1180 lastpage: 1189 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Behaviors through Physics-driven Latent Imagination' abstract: 'Model-based reinforcement learning (MBRL) consists in learning a so-called world model, a representation of the environment through interactions with it, then using it to train an agent. This approach is particularly interesting in the context of field robotics, as it alleviates the need to train online, and reduces the risks inherent to directly training agents on real robots. Generally, in such approaches, the world encompasses both the part related to the robot itself and the rest of the environment. We argue that decoupling the environment representation (for example, images or laser scans) from the dynamics of the physical system (that is, the robot and its physical state) can increase the flexibility of world models and open doors to greater robustness. In this paper, we apply this concept to a strong latent-agent, Dreamer. We then showcase the increased flexibility by transferring the environment part of the world model from one robot (a boat) to another (a rover), simply by adapting the physical model in the imagination. We additionally demonstrate the robustness of our method through real-world experiments on a boat.' 
volume: 164 URL: https://proceedings.mlr.press/v164/richard22a.html PDF: https://proceedings.mlr.press/v164/richard22a/richard22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-richard22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Antoine family: Richard - given: Stéphanie family: Aravecchia - given: Matthieu family: Geist - given: Cédric family: Pradalier editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1190-1199 id: richard22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1190 lastpage: 1199 published: 2022-01-11 00:00:00 +0000 - title: 'Enhancing Consistent Ground Maneuverability by Robot Adaptation to Complex Off-Road Terrains' abstract: 'Terrain adaptation is a critical ability for a ground robot to effectively traverse unstructured off-road terrain in real-world field environments such as forests. However, the expected or planned maneuvering behaviors cannot always be accurately executed due to setbacks such as reduced tire pressure. This inconsistency negatively affects the robot’s ground maneuverability and can cause slower traversal time or errors in localization. To address this shortcoming, we propose a novel method for consistent behavior generation that enables a ground robot’s actual behaviors to more accurately match expected behaviors while adapting to a variety of complex off-road terrains. Our method learns offset behaviors in a self-supervised fashion to compensate for the inconsistency between the actual and expected behaviors without requiring the explicit modeling of various setbacks. To evaluate the method, we perform extensive experiments using a physical ground robot over diverse complex off-road terrain in real-world field environments. 
Experimental results show that our method enables a robot to improve its ground maneuverability on complex unstructured off-road terrain with more navigational behavior consistency, and outperforms previous and baseline methods, particularly so on challenging terrain such as that which is seen in forests.' volume: 164 URL: https://proceedings.mlr.press/v164/siva22a.html PDF: https://proceedings.mlr.press/v164/siva22a/siva22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-siva22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sriram family: Siva - given: Maggie family: Wigness - given: John family: Rogers - given: Hao family: Zhang editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1200-1210 id: siva22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1200 lastpage: 1210 published: 2022-01-11 00:00:00 +0000 - title: 'Self-Improving Semantic Perception for Indoor Localisation' abstract: 'We propose a novel robotic system that can improve its perception during deployment. Contrary to the established approach of learning semantics from large datasets and deploying fixed models, we propose a framework in which semantic models are continuously updated on the robot to adapt to the deployment environments. By combining continual learning with self-supervision, our robotic system learns online during deployment without external supervision. We conduct real-world experiments with robots localising in 3D floorplans. Our experiments show how the robot’s semantic perception improves during deployment and how this translates into improved localisation, even across drastically different environments. We further study the risk of catastrophic forgetting that such a continuous learning setting poses. 
We find memory replay an effective measure to reduce forgetting and show how the robotic system can improve even when switching between different environments. On average, our system improves by 60% in segmentation and 10% in localisation accuracy compared to deployment of a fixed model, and it maintains this improvement while adapting to further environments.' volume: 164 URL: https://proceedings.mlr.press/v164/blum22a.html PDF: https://proceedings.mlr.press/v164/blum22a/blum22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-blum22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Hermann family: Blum - given: Francesco family: Milano - given: René family: Zurbrügg - given: Roland family: Siegwart - given: Cesar family: Cadena - given: Abel family: Gawel editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1211-1222 id: blum22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1211 lastpage: 1222 published: 2022-01-11 00:00:00 +0000 - title: 'Anomaly Detection in Multi-Agent Trajectories for Automated Driving' abstract: 'Human drivers can recognise fast abnormal driving situations to avoid accidents. Similar to humans, automated vehicles are supposed to perform anomaly detection. In this work, we propose the spatio-temporal graph auto-encoder for learning normal driving behaviours. Our innovation is the ability to jointly learn multiple trajectories of a dynamic number of agents. To perform anomaly detection, we first estimate a density function of the learned trajectory feature representation and then detect anomalies in low-density regions. Due to the lack of multi-agent trajectory datasets for anomaly detection in automated driving, we introduce our dataset using a driving simulator for normal and abnormal manoeuvres. 
Our evaluations show that our approach learns the relation between different agents and delivers promising results compared to the related works. The code, simulation and the dataset are publicly available.' volume: 164 URL: https://proceedings.mlr.press/v164/wiederer22a.html PDF: https://proceedings.mlr.press/v164/wiederer22a/wiederer22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wiederer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Julian family: Wiederer - given: Arij family: Bouazizi - given: Marco family: Troina - given: Ulrich family: Kressel - given: Vasileios family: Belagiannis editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1223-1233 id: wiederer22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1223 lastpage: 1233 published: 2022-01-11 00:00:00 +0000 - title: 'Assisted Robust Reward Design' abstract: 'Real-world robotic tasks require complex reward functions. When we define the problem the robot needs to solve, we pretend that a designer specifies this complex reward exactly, and it is set in stone from then on. In practice, however, reward design is an iterative process: the designer chooses a reward, eventually encounters an “edge-case” environment where the reward incentivizes the wrong behavior, revises the reward, and repeats. What would it mean to rethink robotics problems to formally account for this iterative nature of reward design? We propose that the robot not take the specified reward for granted, but rather have uncertainty about it, and account for the future design iterations as future evidence. 
We contribute an Assisted Reward Design method that speeds up the design process by anticipating and influencing this future evidence: rather than letting the designer eventually encounter failure cases and revise the reward then, the method actively exposes the designer to such environments during the development phase. We test this method in an autonomous driving task and find that it more quickly improves the car’s behavior in held-out environments by proposing environments that are “edge cases” for the current reward.' volume: 164 URL: https://proceedings.mlr.press/v164/he22a.html PDF: https://proceedings.mlr.press/v164/he22a/he22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-he22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Jerry Zhi-Yang family: He - given: Anca D. family: Dragan editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1234-1246 id: he22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1234 lastpage: 1246 published: 2022-01-11 00:00:00 +0000 - title: 'Generating Scenarios with Diverse Pedestrian Behaviors for Autonomous Vehicle Testing' abstract: 'There exist several datasets for developing self-driving car methodologies. Manually collected datasets impose inherent limitations on the variability of test cases and it is particularly difficult to acquire challenging scenarios, e.g. ones involving collisions with pedestrians. A way to alleviate this is to consider automatic generation of safety-critical scenarios for autonomous vehicle (AV) testing. Existing approaches for scenario generation use heuristic pedestrian behavior models. 
We instead propose a framework that can use state-of-the-art pedestrian motion models, which is achieved by reformulating the problem as learning where to place pedestrians such that the induced scenarios are collision prone for a given AV. Our pedestrian initial location model can be used in conjunction with any goal driven pedestrian model which makes it possible to challenge an AV with a wide range of pedestrian behaviors – this ensures that the AV can avoid collisions with any pedestrian it encounters. We show that it is possible to learn a collision seeking scenario generation model when both the pedestrian and AV are collision avoiding. The initial location model is conditioned on scene semantics and occlusions to ensure semantic and visual plausibility, which increases the realism of generated scenarios. Our model can be used to test any AV model given sufficient constraints.' volume: 164 URL: https://proceedings.mlr.press/v164/priisalu22a.html PDF: https://proceedings.mlr.press/v164/priisalu22a/priisalu22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-priisalu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Maria family: Priisalu - given: Aleksis family: Pirinen - given: Ciprian family: Paduraru - given: Cristian family: Sminchisescu editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1247-1258 id: priisalu22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1247 lastpage: 1258 published: 2022-01-11 00:00:00 +0000 - title: 'Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback' abstract: 'A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations. 
However, such generative models inherit the biases of the underlying data and result in poor and unusable skills when trained on imperfect demonstration data. To better align skill extraction with human intent we present Skill Preferences (SkiP), an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data. After extracting human-preferred skills, SkiP also utilizes human feedback to solve downstream tasks with RL. We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks and substantially outperforms prior leading RL algorithms with human preferences as well as leading skill extraction algorithms without human preferences.' volume: 164 URL: https://proceedings.mlr.press/v164/wang22g.html PDF: https://proceedings.mlr.press/v164/wang22g/wang22g.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Xiaofei family: Wang - given: Kimin family: Lee - given: Kourosh family: Hakhamaneshi - given: Pieter family: Abbeel - given: Michael family: Laskin editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1259-1268 id: wang22g issued: date-parts: - 2022 - 1 - 11 firstpage: 1259 lastpage: 1268 published: 2022-01-11 00:00:00 +0000 - title: 'Visual Learning Towards Soft Robot Force Control using a 3D Metamaterial with Differential Stiffness' abstract: 'This paper explores the feasibility of learning robot force control and interaction using soft metamaterial and machine vision. 
We start by investigating the differential stiffness of a hollow, cone-shaped, 3D metamaterial made from soft rubber, achieving a large stiffness ratio between the axial and radial directions that leads to an adaptive form response in omni-directions during physical interaction. Then, using image data collected from its internal deformation during various interactions, we explored two similar designs but different learning strategies to estimate force control and interactions on the end-effector of a UR10 e-series robot arm. One is to directly learn the force and torque response from raw images of the metamaterial’s internal deformation. The other is to indirectly estimate the 6D force and torque using a neural network by visually tracking the 6D pose of a marker fixed inside the 3D metamaterial. Finally, we integrated the two proposed systems and achieved similar force feedback and control interactions in simple tasks such as circle following and text writing. Our results show that the learning method holds the potential to support the concept of soft robot force control, providing an intuitive interface at a low cost for robotic systems, generating comparable and capable performances against classical force and torque sensors.' 
volume: 164 URL: https://proceedings.mlr.press/v164/wan22a.html PDF: https://proceedings.mlr.press/v164/wan22a/wan22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Fang family: Wan - given: Xiaobo family: Liu - given: Ning family: Guo - given: Xudong family: Han - given: Feng family: Tian - given: Chaoyang family: Song editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1269-1278 id: wan22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1269 lastpage: 1278 published: 2022-01-11 00:00:00 +0000 - title: 'Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration' abstract: 'We present a method for learning human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must learn to handle diverse human behaviors shown in the demonstrations and be robust when the humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator. Across a 2D strategy game, a human-robot handover task, and a multi-step collaborative manipulation task, our method outperforms the alternatives in both simulated evaluations and when executing the tasks with a real human operator in-the-loop. 
Supplementary materials and videos at https://sites.google.com/view/cogail/home' volume: 164 URL: https://proceedings.mlr.press/v164/wang22h.html PDF: https://proceedings.mlr.press/v164/wang22h/wang22h.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Chen family: Wang - given: Claudia family: Pérez-D’Arpino - given: Danfei family: Xu - given: Li family: Fei-Fei - given: Karen family: Liu - given: Silvio family: Savarese editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1279-1290 id: wang22h issued: date-parts: - 2022 - 1 - 11 firstpage: 1279 lastpage: 1290 published: 2022-01-11 00:00:00 +0000 - title: 'Visual-Locomotion: Learning to Walk on Complex Terrains with Vision' abstract: 'Vision is one of the essential perception modalities for legged robots to safely and efficiently navigate uneven terrains, such as stairs and stepping stones. However, training robots to effectively understand high-dimensional visual input for locomotion is a challenging problem. In this work, we propose a framework to train a vision-based locomotion controller which enables a quadrupedal robot to traverse uneven environments. The key idea is to introduce a hierarchical structure with a high-level vision policy and a low-level motion controller. The high-level vision policy takes as inputs the perceived vision signals as well as robot states and outputs the desired footholds and base movement of the robot. These are then realized by the low-level motion controller composed of a position controller for swing legs and an MPC-based torque controller for stance legs. 
We train the vision policy using Deep Reinforcement Learning and demonstrate our approach on a variety of uneven environments such as randomly placed stepping stones, quincuncial piles, stairs, and moving platforms. We also validate our method on a real robot walking over a series of gaps and climbing up a platform.' volume: 164 URL: https://proceedings.mlr.press/v164/yu22a.html PDF: https://proceedings.mlr.press/v164/yu22a/yu22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-yu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Wenhao family: Yu - given: Deepali family: Jain - given: Alejandro family: Escontrela - given: Atil family: Iscen - given: Peng family: Xu - given: Erwin family: Coumans - given: Sehoon family: Ha - given: Jie family: Tan - given: Tingnan family: Zhang editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1291-1302 id: yu22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1291 lastpage: 1302 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation' abstract: 'We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. In order to accomplish this, humans need easy and effective ways of specifying tasks to the robot. Goal images are one popular form of task specification, as they are already grounded in the robot’s observation space. However, goal images also have a number of drawbacks: they are inconvenient for humans to provide, they can over-specify the desired behavior leading to a sparse reward signal, or under-specify task information in the case of non-goal reaching tasks. 
Natural language provides a convenient and flexible alternative for task specification, but comes with the challenge of grounding language in the robot’s observation space. To scalably learn this grounding we propose to leverage offline pre-collected robotic datasets (including highly sub-optimal, autonomously-collected data) with crowd-sourced natural language labels. With this data, we learn a simple classifier which predicts if a change in state completes a language instruction. This provides a language-conditioned reward function that can then be used for offline multi-task RL. In our experiments, we find that on language-conditioned manipulation tasks our approach outperforms both goal-image specifications and language-conditioned imitation techniques by more than 25%, and is able to perform a range of visuomotor tasks from natural language, such as “open the right drawer” and “move the stapler”, on a Franka Emika Panda robot.' volume: 164 URL: https://proceedings.mlr.press/v164/nair22a.html PDF: https://proceedings.mlr.press/v164/nair22a/nair22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-nair22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Suraj family: Nair - given: Eric family: Mitchell - given: Kevin family: Chen - given: Brian family: Ichter - given: Silvio family: Savarese - given: Chelsea family: Finn editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1303-1315 id: nair22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1303 lastpage: 1315 published: 2022-01-11 00:00:00 +0000 - title: 'Hierarchically Integrated Models: Learning to Navigate from Heterogeneous Robots' abstract: 'Deep reinforcement learning algorithms require large and diverse datasets in order to learn successful policies for perception-based mobile navigation. 
However, gathering such datasets with a single robot can be prohibitively expensive. Collecting data with multiple different robotic platforms with possibly different dynamics is a more scalable approach to large-scale data collection. But how can deep reinforcement learning algorithms leverage such heterogeneous datasets? In this work, we propose a deep reinforcement learning algorithm with hierarchically integrated models (HInt). At training time, HInt learns separate perception and dynamics models, and at test time, HInt integrates the two models in a hierarchical manner and plans actions with the integrated model. This method of planning with hierarchically integrated models allows the algorithm to train on datasets gathered by a variety of different platforms, while respecting the physical capabilities of the deployment robot at test time. Our mobile navigation experiments show that HInt outperforms conventional hierarchical policies and single-source approaches.' volume: 164 URL: https://proceedings.mlr.press/v164/kang22a.html PDF: https://proceedings.mlr.press/v164/kang22a/kang22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-kang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Katie family: Kang - given: Gregory family: Kahn - given: Sergey family: Levine editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1316-1325 id: kang22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1316 lastpage: 1325 published: 2022-01-11 00:00:00 +0000 - title: 'Learning A Risk-Aware Trajectory Planner From Demonstrations Using Logic Monitor' abstract: 'Risk awareness is an important factor to consider when deploying policies on robots in the real-world. Defining the right set of risk metrics can be difficult. 
In this work, we use a differentiable logic monitor that keeps track of the environmental agents’ behaviors and provides a risk metric that the controlled agent can incorporate during planning. We introduce LogicRiskNet, a learning structure that can be constructed from temporal logic formulas describing rules governing a safe agent’s behaviors. The network’s parameters can be learned from demonstration data. By using temporal logic, the network provides an interpretable architecture that can explain what risk metrics are important to the human. We integrate LogicRiskNet in an inverse optimal control (IOC) framework and show that we can learn to generate trajectory plans that accurately mimic the expert’s risk handling behaviors solely from demonstration data. We evaluate our method on a real-world driving dataset. ' volume: 164 URL: https://proceedings.mlr.press/v164/li22c.html PDF: https://proceedings.mlr.press/v164/li22c/li22c.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-li22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Xiao family: Li - given: Jonathan family: DeCastro - given: Cristian Ioan family: Vasile - given: Sertac family: Karaman - given: Daniela family: Rus editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1326-1335 id: li22c issued: date-parts: - 2022 - 1 - 11 firstpage: 1326 lastpage: 1335 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Eye-in-Hand Camera Calibration from a Single Image' abstract: 'Eye-in-hand camera calibration is a fundamental and long-studied problem in robotics. We present a study on using learning-based methods for solving this problem online from a single RGB image, whilst training our models with entirely synthetic data. 
We study three main approaches: one direct regression model that directly predicts the extrinsic matrix from an image, one sparse correspondence model that regresses 2D keypoints and then uses PnP, and one dense correspondence model that uses regressed depth and segmentation maps to enable ICP pose estimation. In our experiments, we benchmark these methods against each other and against well-established classical methods, to find the surprising result that direct regression outperforms other approaches, and we perform noise-sensitivity analysis to gain further insights into these results.' volume: 164 URL: https://proceedings.mlr.press/v164/valassakis22a.html PDF: https://proceedings.mlr.press/v164/valassakis22a/valassakis22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-valassakis22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Eugene family: Valassakis - given: Kamil family: Dreczkowski - given: Edward family: Johns editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1336-1346 id: valassakis22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1336 lastpage: 1346 published: 2022-01-11 00:00:00 +0000 - title: 'LENS: Localization enhanced by NeRF synthesis' abstract: 'Neural Radiance Fields (NeRF) have recently demonstrated photorealistic results for the task of novel view synthesis. In this paper, we propose to apply novel view synthesis to the robot relocalization problem: we demonstrate improvement of camera pose regression thanks to an additional synthetic dataset rendered by the NeRF class of algorithm. To avoid spawning novel views in irrelevant places we selected virtual camera locations from NeRF internal representation of the 3D geometry of the scene. 
We further improved localization accuracy of pose regressors using synthesized realistic and geometry consistent images as data augmentation during training. At the time of publication, our approach improved state of the art with a 60% lower error on Cambridge Landmarks and 7-scenes datasets. Hence, the resulting accuracy becomes comparable to structure-based methods, without any architecture modification or domain adaptation constraints. Since our method allows almost infinite generation of training data, we investigated limitations of camera pose regression depending on size and distribution of data used for training on public benchmarks. We concluded that pose regression accuracy is mostly bounded by relatively small and biased datasets rather than capacity of the pose regression model to solve the localization task.' volume: 164 URL: https://proceedings.mlr.press/v164/moreau22a.html PDF: https://proceedings.mlr.press/v164/moreau22a/moreau22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-moreau22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Arthur family: Moreau - given: Nathan family: Piasco - given: Dzmitry family: Tsishkou - given: Bogdan family: Stanciulescu - given: Arnaud de La family: Fortelle editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1347-1356 id: moreau22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1347 lastpage: 1356 published: 2022-01-11 00:00:00 +0000 - title: 'Robot Reinforcement Learning on the Constraint Manifold' abstract: 'Reinforcement learning in robotics is extremely challenging due to many practical issues, including safety, mechanical constraints, and wear and tear. Typically, these issues are not considered in the machine learning literature. 
One crucial problem in applying reinforcement learning in the real world is Safe Exploration, which requires physical and safety constraints satisfaction throughout the learning process. To explore in such a safety-critical environment, leveraging known information such as robot models and constraints is beneficial to provide more robust safety guarantees. Exploiting this knowledge, we propose a novel method to learn robotics tasks in simulation efficiently while satisfying the constraints during the learning process.' volume: 164 URL: https://proceedings.mlr.press/v164/liu22c.html PDF: https://proceedings.mlr.press/v164/liu22c/liu22c.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-liu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Puze family: Liu - given: Davide family: Tateo - given: Haitham Bou family: Ammar - given: Jan family: Peters editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1357-1366 id: liu22c issued: date-parts: - 2022 - 1 - 11 firstpage: 1357 lastpage: 1366 published: 2022-01-11 00:00:00 +0000 - title: 'Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation' abstract: 'In mobile manipulation (MM), robots can both navigate within and interact with their environment and are thus able to complete many more tasks than robots only capable of navigation or manipulation. In this work, we explore how to apply imitation learning (IL) to learn continuous visuo-motor policies for MM tasks. Much prior work has shown that IL can train visuo-motor policies for either manipulation or navigation domains, but few works have applied IL to the MM domain. 
Doing this is challenging for two reasons: on the data side, current interfaces make collecting high-quality human demonstrations difficult, and on the learning side, policies trained on limited data can suffer from covariate shift when deployed. To address these problems, we first propose Mobile Manipulation RoboTurk (MoMaRT), a novel teleoperation framework allowing simultaneous navigation and manipulation of mobile manipulators, and collect a first-of-its-kind large scale dataset in a realistic simulated kitchen setting. We then propose a learned error detection system to address the covariate shift by detecting when an agent is in a potential failure state. We train performant IL policies and error detectors from this data, and achieve over 45% task success rate and 85% error detection success rate across multiple multi-stage tasks when trained on expert data. Additional results and video at https://sites.google.com/view/il-for-mm/home.' volume: 164 URL: https://proceedings.mlr.press/v164/wong22a.html PDF: https://proceedings.mlr.press/v164/wong22a/wong22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Josiah family: Wong - given: Albert family: Tung - given: Andrey family: Kurenkov - given: Ajay family: Mandlekar - given: Li family: Fei-Fei - given: Silvio family: Savarese - given: Roberto family: Martín-Martín editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1367-1378 id: wong22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1367 lastpage: 1378 published: 2022-01-11 00:00:00 +0000 - title: 'LILA: Language-Informed Latent Actions' abstract: 'We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration. 
LILA falls under the shared autonomy paradigm: in addition to providing discrete language inputs, humans are given a low-dimensional controller – e.g., a 2 degree-of-freedom (DoF) joystick that can move left/right and up/down – for operating the robot. LILA learns to use language to modulate this controller, providing users with a language-informed control space: given an instruction like "place the cereal bowl on the tray," LILA may learn a 2-DoF space where one dimension controls the distance from the robot’s end-effector to the bowl, and the other dimension controls the robot’s end-effector pose relative to the grasp point on the bowl. We evaluate LILA with real-world user studies, where users can provide a language instruction while operating a 7-DoF Franka Emika Panda Arm to complete a series of complex manipulation tasks. We show that LILA models are not only more sample efficient and performant than imitation learning and end-effector control baselines, but that they are also qualitatively preferred by users.' volume: 164 URL: https://proceedings.mlr.press/v164/karamcheti22a.html PDF: https://proceedings.mlr.press/v164/karamcheti22a/karamcheti22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-karamcheti22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Siddharth family: Karamcheti - given: Megha family: Srivastava - given: Percy family: Liang - given: Dorsa family: Sadigh editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1379-1390 id: karamcheti22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1379 lastpage: 1390 published: 2022-01-11 00:00:00 +0000 - title: 'CLASP: Constrained Latent Shape Projection for Refining Object Shape from Robot Contact' abstract: 'Robots need both visual and contact sensing to effectively estimate the state of their environment. 
Camera RGBD data provides rich information of the objects surrounding the robot, and shape priors can help correct noise and fill in gaps and occluded regions. However, when the robot senses unexpected contact, the estimate should be updated to explain the contact. To address this need, we propose CLASP: Constrained Latent Shape Projection. This approach consists of a shape completion network that generates a prior from RGBD data and a procedure to generate shapes consistent with both the network prior and robot contact observations. We find CLASP consistently decreases the Chamfer Distance between the predicted and ground truth scenes, while other approaches do not benefit from contact information.' volume: 164 URL: https://proceedings.mlr.press/v164/saund22a.html PDF: https://proceedings.mlr.press/v164/saund22a/saund22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-saund22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Brad family: Saund - given: Dmitry family: Berenson editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1391-1400 id: saund22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1391 lastpage: 1400 published: 2022-01-11 00:00:00 +0000 - title: 'Learn2Assemble with Structured Representations and Search for Robotic Architectural Construction' abstract: 'Autonomous robotic assembly requires a well-orchestrated sequence of high-level actions and smooth manipulation executions. Learning to assemble complex 3D structures remains a challenging problem that requires drawing connections between target designs and building blocks, and creating valid assembly sequences considering structural stability and feasibility. 
To address the combinatorial complexity of the assembly tasks, we propose a multi-head attention graph representation that can be trained with reinforcement learning (RL) to encode the spatial relations and provide meaningful assembly actions. Combining structured representations with model-free RL and Monte-Carlo planning allows agents to operate with various target shapes and building block types. We design a hierarchical control framework that learns to sequence the building blocks to construct arbitrary 3D designs and ensures their feasibility, as we plan the geometric execution with the robot-in-the-loop. We demonstrate the flexibility of the proposed structured representation and our algorithmic solution in a series of simulated 3D assembly tasks with robotic evaluation, which showcases our method’s ability to learn to construct stable structures with a large number of building blocks. Code and videos are available at: https://sites.google.com/view/learn2assemble' volume: 164 URL: https://proceedings.mlr.press/v164/funk22a.html PDF: https://proceedings.mlr.press/v164/funk22a/funk22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-funk22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Niklas family: Funk - given: Georgia family: Chalvatzaki - given: Boris family: Belousov - given: Jan family: Peters editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1401-1411 id: funk22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1401 lastpage: 1411 published: 2022-01-11 00:00:00 +0000 - title: 'Correspondence-Free Point Cloud Registration with SO(3)-Equivariant Implicit Shape Representations' abstract: 'This paper proposes a correspondence-free method for point cloud rotational registration. 
We learn an embedding for each point cloud in a feature space that preserves the SO(3)-equivariance property, enabled by recent developments in equivariant neural networks. The proposed shape registration method achieves three major advantages through combining equivariant feature learning with implicit shape models. First, the necessity of data association is removed because of the permutation-invariant property in network architectures similar to PointNet. Second, the registration in feature space can be solved in closed-form using Horn’s method due to the SO(3)-equivariance property. Third, the registration is robust to noise in the point cloud because of the joint training of registration and implicit shape reconstruction. The experimental results show superior performance compared with existing correspondence-free deep registration methods. ' volume: 164 URL: https://proceedings.mlr.press/v164/zhu22b.html PDF: https://proceedings.mlr.press/v164/zhu22b/zhu22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-zhu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Minghan family: Zhu - given: Maani family: Ghaffari - given: Huei family: Peng editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1412-1422 id: zhu22b issued: date-parts: - 2022 - 1 - 11 firstpage: 1412 lastpage: 1422 published: 2022-01-11 00:00:00 +0000 - title: 'Specializing Versatile Skill Libraries using Local Mixture of Experts' abstract: 'A long-cherished vision in robotics is to equip robots with skills that match the versatility and precision of humans. For example, when playing table tennis, a robot should be capable of returning the ball in various ways while precisely placing it at the desired location. 
A common approach to model such versatile behavior is to use a Mixture of Experts (MoE) model, where each expert is a contextual motion primitive. However, learning such MoEs is challenging as most objectives force the model to cover the entire context space, which prevents specialization of the primitives resulting in rather low-quality components. Starting from maximum entropy reinforcement learning (RL), we decompose the objective into optimizing an individual lower bound per mixture component. Further, we introduce a curriculum by allowing the components to focus on a local context region, enabling the model to learn highly accurate skill representations. To this end, we use local context distributions that are adapted jointly with the expert primitives. Our lower bound advocates an iterative addition of new components, where new components will concentrate on local context regions not covered by the current MoE. This local and incremental learning results in a modular MoE model of high accuracy and versatility, where both properties can be scaled by adding more components on the fly. We demonstrate this by an extensive ablation and on two challenging simulated robot skill learning tasks. We compare our achieved performance to LaDiPS and HiREPS, a known hierarchical policy search method for learning diverse skills. 
' volume: 164 URL: https://proceedings.mlr.press/v164/celik22a.html PDF: https://proceedings.mlr.press/v164/celik22a/celik22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-celik22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Onur family: Celik - given: Dongzhuoran family: Zhou - given: Ge family: Li - given: Philipp family: Becker - given: Gerhard family: Neumann editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1423-1433 id: celik22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1423 lastpage: 1433 published: 2022-01-11 00:00:00 +0000 - title: 'Multi-Agent Trajectory Prediction by Combining Egocentric and Allocentric Views' abstract: 'Trajectory prediction of road participants such as vehicles and pedestrians is crucial for autonomous driving. Recently, graph neural networks (GNNs) have been widely adopted to capture the social interactions among the agents. Many GNN-based models formulate the prediction task as a single-agent prediction problem where multiple inference passes are needed for multi-agent prediction (which is common in practice), which leads to fundamental inconsistency in terms of homotopy as well as inefficiency in memory and time. Moreover, even for models that do perform joint prediction, typically one centric agent is selected and all other agents’ information is normalized based on that. Such centric-only normalization leads to asymmetric encoding of different agents in the GNN, which might harm its performance. In this work, we propose an efficient multi-agent prediction framework that can predict all agents’ trajectories jointly by normalizing and processing all agents’ information symmetrically and homogeneously with combined egocentric and allocentric views. 
Experiments are conducted on two interaction-rich behavior datasets: INTERACTION (vehicles) and TrajNet++ (pedestrians). The results show that the proposed framework can significantly boost the inference speed of the GNN-based model for multi-agent prediction and achieve better performance. In the INTERACTION dataset’s challenge, the proposed model achieved 1st place in both the regular track and the generalization track.' volume: 164 URL: https://proceedings.mlr.press/v164/jia22a.html PDF: https://proceedings.mlr.press/v164/jia22a/jia22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-jia22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Xiaosong family: Jia - given: Liting family: Sun - given: Hang family: Zhao - given: Masayoshi family: Tomizuka - given: Wei family: Zhan editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1434-1443 id: jia22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1434 lastpage: 1443 published: 2022-01-11 00:00:00 +0000 - title: 'Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks' abstract: 'Exploiting past 3D LiDAR scans to predict future point clouds is a promising method for autonomous mobile systems to realize foresighted state estimation, collision avoidance, and planning. In this paper, we address the problem of predicting future 3D LiDAR point clouds given a sequence of past LiDAR scans. Estimating the future scene on the sensor level does not require any preceding steps as in localization or tracking systems and can be trained self-supervised. We propose an end-to-end approach that exploits a 2D range image representation of each 3D LiDAR scan and concatenates a sequence of range images to obtain a 3D tensor. 
Based on such tensors, we develop an encoder-decoder architecture using 3D convolutions to jointly aggregate spatial and temporal information of the scene and to predict the future 3D point clouds. We evaluate our method on multiple datasets and the experimental results suggest that our method outperforms existing point cloud prediction architectures and generalizes well to new, unseen environments without additional fine-tuning. Our method operates online and is faster than the common LiDAR frame rate of 10 Hz.' volume: 164 URL: https://proceedings.mlr.press/v164/mersch22a.html PDF: https://proceedings.mlr.press/v164/mersch22a/mersch22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-mersch22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Benedikt family: Mersch - given: Xieyuanli family: Chen - given: Jens family: Behley - given: Cyrill family: Stachniss editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1444-1454 id: mersch22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1444 lastpage: 1454 published: 2022-01-11 00:00:00 +0000 - title: 'Dealing with the Unknown: Pessimistic Offline Reinforcement Learning' abstract: 'Reinforcement Learning (RL) has been shown to be effective in domains where the agent can learn policies by actively interacting with its operating environment. However, if we change the RL scheme to the offline setting, where the agent can only update its policy via static datasets, one of the major issues in offline reinforcement learning emerges, i.e., distributional shift. We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm to actively lead the agent back to regions it is familiar with by manipulating the value function. 
We focus on problems caused by out-of-distribution (OOD) states, and deliberately penalize high values at states that are absent in the training dataset, so that the learned pessimistic value function lower bounds the true value anywhere within the state space. We evaluate the PessORL algorithm on various benchmark tasks, where we show that our method gains better performance by explicitly handling OOD states, when compared to those methods merely considering OOD actions.' volume: 164 URL: https://proceedings.mlr.press/v164/li22d.html PDF: https://proceedings.mlr.press/v164/li22d/li22d.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-li22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Jinning family: Li - given: Chen family: Tang - given: Masayoshi family: Tomizuka - given: Wei family: Zhan editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1455-1464 id: li22d issued: date-parts: - 2022 - 1 - 11 firstpage: 1455 lastpage: 1464 published: 2022-01-11 00:00:00 +0000 - title: 'Stochastic Policy Optimization with Heuristic Information for Robot Learning' abstract: 'Stochastic policy-based deep reinforcement learning (RL) approaches have been remarkably successful in continuous control tasks. However, applying these methods to manipulation tasks remains a challenge since actuators of a robot manipulator require high dimensional continuous action spaces. In this paper, we propose exploration-bounded exploration actor-critic (EBE-AC), a novel deep RL approach to combine stochastic policy optimization with interpretable human knowledge. The human knowledge is defined as heuristic information based on both physical relationships between a robot and objects and binary signals of whether the robot has achieved certain states. 
The proposed approach, EBE-AC, combines an off-policy actor-critic algorithm with an entropy maximization based on the heuristic information. On a robotic manipulation task, we demonstrate that EBE-AC outperforms prior state-of-the-art off-policy actor-critic deep RL algorithms in terms of sample efficiency. In addition, we found that EBE-AC can be easily combined with latent information, where EBE-AC with latent information further improved sample efficiency and robustness.' volume: 164 URL: https://proceedings.mlr.press/v164/kim22a.html PDF: https://proceedings.mlr.press/v164/kim22a/kim22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-kim22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Seonghyun family: Kim - given: Ingook family: Jang - given: Samyeul family: Noh - given: Hyunseok family: Kim editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1465-1474 id: kim22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1465 lastpage: 1474 published: 2022-01-11 00:00:00 +0000 - title: 'Probabilistic and Geometric Depth: Detecting Objects in Perspective' abstract: '3D object detection is an important capability needed in various practical applications such as driver assistance systems. Monocular 3D detection, a representative general setting among image-based approaches, provides a more economical solution than conventional settings relying on LiDARs but still yields unsatisfactory results. This paper first presents a systematic study on this problem. We observe that the current monocular 3D detection can be simplified as an instance depth estimation problem: The inaccurate instance depth blocks all the other 3D attribute predictions from improving the overall detection performance. 
Moreover, recent methods directly estimate the depth based on isolated instances or pixels while ignoring the geometric relations across different objects. To this end, we construct geometric relation graphs across predicted objects and use the graph to facilitate depth estimation. As the preliminary depth estimation of each instance is usually inaccurate in this ill-posed setting, we incorporate a probabilistic representation to capture the uncertainty. It provides an important indicator to identify confident predictions and further guide the depth propagation. Despite the simplicity of the basic idea, our method, PGD, obtains significant improvements on KITTI and nuScenes benchmarks, achieving 1st place out of all monocular vision-only methods while still maintaining real-time efficiency. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.' volume: 164 URL: https://proceedings.mlr.press/v164/wang22i.html PDF: https://proceedings.mlr.press/v164/wang22i/wang22i.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Tai family: Wang - given: Xinge family: ZHU - given: Jiangmiao family: Pang - given: Dahua family: Lin editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1475-1485 id: wang22i issued: date-parts: - 2022 - 1 - 11 firstpage: 1475 lastpage: 1485 published: 2022-01-11 00:00:00 +0000 - title: 'Guiding Multi-Step Rearrangement Tasks with Natural Language Instructions' abstract: ' Enabling human operators to interact with robotic agents using natural language would allow non-experts to intuitively instruct these agents. 
Towards this goal, we propose a novel Transformer-based model which enables a user to guide a robot arm through a 3D multi-step manipulation task with natural language commands. Our system maps images and commands to masks over grasp or place locations, grounding the language directly in perceptual space. In a suite of block rearrangement tasks, we show that these masks can be combined with an existing manipulation framework without re-training, greatly improving learning efficiency. Our masking model is several orders of magnitude more sample efficient than typical Transformer models, operating with hundreds, not millions, of examples. Our modular design allows us to leverage supervised and reinforcement learning, providing an easy interface for experimentation with different architectures. Our model completes block manipulation tasks with synthetic commands 530% more often than a UNet-based baseline, and learns to localize actions correctly while creating a mapping of symbols to perceptual input that supports compositional reasoning. We provide a valuable resource for 3D manipulation instruction following research by porting an existing 3D block dataset with crowdsourced language to a simulated environment. Our method’s 25.3% absolute improvement in identifying the correct block on the ported dataset demonstrates its ability to handle syntactic and lexical variation. 
' volume: 164 URL: https://proceedings.mlr.press/v164/stengel-eskin22a.html PDF: https://proceedings.mlr.press/v164/stengel-eskin22a/stengel-eskin22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-stengel-eskin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Elias family: Stengel-Eskin - given: Andrew family: Hundt - given: Zhuohong family: He - given: Aditya family: Murali - given: Nakul family: Gopalan - given: Matthew family: Gombolay - given: Gregory family: Hager editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1486-1501 id: stengel-eskin22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1486 lastpage: 1501 published: 2022-01-11 00:00:00 +0000 - title: 'Towards Real Robot Learning in the Wild: A Case Study in Bipedal Locomotion' abstract: 'Algorithms for self-learning systems have made considerable progress in recent years, yet safety concerns and the need for additional instrumentation have so far largely limited learning experiments with real robots to well controlled lab settings. In this paper, we demonstrate how a small bipedal robot can autonomously learn to walk with minimal human intervention and with minimal instrumentation of the environment. We employ data-efficient off-policy deep reinforcement learning to learn to walk end-to-end, directly on hardware, using rewards that are computed exclusively from proprioceptive sensing. To allow the robot to autonomously adapt its behaviour to its environment, we additionally provide the agent with raw RGB camera images as input. By deploying two robots in different geographic locations while sharing data in a distributed learning setup, we achieve higher throughput and greater diversity of the training data. 
Our learning experiments constitute a step towards the long-term vision of learning “in the wild” for legged robots, and, to our knowledge, represent the first demonstration of learning a deep neural network controller for bipedal locomotion directly on hardware.' volume: 164 URL: https://proceedings.mlr.press/v164/bloesch22a.html PDF: https://proceedings.mlr.press/v164/bloesch22a/bloesch22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-bloesch22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Michael family: Bloesch - given: Jan family: Humplik - given: Viorica family: Patraucean - given: Roland family: Hafner - given: Tuomas family: Haarnoja - given: Arunkumar family: Byravan - given: Noah Yamamoto family: Siegel - given: Saran family: Tunyasuvunakool - given: Federico family: Casarini - given: Nathan family: Batchelor - given: Francesco family: Romano - given: Stefano family: Saliceti - given: Martin family: Riedmiller - given: S. M. Ali family: Eslami - given: Nicolas family: Heess editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1502-1511 id: bloesch22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1502 lastpage: 1511 published: 2022-01-11 00:00:00 +0000 - title: 'SCAPE: Learning Stiffness Control from Augmented Position Control Experiences' abstract: 'We introduce a sample-efficient method for learning state-dependent stiffness control policies for dexterous manipulation. The ability to control stiffness facilitates safe and reliable manipulation by providing compliance and robustness to uncertainties. Most current reinforcement learning approaches to achieve robotic manipulation have exclusively focused on position control, often due to the difficulty of learning high-dimensional stiffness control policies. 
This difficulty can be partially mitigated via policy guidance such as imitation learning. However, expert stiffness control demonstrations are often expensive or infeasible to record. Therefore, we present an approach to learn Stiffness Control from Augmented Position control Experiences (SCAPE) that bypasses this difficulty by transforming position control demonstrations into approximate, suboptimal stiffness control demonstrations. Then, the suboptimality of the augmented demonstrations is addressed by using complementary techniques that help the agent safely learn from both the demonstrations and reinforcement learning. By using simulation tools and experiments on a robotic testbed, we show that the proposed approach efficiently learns safe manipulation policies and outperforms learned position control policies and several other baseline learning algorithms.' volume: 164 URL: https://proceedings.mlr.press/v164/kim22b.html PDF: https://proceedings.mlr.press/v164/kim22b/kim22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-kim22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Mincheol family: Kim - given: Scott family: Niekum - given: Ashish D. family: Deshpande editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1512-1521 id: kim22b issued: date-parts: - 2022 - 1 - 11 firstpage: 1512 lastpage: 1521 published: 2022-01-11 00:00:00 +0000 - title: 'Fully Self-Supervised Class Awareness in Dense Object Descriptors' abstract: 'We address the problem of inferring self-supervised dense semantic correspondences between objects in multi-object scenes. The method introduces learning of class-aware dense object descriptors by providing either unsupervised discrete labels or confidence in object similarities. 
We quantitatively and qualitatively show that the introduced method outperforms previous techniques with more robust pixel-to-pixel matches. An example robotic application is also shown - grasping of objects in clutter based on corresponding points. ' volume: 164 URL: https://proceedings.mlr.press/v164/hadjivelichkov22a.html PDF: https://proceedings.mlr.press/v164/hadjivelichkov22a/hadjivelichkov22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-hadjivelichkov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Denis family: Hadjivelichkov - given: Dimitrios family: Kanoulas editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1522-1531 id: hadjivelichkov22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1522 lastpage: 1531 published: 2022-01-11 00:00:00 +0000 - title: 'Neural Posterior Domain Randomization' abstract: 'Combining domain randomization and reinforcement learning is a widely used approach to obtain control policies that can bridge the gap between simulation and reality. However, existing methods make limiting assumptions on the form of the domain parameter distribution which prevents them from utilizing the full power of domain randomization. Typically, a restricted family of probability distributions (e.g., normal or uniform) is chosen a priori for every parameter. Furthermore, straightforward approaches based on deep learning require differentiable simulators, which are either not available or can only simulate a limited class of systems. Such rigid assumptions diminish the applicability of domain randomization in robotics. 
Building upon recently proposed neural likelihood-free inference methods, we introduce Neural Posterior Domain Randomization (NPDR), an algorithm that alternates between learning a policy from a randomized simulator and adapting the posterior distribution over the simulator’s parameters in a Bayesian fashion. Our approach only requires a parameterized simulator, coarse prior ranges, a policy (optionally with optimization routine), and a small set of real-world observations. Most importantly, the domain parameter distribution is not restricted to a specific family, parameters can be correlated, and the simulator does not have to be differentiable. We show that the presented method is able to efficiently adapt the posterior over the domain parameters to more closely match the observed dynamics. Moreover, we demonstrate that NPDR can learn transferable policies using fewer real-world rollouts than comparable algorithms.' volume: 164 URL: https://proceedings.mlr.press/v164/muratore22a.html PDF: https://proceedings.mlr.press/v164/muratore22a/muratore22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-muratore22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Fabio family: Muratore - given: Theo family: Gruner - given: Florian family: Wiese - given: Boris family: Belousov - given: Michael family: Gienger - given: Jan family: Peters editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1532-1542 id: muratore22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1532 lastpage: 1542 published: 2022-01-11 00:00:00 +0000 - title: 'You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL' abstract: 'The goal of offline reinforcement learning (RL) is to find an optimal policy given prerecorded trajectories. 
Many current approaches customize existing off-policy RL algorithms, especially actor-critic algorithms in which policy evaluation and improvement are iterated. However, the convergence of such approaches is not guaranteed due to the use of complex non-linear function approximation and an intertwined optimization process. By contrast, we propose a simple baseline algorithm for offline RL that only performs the policy evaluation step once so that the algorithm does not require complex stabilization schemes. Since the proposed algorithm is not likely to converge to an optimal policy, it is an appropriate baseline for actor-critic algorithms that ought to be outperformed if there is indeed value in iterative optimization in the offline setting. Surprisingly, we empirically find that the proposed algorithm exhibits competitive and sometimes even state-of-the-art performance in a subset of the D4RL offline RL benchmark. This result suggests that future work is needed to fully exploit the potential advantages of iterative optimization in order to justify the reduced stability of such methods.' volume: 164 URL: https://proceedings.mlr.press/v164/goo22a.html PDF: https://proceedings.mlr.press/v164/goo22a/goo22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-goo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Wonjoon family: Goo - given: Scott family: Niekum editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1543-1553 id: goo22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1543 lastpage: 1553 published: 2022-01-11 00:00:00 +0000 - title: 'Safe Driving via Expert Guided Policy Optimization' abstract: 'When learning common skills like driving, beginners usually have domain experts standing by to ensure the safety of the learning process. 
We formulate such a learning scheme as Expert-in-the-loop Reinforcement Learning (ERL), where a guardian is introduced to safeguard the exploration of the learning agent. While allowing sufficient exploration in the uncertain environment, the guardian intervenes in dangerous situations and demonstrates the correct actions to avoid potential accidents. Thus, ERL provides two training sources: the agent’s own exploration and the expert’s partial demonstrations. Following such a setting, we develop a novel Expert Guided Policy Optimization (EGPO) method which integrates the guardian in the loop of reinforcement learning. The guardian is composed of an expert policy to generate demonstrations and a switch function to decide when to intervene. In particular, a constrained optimization technique is used to prevent the trivial solution in which the agent deliberately behaves dangerously to deceive the expert into taking over. An offline RL technique is further used to learn from the partial demonstrations generated by the expert. Safe driving experiments show that our method achieves superior training and test-time safety, outperforms baselines by a substantial margin in sample efficiency, and preserves generalizability to unseen environments at test time. 
Demo video and source code are available at: https://decisionforce.github.io/EGPO/' volume: 164 URL: https://proceedings.mlr.press/v164/peng22a.html PDF: https://proceedings.mlr.press/v164/peng22a/peng22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-peng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Zhenghao family: Peng - given: Quanyi family: Li - given: Chunxiao family: Liu - given: Bolei family: Zhou editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1554-1563 id: peng22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1554 lastpage: 1563 published: 2022-01-11 00:00:00 +0000 - title: '"Good Robot! Now Watch This!": Repurposing Reinforcement Learning for Task-to-Task Transfer' abstract: 'Modern Reinforcement Learning (RL) algorithms are not sample efficient to train on multi-step tasks in complex domains, impeding their wider deployment in the real world. We address this problem by leveraging the insight that RL models trained to complete one set of tasks can be repurposed to complete related tasks when given just a handful of demonstrations. Based upon this insight, we propose See-SPOT-Run (SSR), a new computational approach to robot learning that enables a robot to complete a variety of real robot tasks in novel problem domains without task-specific training. SSR uses pretrained RL models to create vectors that represent model, task, and action relevance in demonstration and test scenes. SSR then compares these vectors via our Cycle Consistency Distance (CCD) metric to determine the next action to take. SSR completes 58% more task steps and 20% more trials than a baseline few-shot learning method that requires task-specific training. 
SSR also achieves a four order of magnitude improvement in compute efficiency and a 20% to three order of magnitude improvement in sample efficiency compared to the baseline and to training RL models from scratch. To our knowledge, we are the first to address multi-step tasks from demonstration on a real robot without task-specific training, where both the visual input and action space output are high dimensional. Code is available in the supplement.' volume: 164 URL: https://proceedings.mlr.press/v164/hundt22a.html PDF: https://proceedings.mlr.press/v164/hundt22a/hundt22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-hundt22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Andrew family: Hundt - given: Aditya family: Murali - given: Priyanka family: Hubli - given: Ran family: Liu - given: Nakul family: Gopalan - given: Matthew family: Gombolay - given: Gregory D. family: Hager editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1564-1574 id: hundt22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1564 lastpage: 1574 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Inertial Odometry for Dynamic Legged Robot State Estimation' abstract: 'This paper introduces a novel proprioceptive state estimator for legged robots based on a learned displacement measurement from IMU data. Recent research in pedestrian tracking has shown that motion can be inferred from inertial data using convolutional neural networks. A learned inertial displacement measurement can improve state estimation in challenging scenarios where leg odometry is unreliable, such as slipping and compressible terrains. Our work learns to estimate a displacement measurement from IMU data which is then fused with traditional leg odometry. 
Our approach greatly reduces the drift of proprioceptive state estimation, which is critical for legged robots deployed in vision and lidar denied environments such as foggy sewers or dusty mines. We compared results from an EKF and an incremental fixed-lag factor graph estimator using data from several real robot experiments crossing challenging terrains. Our results show a reduction of relative pose error by 37% in challenging scenarios when compared to a traditional kinematic-inertial estimator without learned measurement. We also demonstrate a 22% reduction in error when used with vision systems in visually degraded environments such as an underground mine.' volume: 164 URL: https://proceedings.mlr.press/v164/buchanan22a.html PDF: https://proceedings.mlr.press/v164/buchanan22a/buchanan22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-buchanan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Russell family: Buchanan - given: Marco family: Camurri - given: Frank family: Dellaert - given: Maurice family: Fallon editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1575-1584 id: buchanan22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1575 lastpage: 1584 published: 2022-01-11 00:00:00 +0000 - title: 'Embodied Semantic Scene Graph Generation' abstract: 'Semantic scene graph provides an effective way for intelligent agents to better understand the environment and it has been extensively used in many robotic applications. Existing work mainly focuses on generating the scene graph from the sensory information collected from a pre-defined path, while the environment should be exhaustively explored with a carefully designed path in order to obtain a comprehensive semantic scene graph efficiently. 
In this paper, we propose a new task of Embodied Semantic Scene Graph Generation, which exploits the embodiment of the intelligent agent to autonomously generate an appropriate path to explore the environment for scene graph generation. To this end, a learning framework with the paradigms of imitation learning and reinforcement learning is proposed to help the agent generate proper actions to explore the environment and the scene graph is incrementally constructed. The proposed method is evaluated on the AI2Thor environment using both the quantitative and qualitative performance indexes. Additionally, we implement the proposed method on a streaming video captioning task and promising experimental results are achieved.' volume: 164 URL: https://proceedings.mlr.press/v164/li22e.html PDF: https://proceedings.mlr.press/v164/li22e/li22e.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-li22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Xinghang family: Li - given: Di family: Guo - given: Huaping family: Liu - given: Fuchun family: Sun editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1585-1594 id: li22e issued: date-parts: - 2022 - 1 - 11 firstpage: 1585 lastpage: 1594 published: 2022-01-11 00:00:00 +0000 - title: 'Generalised Task Planning with First-Order Function Approximation' abstract: 'Real world robotics often operates in uncertain and dynamic environments where generalisation over different scenarios is of practical interest. In the absence of a model, value-based reinforcement learning can be used to learn a goal-directed policy. Typically, the interaction between robots and the objects in the environment exhibit a first-order structure. 
We introduce first-order, or relational, features to represent an approximation of the Q-function so that it can induce a generalised policy. Empirical results for a service robot domain show that our online relational reinforcement learning method is scalable to large scale problems and enables transfer learning between different problems and simulation environments with dissimilar transition dynamics.' volume: 164 URL: https://proceedings.mlr.press/v164/ng22a.html PDF: https://proceedings.mlr.press/v164/ng22a/ng22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-ng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Jun Hao Alvin family: Ng - given: Ronald P.A. family: Petrick editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1595-1610 id: ng22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1595 lastpage: 1610 published: 2022-01-11 00:00:00 +0000 - title: 'Distributional Depth-Based Estimation of Object Articulation Models' abstract: 'We propose a method that efficiently learns distributions over articulation models directly from depth images without the need to know articulation model categories a priori. By contrast, existing methods that learn articulation models from raw observations require objects to be textured, and most only predict point estimates of the model parameters. Our core contributions include a novel representation for distributions over rigid body transformations and articulation model parameters based on Screw theory, von Mises-Fisher distributions and Stiefel manifolds. Combining these concepts allows for an efficient, mathematically sound representation that inherently satisfies several constraints that rigid body transformations and articulations must adhere to. 
In addition, we introduce a novel deep-learning based approach, DUST-net, that efficiently learns such distributions and, hence, performs category-independent articulation model estimation while also providing model uncertainties. We evaluate our approach on two benchmarking datasets and three real-world objects and compare its performance with two current state-of-the-art methods. Our results demonstrate that DUST-net can successfully learn distributions over articulation models for novel objects across articulation model categories, which generate point estimates with better accuracy than state-of-the-art methods and effectively capture the uncertainty over predicted model parameters due to noisy inputs. [webpage]' volume: 164 URL: https://proceedings.mlr.press/v164/jain22a.html PDF: https://proceedings.mlr.press/v164/jain22a/jain22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-jain22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Ajinkya family: Jain - given: Stephen family: Giguere - given: Rudolf family: Lioutikov - given: Scott family: Niekum editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1611-1621 id: jain22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1611 lastpage: 1621 published: 2022-01-11 00:00:00 +0000 - title: 'Learning Off-Policy with Online Planning' abstract: 'Reinforcement learning (RL) in low-data and risk-sensitive domains requires performant and flexible deployment policies that can readily incorporate constraints during deployment. One such class of policies are the semi-parametric H-step lookahead policies, which select actions using trajectory optimization over a dynamics model for a fixed horizon with a terminal value function. 
In this work, we investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function learned by a model-free off-policy algorithm, named Learning Off-Policy with Online Planning (LOOP). We provide a theoretical analysis of this method, suggesting a tradeoff between model errors and value function errors, and empirically demonstrate this tradeoff to be beneficial in deep reinforcement learning. Furthermore, we identify the "Actor Divergence" issue in this framework and propose Actor Regularized Control (ARC), a modified trajectory optimization procedure. We evaluate our method on a set of robotic tasks for Offline and Online RL and demonstrate improved performance. We also show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments. We demonstrate that LOOP is a desirable framework for robotics applications based on its strong performance in various important RL settings. Project video and details can be found at hari-sikchi.github.io/loop.' volume: 164 URL: https://proceedings.mlr.press/v164/sikchi22a.html PDF: https://proceedings.mlr.press/v164/sikchi22a/sikchi22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-sikchi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Harshit family: Sikchi - given: Wenxuan family: Zhou - given: David family: Held editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1622-1633 id: sikchi22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1622 lastpage: 1633 published: 2022-01-11 00:00:00 +0000 - title: 'Smooth Exploration for Robotic Reinforcement Learning' abstract: 'Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. 
In practice, the unstructured step-based exploration used in Deep RL – often very successful in simulation – leads to jerky motion patterns on real robots. Consequences of the resulting shaky behavior are poor exploration, or even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped and an RC car. The noise sampling interval of gSDE enables a compromise between performance and smoothness, which allows training directly on the real robots without loss of performance.' volume: 164 URL: https://proceedings.mlr.press/v164/raffin22a.html PDF: https://proceedings.mlr.press/v164/raffin22a/raffin22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-raffin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Antonin family: Raffin - given: Jens family: Kober - given: Freek family: Stulp editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1634-1644 id: raffin22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1634 lastpage: 1644 published: 2022-01-11 00:00:00 +0000 - title: 'Tactile Sim-to-Real Policy Transfer via Real-to-Sim Image Translation' abstract: 'Simulation has recently become key for deep reinforcement learning to safely and efficiently acquire general and complex control policies from visual and proprioceptive inputs. Tactile information is not usually considered despite its direct relation to environment interaction. 
In this work, we present a suite of simulated environments tailored towards tactile robotics and reinforcement learning. A simple and fast method of simulating optical tactile sensors is provided, where high-resolution contact geometry is represented as depth images. Proximal Policy Optimisation (PPO) is used to learn successful policies across all considered tasks. A data-driven approach enables translation of the current state of a real tactile sensor to corresponding simulated depth images. This policy is implemented within a real-time control loop on a physical robot to demonstrate zero-shot sim-to-real policy transfer on several physically-interactive tasks requiring a sense of touch. ' volume: 164 URL: https://proceedings.mlr.press/v164/church22a.html PDF: https://proceedings.mlr.press/v164/church22a/church22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-church22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Alex family: Church - given: John family: Lloyd - given: raia family: hadsell - given: Nathan F. family: Lepora editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1645-1654 id: church22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1645 lastpage: 1654 published: 2022-01-11 00:00:00 +0000 - title: 'RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks' abstract: 'Segmenting unseen object instances in cluttered environments is an important capability that robots need when functioning in unstructured environments. While previous methods have exhibited promising results, they still tend to provide incorrect results in highly cluttered scenes. We postulate that a network architecture that encodes relations between objects at a high-level can be beneficial. 
Thus, in this work, we propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks. We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the perturbed segmentations. Our proposed method is orthogonal to previous works and achieves state-of-the-art performance when combined with them. We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes. Code, models, and video can be found at https://github.com/chrisdxie/rice.' volume: 164 URL: https://proceedings.mlr.press/v164/xie22a.html PDF: https://proceedings.mlr.press/v164/xie22a/xie22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-xie22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Chris family: Xie - given: Arsalan family: Mousavian - given: Yu family: Xiang - given: Dieter family: Fox editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1655-1665 id: xie22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1655 lastpage: 1665 published: 2022-01-11 00:00:00 +0000 - title: 'O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning' abstract: 'Contrary to the vast literature in modeling, perceiving, and understanding agent-object (e.g., human-object, hand-object, robot-object) interaction in computer vision and robotics, very few past works have studied the task of object-object interaction, which also plays an important role in robotic manipulation and planning tasks. 
There is a rich space of object-object interaction scenarios in our daily life, such as placing an object on a messy tabletop, fitting an object inside a drawer, pushing an object using a tool, etc. In this paper, we propose a unified affordance learning framework to learn object-object interaction for various tasks. By constructing four object-object interaction task environments using physical simulation (SAPIEN) and thousands of ShapeNet models with rich geometric diversity, we are able to conduct large-scale object-object affordance learning without the need for human annotations or demonstrations. At the core of technical contribution, we propose an object-kernel point convolution network to reason about detailed interaction between two objects. Experiments on large-scale synthetic data and real-world data prove the effectiveness of the proposed approach.' volume: 164 URL: https://proceedings.mlr.press/v164/mo22b.html PDF: https://proceedings.mlr.press/v164/mo22b/mo22b.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-mo22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Kaichun family: Mo - given: Yuzhe family: Qin - given: Fanbo family: Xiang - given: Hao family: Su - given: Leonidas family: Guibas editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1666-1677 id: mo22b issued: date-parts: - 2022 - 1 - 11 firstpage: 1666 lastpage: 1677 published: 2022-01-11 00:00:00 +0000 - title: 'What Matters in Learning from Offline Human Demonstrations for Robot Manipulation' abstract: 'Imitating human demonstrations is a promising approach to endow robots with various manipulation capabilities. 
While recent advances have been made in imitation learning and batch (offline) reinforcement learning, a lack of open-source human datasets and reproducible learning methods makes assessing the state of the field difficult. In this paper, we conduct an extensive study of six offline learning algorithms for robot manipulation on five simulated and three real-world multi-stage manipulation tasks of varying complexity, and with datasets of varying quality. Our study analyzes the most critical challenges when learning from offline human data for manipulation. Based on the study, we derive a series of lessons including the sensitivity to different algorithmic design choices, the dependence on the quality of the demonstrations, and the variability based on the stopping criteria due to the different objectives in training and evaluation. We also highlight opportunities for learning from human datasets, such as the ability to learn proficient policies on challenging, multi-stage tasks beyond the scope of current reinforcement learning methods, and the ability to easily scale to natural, real-world manipulation scenarios where only raw sensory signals are available. 
We have open-sourced our datasets and all algorithm implementations to facilitate future research and fair comparisons in learning from human demonstration data at https://arise-initiative.github.io/robomimic-web/' volume: 164 URL: https://proceedings.mlr.press/v164/mandlekar22a.html PDF: https://proceedings.mlr.press/v164/mandlekar22a/mandlekar22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-mandlekar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Ajay family: Mandlekar - given: Danfei family: Xu - given: Josiah family: Wong - given: Soroush family: Nasiriany - given: Chen family: Wang - given: Rohun family: Kulkarni - given: Li family: Fei-Fei - given: Silvio family: Savarese - given: Yuke family: Zhu - given: Roberto family: Martín-Martín editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1678-1690 id: mandlekar22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1678 lastpage: 1690 published: 2022-01-11 00:00:00 +0000 - title: 'Language Grounding with 3D Objects' abstract: 'Seemingly simple natural language requests to a robot are generally underspecified, for example "Can you bring me the wireless mouse?" Flat images of candidate mice may not provide the discriminative information needed for "wireless." The world, and objects in it, are not flat images but complex 3D shapes. If a human requests an object based on any of its basic properties, such as color, shape, or texture, robots should perform the necessary exploration to accomplish the task. In particular, while substantial effort and progress has been made on understanding explicitly visual attributes like color and category, comparatively little progress has been made on understanding language about shapes and contours. 
In this work, we introduce a novel reasoning task that targets both visual and non-visual language about 3D objects. Our new benchmark ShapeNet Annotated with Referring Expressions (SNARE) requires a model to choose which of two objects is being referenced by a natural language description. We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, it is still the case that these image-based models are weaker at understanding the 3D nature of objects – properties which play a key role in manipulation. We find that adding view estimation to language grounding models improves accuracy on both SNARE and when identifying objects referred to in language on a robot platform, but note that a large gap remains between these models and human performance.' volume: 164 URL: https://proceedings.mlr.press/v164/thomason22a.html PDF: https://proceedings.mlr.press/v164/thomason22a/thomason22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-thomason22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Jesse family: Thomason - given: Mohit family: Shridhar - given: Yonatan family: Bisk - given: Chris family: Paxton - given: Luke family: Zettlemoyer editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1691-1701 id: thomason22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1691 lastpage: 1701 published: 2022-01-11 00:00:00 +0000 - title: 'Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble' abstract: 'Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets. 
However, depending on the quality of the trained agents and the application being considered, it is often desirable to fine-tune such agents via further online interactions. In this paper, we observe that state-action distribution shift may lead to severe bootstrap error during fine-tuning, which destroys the good initial policy obtained via offline RL. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple Q-functions trained pessimistically offline, thereby preventing overoptimism concerning unfamiliar actions at novel states during the initial training phase. We show that the proposed method improves sample-efficiency and final performance of the fine-tuned robotic agents on various locomotion and manipulation tasks. Our code is available at: https://github.com/shlee94/Off2OnRL.' volume: 164 URL: https://proceedings.mlr.press/v164/lee22d.html PDF: https://proceedings.mlr.press/v164/lee22d/lee22d.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lee22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Seunghyun family: Lee - given: Younggyo family: Seo - given: Kimin family: Lee - given: Pieter family: Abbeel - given: Jinwoo family: Shin editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1702-1712 id: lee22d issued: date-parts: - 2022 - 1 - 11 firstpage: 1702 lastpage: 1712 published: 2022-01-11 00:00:00 +0000 - title: 'Equivariant $Q$ Learning in Spatial Action Spaces' abstract: 'Recently, a variety of new equivariant neural network model architectures have been proposed that generalize better over rotational and reflectional symmetries than standard models. 
These models are relevant to robotics because many robotics problems can be expressed in a rotationally symmetric way. This paper focuses on equivariance over a visual state space and a spatial action space – the setting where the robot action space includes a subset of $\rm{SE}(2)$. In this situation, we know a priori that rotations and translations in the state image should result in the same rotations and translations in the spatial action dimensions of the optimal policy. Therefore, we can use equivariant model architectures to make $Q$ learning more sample efficient. This paper identifies when the optimal $Q$ function is equivariant and proposes $Q$ network architectures for this setting. We show experimentally that this approach outperforms standard methods in a set of challenging manipulation problems. ' volume: 164 URL: https://proceedings.mlr.press/v164/wang22j.html PDF: https://proceedings.mlr.press/v164/wang22j/wang22j.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-wang22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Dian family: Wang - given: Robin family: Walters - given: Xupeng family: Zhu - given: Robert family: Platt editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1713-1723 id: wang22j issued: date-parts: - 2022 - 1 - 11 firstpage: 1713 lastpage: 1723 published: 2022-01-11 00:00:00 +0000 - title: 'Safe Nonlinear Control Using Robust Neural Lyapunov-Barrier Functions' abstract: 'Safety and stability are common requirements for robotic control systems; however, designing safe, stable controllers remains difficult for nonlinear and uncertain models. We develop a model-based learning approach to synthesize robust feedback controllers with safety and stability guarantees. 
We take inspiration from robust convex optimization and Lyapunov theory to define robust control Lyapunov barrier functions that generalize despite model uncertainty. We demonstrate our approach in simulation on problems including car trajectory tracking, nonlinear control with obstacle avoidance, satellite rendezvous with safety constraints, and flight control with a learned ground effect model. Simulation results show that our approach yields controllers that match or exceed the capabilities of robust MPC while reducing computational costs by an order of magnitude. We provide source code at github.com/dawsonc/neural_clbf/.' volume: 164 URL: https://proceedings.mlr.press/v164/dawson22a.html PDF: https://proceedings.mlr.press/v164/dawson22a/dawson22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-dawson22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Charles family: Dawson - given: Zengyi family: Qin - given: Sicun family: Gao - given: Chuchu family: Fan editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1724-1735 id: dawson22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1724 lastpage: 1735 published: 2022-01-11 00:00:00 +0000 - title: 'Collect & Infer - a fresh look at data-efficient Reinforcement Learning' abstract: 'This position paper proposes a fresh look at Reinforcement Learning (RL) from the perspective of data-efficiency. RL has gone through three major stages: pure on-line RL where every data-point is considered only once, RL with a replay buffer where additional learning is done on a portion of the experience, and finally transition memory based RL, where, conceptually, all transitions are stored, and flexibly re-used in every update step. 
While inferring knowledge from all stored experience has led to a tremendous gain in data-efficiency, the question of how this data is collected has been vastly understudied. We argue that data-efficiency can only be achieved through careful consideration of both aspects. We propose to make this insight explicit via a paradigm that we call ’Collect and Infer’, which explicitly models RL as two separate but interconnected processes, concerned with data collection and knowledge inference respectively. ' volume: 164 URL: https://proceedings.mlr.press/v164/riedmiller22a.html PDF: https://proceedings.mlr.press/v164/riedmiller22a/riedmiller22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-riedmiller22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Martin family: Riedmiller - given: Jost Tobias family: Springenberg - given: Roland family: Hafner - given: Nicolas family: Heess editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1736-1744 id: riedmiller22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1736 lastpage: 1744 published: 2022-01-11 00:00:00 +0000 - title: 'The Task Specification Problem ' abstract: 'Robots are commonly used for several industrial applications and some have made their mark even in households (e.g., the roomba). Undoubtedly these systems are impressive! However, they are very narrow in their functionality and we are not even close to building a robot butler. A central challenge is the ability to work with sensory observations and generalization to novel situations. While we do not prescribe a solution to this problem, we do provide a perspective on a few dominant ideas in robot learning for multi-task learning and generalization. 
This perspective suggests a counter-intuitive conclusion: the primary challenge in building generalizable robotic systems (e.g., a robot butler) is not in the learning algorithms or the hardware, but in how humans transfer their knowledge to robots. ' volume: 164 URL: https://proceedings.mlr.press/v164/agrawal22a.html PDF: https://proceedings.mlr.press/v164/agrawal22a/agrawal22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-agrawal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Pulkit family: Agrawal editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1745-1751 id: agrawal22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1745 lastpage: 1751 published: 2022-01-11 00:00:00 +0000 - title: 'Understanding the World Through Action' abstract: 'The recent history of machine learning research has taught us that machine learning methods can be most effective when they are provided with very large, high-capacity models, and trained on very large and diverse datasets. This has spurred the community to search for ways to remove any bottlenecks to scale. Often the foremost among such bottlenecks is the need for human effort, including the effort of curating and labeling datasets. As a result, considerable attention in recent years has been devoted to utilizing unlabeled data, which can be collected in vast quantities. However, some of the most widely used methods for training on such unlabeled data themselves require human-designed objective functions that must correlate in some meaningful way to downstream tasks. 
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning, using general purpose unsupervised or self-supervised reinforcement learning objectives in concert with offline reinforcement learning methods that can leverage large datasets. I will discuss how such a procedure is more closely aligned with potential downstream tasks, and how it could build on existing techniques that have been developed in recent years.' volume: 164 URL: https://proceedings.mlr.press/v164/levine22a.html PDF: https://proceedings.mlr.press/v164/levine22a/levine22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-levine22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Sergey family: Levine editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1752-1757 id: levine22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1752 lastpage: 1757 published: 2022-01-11 00:00:00 +0000 - title: 'Continuous then discrete: A recommendation for building robotic brains' abstract: 'Modern neural networks have allowed substantial advances in robotics, but these algorithms make implicit assumptions about the discretization of time. In this document we argue that there are benefits to be gained, especially in robotics, by designing learning algorithms that exist in continuous time, as well as state, and only later discretizing the algorithms for implementation on traditional computing models, or mapping them directly onto analog hardware. 
We survey four arguments to support this approach: That continuum representations provide a unified theory of functions for robotic systems; That many algorithms formulated as temporally continuous demonstrate anytime properties; That we can exploit temporal sparsity to effect energy efficiency in both traditional and analog hardware; and that these algorithms reflect the instantiations of intelligence that have evolved in organisms. Further, we present learning algorithms that are derived from continuous representations. Finally, we discuss robotic precedents for this approach, and conclude with the implications of using continuum representations in robotic systems.' volume: 164 URL: https://proceedings.mlr.press/v164/eliasmith22a.html PDF: https://proceedings.mlr.press/v164/eliasmith22a/eliasmith22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-eliasmith22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Chris family: Eliasmith - given: P. Michael family: Furlong editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1758-1763 id: eliasmith22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1758 lastpage: 1763 published: 2022-01-11 00:00:00 +0000 - title: 'Back to Reality for Imitation Learning' abstract: 'Imitation learning, and robot learning in general, emerged due to breakthroughs in machine learning, rather than breakthroughs in robotics. As such, evaluation metrics for robot learning are deeply rooted in those for machine learning, and focus primarily on data efficiency. We believe that a better metric for real-world robot learning is time efficiency, which better models the true cost to humans. This is a call to arms to the robot learning community to develop our own evaluation metrics, tailored towards the long-term goals of real-world robotics.' 
volume: 164 URL: https://proceedings.mlr.press/v164/johns22a.html PDF: https://proceedings.mlr.press/v164/johns22a/johns22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-johns22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Edward family: Johns editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1764-1768 id: johns22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1764 lastpage: 1768 published: 2022-01-11 00:00:00 +0000 - title: 'Robots on Demand: A Democratized Robotics Research Cloud' abstract: 'Robotics research is slowed by three challenges: building a robotics lab is expensive (few participants), everyone uses different robots (participants’ findings often don’t generalize outside their lab), and there is no internet-scale robotics dataset (no lab has the resources to make many robots do many different tasks to generate data and there is no data in the wild). The solution is to build a “Robotics Research Cloud” consisting of centers filled with remotely operable robots in standardized environments. This would be a valuable resource in pushing forward robot learning as a field by making cutting-edge robotics research broadly accessible, helping the field identify promising new approaches that succeed on agreed benchmarks, and creating a massive real-world robotics dataset similar to those that have revolutionized machine learning for vision and language.' 
volume: 164 URL: https://proceedings.mlr.press/v164/dean22a.html PDF: https://proceedings.mlr.press/v164/dean22a/dean22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-dean22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Victoria family: Dean - given: Yonadav G family: Shavit - given: Abhinav family: Gupta editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1769-1775 id: dean22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1769 lastpage: 1775 published: 2022-01-11 00:00:00 +0000 - title: 'From Robot Learning To Robot Understanding: Leveraging Causal Graphical Models For Robotics' abstract: 'Causal graphical models have been proposed as a way to efficiently and explicitly reason about novel situations and the likely outcomes of decisions. A key challenge facing widespread implementation of these models in robots is using prior knowledge to hypothesize good candidate causal structures when the relevant environmental features are not known in advance. The tight link between causal reasoning and the ability to intervene in the world suggests that robotics has much to contribute to this challenge and would reap significant benefits from progress.' 
volume: 164 URL: https://proceedings.mlr.press/v164/stocking22a.html PDF: https://proceedings.mlr.press/v164/stocking22a/stocking22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-stocking22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Kaylene Caswell family: Stocking - given: Alison family: Gopnik - given: Claire family: Tomlin editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1776-1781 id: stocking22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1776 lastpage: 1781 published: 2022-01-11 00:00:00 +0000 - title: 'Learning to be Multimodal : Co-evolving Sensory Modalities and Sensor Properties' abstract: 'Making a single sensory modality precise and robust enough to get human-level performance and autonomy could be very expensive or intractable. Fusing information from multiple sensory modalities is promising – for example, recent works showed benefits from combining vision with haptic sensors or with audio data. Learning-based methods facilitate faster progress in this field by removing the need for manual feature engineering. However, the sensor properties and the choice of sensory modalities is still usually done manually. Our blue-sky view is that we could simulate/emulate sensors with various properties, then infer which properties and combinations of sensors yield the best learning outcomes. This view would incentivize the development of novel, affordable sensors that can make a noticeable impact on the performance, robustness and ease of training classifiers, models and policies for robotics. This would motivate making hardware that provides signals complementary to the existing ones. As a result: we can significantly expand the realm of applicability of the learning-based approaches.' 
volume: 164 URL: https://proceedings.mlr.press/v164/antonova22a.html PDF: https://proceedings.mlr.press/v164/antonova22a/antonova22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-antonova22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Rika family: Antonova - given: Jeannette family: Bohg editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1782-1788 id: antonova22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1782 lastpage: 1788 published: 2022-01-11 00:00:00 +0000 - title: 'RoboFlow: a Data-centric Workflow Management System for Developing AI-enhanced Robots' abstract: 'We propose RoboFlow, a cloud-based workflow management system orchestrating the pipelines of developing AI-enhanced robots. Unlike most traditional robotic development processes that are essentially process-centric, RoboFlow is data-centric. This striking property makes it especially suitable for developing AI-enhanced robots in which data play a central role. More specifically, RoboFlow models the whole robotic development process into 4 building modules (1. data processing, 2. algorithmic development, 3. back testing and 4. application adaptation) interacting with a centralized data engine. All these building modules are containerized and orchestrated under a unified interfacing framework. Such an architectural design greatly increases the maintainability and re-usability of all the building modules and enables us to develop them in a fully parallel fashion. To demonstrate the efficacy of the developed system, we exploit it to develop two prototype systems named “Egomobility” and “Egoplan”. 
Egomobility provides general-purpose navigation functionalities for a wide variety of mobile robots and Egoplan solves path planning problems in high dimensional continuous state and action spaces for robot arms. Our result shows that RoboFlow can significantly streamline the whole development lifecycle and the same workflow is applicable to numerous intelligent robotic applications.' volume: 164 URL: https://proceedings.mlr.press/v164/lin22c.html PDF: https://proceedings.mlr.press/v164/lin22c/lin22c.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-lin22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Qinjie family: Lin - given: Guo family: Ye - given: Jiayi family: Wang - given: Han family: Liu editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1789-1794 id: lin22c issued: date-parts: - 2022 - 1 - 11 firstpage: 1789 lastpage: 1794 published: 2022-01-11 00:00:00 +0000 - title: 'Decentralized Sharing and Valuation of Fleet Robotic Data' abstract: 'We propose a decentralized learning framework for robots to trade, price, and discover valuable machine learning (ML) training data. Today’s robotic fleets, such as self-driving vehicles, can gather terabytes of rich video and LIDAR data in diverse, geo-distributed environments. Often, robots in one city or home might observe training data that is commonplace for them, but is actually a valuable, out-of-distribution (OoD) dataset to train robust ML models at robots elsewhere. However, simply sharing all this diverse data in cloud databases is infeasible due to limits on privacy and network bandwidth. Inspired by decentralized file sharing protocols like BitTorrent, we propose a novel system where each robot is provisioned with a learnable privacy filter and sharing model. 
Importantly, this sharing model attempts to predict and prioritize which sensory percepts are of high value to other robotic peers using a decentralized voting and feedback mechanism. Our scheme naturally raises timely questions on data privacy and valuation as companies start to deploy robots in our homes, hospitals, and roads. ' volume: 164 URL: https://proceedings.mlr.press/v164/geng22a.html PDF: https://proceedings.mlr.press/v164/geng22a/geng22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-geng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Yuchong family: Geng - given: Dongyue family: Zhang - given: Po-han family: Li - given: Oguzhan family: Akcin - given: Ao family: Tang - given: Sandeep P. family: Chinchali editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1795-1800 id: geng22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1795 lastpage: 1800 published: 2022-01-11 00:00:00 +0000 - title: 'Auditing Robot Learning for Safety and Compliance during Deployment' abstract: 'Robots of the future are going to exhibit increasingly human-like and super-human intelligence in a myriad of different tasks. They are also likely going to fail and be noncompliant with human preferences in increasingly subtle ways. Towards the goal of achieving autonomous robots, the robot learning community has made rapid strides in applying machine learning techniques to train robots through data and interaction. This makes the study of how best to audit these algorithms for compatibility with humans pertinent and urgent. 
In this paper, we draw inspiration from the AI Safety and Alignment communities and make the case that we need to urgently consider ways in which we can best audit our robot learning algorithms to check for failure modes, and ensure that when operating autonomously, they are indeed behaving in ways that the human algorithm designers intend them to. We believe that this is a challenging problem that will require efforts from the entire robot learning community, and do not attempt to provide a concrete framework for auditing. Instead, we outline high-level guidance and a possible approach towards formulating this framework which we hope will serve as a useful starting point for thinking about auditing in the context of robot learning.' volume: 164 URL: https://proceedings.mlr.press/v164/bharadhwaj22a.html PDF: https://proceedings.mlr.press/v164/bharadhwaj22a/bharadhwaj22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-bharadhwaj22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Homanga family: Bharadhwaj editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1801-1806 id: bharadhwaj22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1801 lastpage: 1806 published: 2022-01-11 00:00:00 +0000 - title: 'Toward robots that learn to summarize their actions in natural language: a set of tasks' abstract: 'Robots should be able to report in natural language what they have done. They should provide concise summaries, respond to questions about them, and be able to learn from the natural language responses they receive to their summaries. We propose that developing the capabilities for robots to summarize their actions is a new and necessary challenge which should be taken up by the robotic learning community. 
We propose an initial framework for robot action summarization, presented as a set of tasks which can serve as a target for research and a measure of progress.' volume: 164 URL: https://proceedings.mlr.press/v164/dechant22a.html PDF: https://proceedings.mlr.press/v164/dechant22a/dechant22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-dechant22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Chad family: DeChant - given: Daniel family: Bauer editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1807-1813 id: dechant22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1807 lastpage: 1813 published: 2022-01-11 00:00:00 +0000 - title: 'Dual-Arm Adversarial Robot Learning' abstract: 'Robot learning is a very promising topic for the future of automation and machine intelligence. Future robots should be able to autonomously acquire skills, learn to represent their environment, and interact with it. While these topics have been explored in simulation, real-world robot learning research seems to be still limited. This is due to the additional challenges encountered in the real-world, such as noisy sensors and actuators, safe exploration, non-stationary dynamics, autonomous environment resetting as well as the cost of running experiments for long periods of time. Unless we develop scalable solutions to these problems, learning complex tasks involving hand-eye coordination and rich contacts will remain an untouched vision that is only feasible in controlled lab environments. We propose Dual-Arm settings as platforms for robot learning. Such settings enable safe data collection for acquiring manipulation skills as well as training perception modules in a robot-supervised manner. They also ease the processes of resetting the environment. 
Furthermore, adversarial learning could potentially boost the generalization capability of robot learning methods by maximizing the exploration based on game-theoretic objectives while ensuring safety based on collaborative task spaces. In this paper, we will discuss the potential benefits of this setup as well as the challenges and research directions that can be pursued.' volume: 164 URL: https://proceedings.mlr.press/v164/aljalbout22a.html PDF: https://proceedings.mlr.press/v164/aljalbout22a/aljalbout22a.pdf edit: https://github.com/mlresearch//v164/edit/gh-pages/_posts/2022-01-11-aljalbout22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 5th Conference on Robot Learning' publisher: 'PMLR' author: - given: Elie family: Aljalbout editor: - given: Aleksandra family: Faust - given: David family: Hsu - given: Gerhard family: Neumann page: 1814-1819 id: aljalbout22a issued: date-parts: - 2022 - 1 - 11 firstpage: 1814 lastpage: 1819 published: 2022-01-11 00:00:00 +0000