<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Proceedings of Machine Learning Research</title>
    <description>Proceedings of the 1st Annual Conference on Robot Learning on 13-15 November 2017

Published as Volume 78 by the Proceedings of Machine Learning Research on 18 October 2017.

Volume Edited by:
  Sergey Levine
  Vincent Vanhoucke
  Ken Goldberg

Series Editors:
  Neil D. Lawrence
  Mark Reid
</description>
    <link>https://proceedings.mlr.press/v78/</link>
    <atom:link href="https://proceedings.mlr.press/v78/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Wed, 08 Feb 2023 10:43:31 +0000</pubDate>
    <lastBuildDate>Wed, 08 Feb 2023 10:43:31 +0000</lastBuildDate>
    <generator>Jekyll v3.9.3</generator>
    
      <item>
        <title>Fast Residual Forests: Rapid Ensemble Learning for Semantic Segmentation</title>
        <description>In recent times, Convolutional Neural Network (CNN) based approaches have performed exceptionally well in many computer vision tasks, including classification and segmentation. These approaches have shown that, given enough training data and time, they can often perform at a level significantly higher than alternative methods. However, in the context of robotic learning, both time and training data are commonly limited. In this work, we propose a learning approach that is more suitable for robotic learning; it substantially reduces the time required to learn and provides much higher performance when training data is limited. Our method combines random forests with deep convolutional networks, leveraging the strengths of both frameworks. We develop a method for generating derivatives from our highly non-linear forest classifier, which in turn enables training of the CNN. Furthermore, our method allows leaf distributions in the ensemble classifier to be trained jointly with one another using Stochastic Gradient Descent (SGD), allowing a large number of tree classifiers to be trained in parallel. This results in a drastic increase in training speed. Our model demonstrates significant performance improvements over pure deep learning methods, notably on datasets with limited training data. We apply our method to the outdoor and indoor segmentation datasets KITTI and NYUv2-40, outperforming multiple pure deep learning methods whilst using a fraction of the training time normally required.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/zuo17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/zuo17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Mutual Alignment Transfer Learning</title>
        <description>Training robots for operation in the real world is a complex, time-consuming and potentially expensive task. Despite significant success of reinforcement learning in games and simulations, research in real robot applications has not been able to match similar progress. While sample complexity can be reduced by training policies in simulation, these can perform sub-optimally on the real platform given imperfect calibration of model dynamics. We present an approach - supplemental to fine-tuning on the real robot - to further benefit from parallel access to a simulator during training. The developed approach harnesses auxiliary rewards to guide the exploration for the real world agent based on the proficiency of the agent in simulation and vice versa. In this context, we demonstrate empirically that the reciprocal alignment for both agents provides further benefit as the agent in simulation can adjust to optimize its behaviour for states commonly visited by the real-world agent.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/wulfmeier17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/wulfmeier17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Emergent Behaviors in Mixed-Autonomy Traffic</title>
        <description>Traffic dynamics are often modeled by complex dynamical systems for which classical analysis tools can struggle to provide tractable policies used by transportation agencies and planners. In light of the introduction of automated vehicles into transportation systems, there is a new need for understanding the impacts of automation on transportation networks. The present article formulates and approaches the mixed-autonomy traffic control problem (where both automated and human-driven vehicles are present) using the powerful framework of deep reinforcement learning (RL). The resulting policies and emergent behaviors in mixed-autonomy traffic settings provide insight into the potential for automating traffic through mixed fleets of automated and manned vehicles. Model-free learning methods are shown to naturally select policies and behaviors previously designed by model-driven approaches, such as stabilization and platooning, which are known to improve ring road efficiency and even to exceed a theoretical velocity limit. Remarkably, RL succeeds at maximizing velocity by effectively leveraging the structure of the human driving behavior to form an efficient vehicle spacing for an intersection network. We describe our results in the context of existing control theoretic results for stability analysis and mixed-autonomy analysis. This article additionally introduces state equivalence classes to improve the sample complexity for the learning methods.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/wu17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/wu17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Dynamics Across Similar Spatiotemporally-Evolving Physical Systems</title>
        <description>We present a differentially-constrained machine learning model that can generalize over similar spatiotemporally evolving dynamical systems. It is shown that not only can an E-GP model be used to estimate the latent state of large-scale physical systems of this type, but that a single E-GP model can generalize over multiple physically-similar systems over a range of parameters using only a few training sets. This is demonstrated on computational fluid dynamics (CFD) data sets of fluid flowing past a cylinder at different Reynolds numbers. Though these systems are governed by highly nonlinear partial differential equations (the Navier-Stokes equations), we show that their major dynamical modes can be captured by a linear dynamical layer over the temporal evolution of the weights of stationary kernels. Furthermore, the model generated by this method provides easy access to physical insights into the system, unlike comparable methods such as Recurrent Neural Networks (RNNs). The low computational cost of this method suggests that it has the potential to enable machine learning approximations of complex physical phenomena for autonomy and robotic design tasks.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/whitman17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/whitman17a.html</guid>
        
        
      </item>
    
      <item>
        <title>How Robots Learn to Classify New Objects Trained from Small Data Sets</title>
        <description>In this paper, we address the problem of learning to classify new object classes and instances by adapting a previously trained classifier. The main challenges here are the small amount of newly available training data and the large change in appearance between the new and the old data. To address this we propose a new variant of Progressive Neural Networks (PNN), originally introduced by Rusu et al. We show that by performing a specific simplification in the adapters, the prediction performance of the resulting PNN can be significantly increased. Furthermore, we give additional insights about when PNNs outperform alternative methods, and provide empirical evaluations on benchmark datasets. Finally, we also suggest a way of using the approach to augment the functionality of a network by extending it with new classes, addressing the problem of unbalanced classes, i.e., where the new classes are under-represented.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/wang17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/wang17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning a visuomotor controller for real world robotic grasping using simulated depth images</title>
        <description>We want to build robots that are useful in unstructured real world applications, such as doing work in the household. Grasping in particular is an important skill in this domain, yet it remains a challenge. One of the key hurdles is handling unexpected changes or motion in the objects being grasped, as well as kinematic noise and other errors in the robot. This paper proposes an approach to learning a closed-loop controller for robotic grasping that dynamically guides the gripper to the object. We use a wrist-mounted sensor to acquire depth images in front of the gripper and train a convolutional neural network to learn a distance function to true grasps for grasp configurations over an image. The training sensor data is generated in simulation, a major advantage over previous work that uses real robot experience, which is costly to obtain. Despite being trained in simulation, our approach works well on real noisy sensor images. We compare our controller in simulated and real robot experiments to a strong baseline for grasp pose detection, and find that our approach significantly outperforms the baseline in the presence of kinematic noise, perceptual errors and disturbances of the object during grasping.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/viereck17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/viereck17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Opportunistic Active Learning for Grounding Natural Language Descriptions</title>
        <description>Active learning identifies data points from a pool of unlabeled examples whose labels, if made available, are most likely to improve the predictions of a supervised model. Most research on active learning assumes that an agent has access to the entire pool of unlabeled data and can ask for labels of any data points during an initial training phase. However, when incorporated in a larger task, an agent may only be able to query some subset of the unlabeled pool. An agent can also opportunistically query for labels that may be useful in the future, even if they are not immediately relevant. In this paper, we demonstrate that this type of opportunistic active learning can improve performance in grounding natural language descriptions of everyday objects—an important skill for home and office robots. We find, with a real robot in an object identification setting, that inquisitive behavior—asking users important questions about the meanings of words that may be off-topic for the current dialog—leads to identifying the correct object more often over time.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/thomason17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/thomason17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Online Learning with Stochastic Recurrent Neural Networks using Intrinsic Motivation Signals</title>
        <description>Continuous online adaptation is an essential ability for the vision of fully autonomous and lifelong-learning robots. Robots need to be able to adapt to changing environments and constraints, and this adaptation should be performed without interrupting the robot’s motion. In this paper, we introduce a framework for probabilistic online motion planning and learning based on a bio-inspired stochastic recurrent neural network. Furthermore, we show that the model can adapt online and sample-efficiently using intrinsic motivation signals and a mental replay strategy. This fast adaptation behavior allows the robot to learn from only a small number of physical interactions and is a promising feature for reusing the model in different environments. We evaluate the online planning with a realistic dynamic simulation of the KUKA LWR robotic arm. The efficient online adaptation is shown in simulation by learning an unknown workspace constraint using mental replay and cognitive dissonance as the intrinsic motivation signal.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/tanneberg17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/tanneberg17a.html</guid>
        
        
      </item>
    
      <item>
        <title>image2mass: Estimating the Mass of an Object from Its Image</title>
        <description>Successful robotic manipulation of real-world objects requires an understanding of the physical properties of these objects.  We propose a model for estimating one such physical property, mass, from an object’s image. We collect a large dataset of online product information containing images, sizes, and weights. We compare several baseline models for the image-to-mass problem that were trained on this dataset. We also characterize human performance on the problem. Finally, we present a model that takes into account an estimate of the 3D shape of the object. This model performs significantly better than these baselines and compares favorably to the performance of humans. All models are tested on a held-out set of product data, as well as a relatively small dataset that we captured with a scale and a digital camera.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/standley17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/standley17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Improved Adversarial Systems for 3D Object Generation and Reconstruction</title>
        <description>This paper describes a new approach for training generative adversarial networks (GAN) to understand the detailed 3D shape of objects. While GANs have been used in this domain previously, they are notoriously hard to train, especially for the complex joint data distribution over 3D objects of many categories and orientations. Our method extends previous work by employing the Wasserstein distance normalized with gradient penalization as a training objective. This enables improved generation from the joint object shape distribution. Our system can also reconstruct 3D shape from 2D images and perform shape completion from occluded 2.5D range scans. We achieve notable quantitative improvements in comparison to existing baselines.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/smith17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/smith17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Human Utility from Video Demonstrations for Deductive Planning in Robotics</title>
        <description>We uncouple three components of autonomous behavior (utilitarian value, causal reasoning, and fine motion control) to design an interpretable model of tasks from video demonstrations. Utilitarian value is learned from aggregating human preferences to understand the implicit goal of a task, explaining why an action sequence was performed. Causal reasoning is seeded from observations and grows from robot experiences to explain how to deductively accomplish sub-goals. And lastly, fine motion control describes what actuators to move. In our experiments, a robot learns how to fold t-shirts from visual demonstrations, and proposes a plan (by answering why, how, and what) when folding never-before-seen articles of clothing.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/shukla17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/shukla17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Bayesian Hilbert Maps for Dynamic Continuous Occupancy Mapping</title>
        <description>Hilbert mapping is an efficient technique for building continuous occupancy maps from depth sensors such as LiDAR in static environments. However, to make the map adaptable to dynamic environments, its parameters need to be learned automatically. In this paper, we take a variational Bayesian approach to this problem, thus eliminating the regularization term typically adjusted heuristically. We extend the proposed model to learn long-term occupancy maps in dynamic environments in a sequential fashion, demonstrating the power of kernel methods to capture abstract nonlinear patterns and of Bayesian learning to construct sophisticated models. Experiments conducted in environments with moving vehicles show that the proposed approach offers a significant speed improvement over state-of-the-art techniques while maintaining similar or better accuracy. We also discuss the robustness against occlusions and various theoretical and empirical aspects of building long-term dynamic occupancy maps.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/senanayake17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/senanayake17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Robotic Manipulation of Granular Media</title>
        <description>In this paper, we examine the problem of robotic manipulation of granular media. We evaluate multiple predictive models used to infer the dynamics of scooping and dumping actions. These models are evaluated on a task that involves manipulating the media in order to deform it into a desired shape. Our best performing model is based on a highly-tailored convolutional network architecture with domain-specific optimizations, which we show accurately models the physical interaction of the robotic scoop with the underlying media. We empirically demonstrate that explicitly predicting physical mechanics results in a policy that outperforms both a hand-crafted dynamics baseline and a “value-network”, which must otherwise implicitly predict the same mechanics in order to produce accurate value estimates.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/schenck17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/schenck17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Sim-to-Real Robot Learning from Pixels with Progressive Nets</title>
        <description>Applying end-to-end learning to solve complex, interactive, pixel-driven control tasks on a robot is an unsolved problem. Deep Reinforcement Learning algorithms are too slow to achieve performance on a real robot, but their potential has been demonstrated in simulated environments. We propose using progressive networks to bridge the reality gap and transfer learned policies from simulation to the real world. The progressive net approach is a general framework that enables reuse of everything from low-level visual features to high-level policies for transfer to new tasks, enabling a compositional, yet simple, approach to building complex skills. We present an early demonstration of this approach with a number of experiments in the domain of robot manipulation that focus on bridging the reality gap. Unlike other proposed approaches, our real-world experiments demonstrate successful task learning from raw visual input on a fully actuated robot manipulator. Moreover, rather than relying on model-based trajectory optimisation, the task learning is accomplished using only deep reinforcement learning and sparse rewards.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/rusu17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/rusu17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Partially Contracting Dynamical Systems from Demonstrations</title>
        <description>An algorithm for learning the dynamics of point-to-point motions from demonstrations using an autonomous nonlinear dynamical system, named contracting dynamical system primitives (CDSP), is presented. The motion dynamics are approximated using a Gaussian mixture model (GMM) and its parameters are learned subject to constraints derived from partial contraction analysis. Systems learned using the proposed method generate trajectories that accurately reproduce the demonstrations and are guaranteed to converge to a desired goal location. Additionally, the learned models are capable of quickly and appropriately adapting to unexpected spatial perturbations and changes in goal location during reproductions. The CDSP algorithm is evaluated on shapes from a publicly available human handwriting dataset and also compared with two state-of-the-art motion generation algorithms. Furthermore, the CDSP algorithm is also shown to be capable of learning and reproducing point-to-point motions directly from real-world demonstrations using a Baxter robot.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/ravichandar17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/ravichandar17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Towards Robust Skill Generalization: Unifying Learning from Demonstration and Motion Planning</title>
        <description>In this paper, we present Combined Learning from demonstration And Motion Planning (CLAMP) as an efficient approach to skill learning and generalizable skill reproduction. CLAMP combines the strengths of Learning from Demonstration (LfD) and motion planning into a unifying framework. We carry out probabilistic inference to find trajectories which are optimal with respect to a given skill and also feasible in different scenarios. We use factor graph optimization to speed up inference. To encode optimality, we provide a new probabilistic skill model based on a stochastic dynamical system. This skill model requires minimal parameter tuning to learn, is suitable to encode skill constraints, and allows efficient inference. Preliminary experimental results showing skill generalization over initial robot state and unforeseen obstacles are presented.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/rana17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/rana17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Stable Task Sequences from Demonstration with Linear Parameter Varying Systems and Hidden Markov Models</title>
        <description>The problem of acquiring multiple tasks from demonstration is typically divided into two sequential processes: (1) the segmentation or identification of different subgoals/subtasks and (2) a separate learning process that parameterizes a control policy for each subtask. As a result, segmentation criteria typically neglect the characteristics of control policies and rely instead on simplified models. This paper aims for a single model capable of learning sequences of complex time-independent control policies that provide robust and stable behavior. To this end, we first present a novel and efficient approach to learn goal-oriented time-independent motion models by estimating both attractor and dynamic behavior from data, guaranteeing stability using linear parameter varying (LPV) systems. This method enables learning complex task sequences with hidden Markov models (HMMs), where each state/subtask is given by a stable LPV system and where transitions are most likely around the corresponding attractor. We study the dynamics of the HMM-LPV model and propose a motion generation method that guarantees the stability of task sequences. We validate our approach on two sets of demonstrated human motions.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/medina17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/medina17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Extending Model-based Policy Gradients for Robots in Heteroscedastic Environments</title>
        <description>In this paper, we consider the problem of learning robot control policies in heteroscedastic environments, whose noise properties vary throughout a robot’s state and action space. We consider reinforcement learning algorithms that evaluate policies using learned models of the environment, and we extend this class of algorithms to capture heteroscedastic effects with two enchained Gaussian processes. We explore the capabilities and limitations of this approach, and demonstrate that it reduces model bias across a variety of simulated robotic systems. </description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/martin17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/martin17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Deep Policies for Robot Bin Picking by Simulating Robust Grasping Sequences</title>
        <description>Recent results suggest that it is possible to grasp a variety of singulated objects with high precision using Convolutional Neural Networks (CNNs) trained on synthetic data. This paper considers the task of bin picking, where multiple objects are randomly arranged in a heap and the objective is to sequentially grasp and transport each into a packing box. We model bin picking with a discrete-time Partially Observable Markov Decision Process that specifies states of the heap, point cloud observations, and rewards. We collect synthetic demonstrations of bin picking from an algorithmic supervisor that uses full state information to optimize for the most robust collision-free grasp, in a forward simulator based on pybullet to model dynamic object-object interactions and on robust wrench space analysis from the Dexterity Network (Dex-Net) to model quasi-static contact between the gripper and object. We learn a policy by fine-tuning a Grasp Quality CNN on Dex-Net 2.1 to classify the supervisor’s actions from a dataset of 10,000 rollouts of the supervisor in the simulator with noise injection. In 2,192 physical trials of bin picking with an ABB YuMi on a dataset of 50 novel objects, we find that the resulting policies can achieve 94% success rate and 96% average precision (very few false positives) on heaps of 5-10 objects and can clear heaps of 10 objects in under three minutes. Datasets, experiments, and supplemental material are available at http://berkeleyautomation.github.io/dex-net.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/mahler17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/mahler17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Active Incremental Learning of Robot Movement Primitives</title>
        <description>Robots that can learn over time by interacting with non-technical users must be capable of acquiring new motor skills, incrementally. The problem then is deciding when to teach the robot a new skill or when to rely on the robot generalizing its actions. This decision can be made by the robot if it is provided with means to quantify the suitability of its own skill given an unseen task. To this end, we present an algorithm that allows a robot to make active requests to incrementally learn movement primitives. A movement primitive is learned on a trajectory output by a Gaussian Process. The latter is used as a library of demonstrations that can be extrapolated with confidence margins. This combination not only allows the robot to generalize using as few as a single demonstration but, more importantly, to indicate when such generalization can be executed with confidence or not. In experiments, a real robot arm indicates to the user which demonstrations should be provided to increase its repertoire of reaching skills. Experiments also show that the robot becomes confident in reaching objects for which demonstrations were never provided, by incrementally learning from the neighboring demonstrations.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/maeda17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/maeda17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Adaptable Pouring: Teaching Robots Not to Spill using Fast but Approximate Fluid Simulation</title>
        <description>Humans manipulate fluids using intuitive approximations of the underlying physical model. In this paper, we explore a general methodology that robots may use to develop and improve strategies for overcoming manipulation tasks associated with appropriately defined loss functions. We focus on the specific task of pouring a liquid from a container (pourer) to another container (receiver) while minimizing the mass of liquid that spills outside the receiver. We present a solution, based on guidance from approximate simulation, that is fast, flexible and adaptable to novel containers as long as their shapes can be sensed. Our key idea is to decouple the optimization of the parameter space of the simulator from the optimization over action space for determining robot control actions. We perform the former in a training (calibration) stage and the latter during run-time (deployment). For the purpose of this paper we use pouring in both stages, even though separate actions could be chosen. We compare four different strategies for calibration and three different strategies for deployment. Our results demonstrate that fast fluid simulations are effective, even if they are only approximate, in guiding automatic strategies for pouring liquids.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/lopez-guevara17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/lopez-guevara17a.html</guid>
        
        
      </item>
    
      <item>
        <title>CORe50: a New Dataset and Benchmark for Continuous Object Recognition</title>
        <description>Continuous/Lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data become available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g., robotic vision), where continuous learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques. In this work we propose a new dataset and benchmark, CORe50, specifically designed for continuous object recognition, and introduce baseline approaches for different continuous learning scenarios.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/lomonaco17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/lomonaco17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning End-to-end Multimodal Sensor Policies for Autonomous Navigation</title>
        <description>We propose a multimodal end-to-end policy based on deep reinforcement learning (DRL) that leverages sensor fusion to reduce performance drops in noisy environments from 50% to 10% compared with the baseline, and that remains functional even in the face of partial sensor failure. It achieves this with a novel stochastic technique called Sensor Dropout, which reduces sensitivity to any sensor subset, and a new auxiliary loss on the policy network, used alongside the standard DRL loss, which reduces action variations.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/liu17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/liu17a.html</guid>
        
        
      </item>
    
      <item>
        <title>DART: Noise Injection for Robust Imitation Learning</title>
        <description>One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and infers a control policy. A known problem with this “off-policy” approach is that the robot’s errors compound when drifting away from the supervisor’s demonstrations. On-policy techniques alleviate this by iteratively collecting corrective actions for the current robot policy. However, these techniques can be difficult for human supervisors, add significant computational burden, and require the robot to visit potentially dangerous states during training. We propose an off-policy approach that injects noise into the supervisor’s policy while demonstrating. This forces the supervisor and robot to explore and recover from errors without letting them compound. We propose a new algorithm, DART, that collects demonstrations with injected noise, and optimizes the noise level to approximate the error of the robot’s trained policy during data collection. We provide a theoretical analysis to illustrate that DART reduces covariate shift more than Behavior Cloning for a robot with non-zero error. We evaluate DART in two domains: in simulation with an algorithmic supervisor on the MuJoCo locomotion tasks and in physical experiments with human supervisors training a Toyota HSR robot to perform grasping in clutter. For challenging tasks like Humanoid, DART can be up to 280% faster in computation time and only decreases the supervisor’s cumulative reward by 5% during training, whereas DAgger executes policies that have 80% less cumulative reward than the supervisor. On the grasping in clutter task, DART obtains on average a 62% performance increase over Behavior Cloning.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/laskey17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/laskey17a.html</guid>
        
        
      </item>
    
      <item>
        <title>DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations</title>
        <description>An option is a short-term skill consisting of a control policy for a specified region of the state space, and a termination condition recognizing leaving that region. In prior work, we proposed an algorithm called Deep Discovery of Options (DDO) to discover options to accelerate reinforcement learning in Atari games. This paper studies an extension to robot imitation learning, called Discovery of Deep Continuous Options (DDCO), where low-level continuous control skills parametrized by deep neural networks are learned from demonstrations. We extend DDO with: (1) a hybrid categorical-continuous distribution model to parametrize high-level policies that can invoke discrete options as well as continuous control actions, and (2) a cross-validation method that relaxes DDO’s requirement that users specify the number of options to be discovered. We evaluate DDCO in simulation of a 3-link robot in the vertical plane pushing a block with friction and gravity, and in two physical experiments on the da Vinci surgical robot: needle insertion, where a needle is grasped and inserted into a silicone tissue phantom, and needle bin picking, where needles and pins are grasped from a pile and categorized into bins. In the 3-link arm simulation, results suggest that DDCO can take 3x fewer demonstrations to achieve the same reward compared to a baseline imitation learning approach. In the needle insertion task, DDCO was successful 8/10 times compared to the next most accurate imitation learning baseline’s 6/10. In the surgical bin picking task, the learned policy successfully grasps a single object in 66 out of 99 attempted grasps, and in all but one case successfully recovered from failed grasps by retrying a second time.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/krishnan17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/krishnan17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Hierarchical Reinforcement Learning with Parameters</title>
        <description>In this work we introduce and evaluate a model of Hierarchical Reinforcement Learning with Parameters. In the first stage we train agents to execute relatively simple actions like reaching or gripping. In the second stage we train a hierarchical manager to compose these actions to solve more complicated tasks. The manager may pass parameters to agents, thus controlling details of undertaken actions. The hierarchical approach with parameters can be used with any optimization algorithm. In this work we adapt the methods described in [1] to our setting. We show that their theoretical foundation, including monotonicity of improvements, still holds. We experimentally compare hierarchical reinforcement learning with the standard, non-hierarchical approach and conclude that hierarchical learning with parameters is a viable way to improve final results and stability of learning.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/klimek17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/klimek17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning</title>
        <description>Continuous control of high-dimensional systems can be achieved by current state-of-the-art reinforcement learning methods such as the Deep Deterministic Policy Gradient algorithm, but requires a significant number of data samples. For real-world systems, this can be an obstacle since excessive data collection can be expensive, tedious or lead to physical damage. The main goal of this work is to keep the advantages of model-free Q-learning while minimizing real-world interaction by employing a dynamics model learned in parallel. To counteract adverse effects of imaginary rollouts with an inaccurate model, a notion of uncertainty is introduced, so that artificial data is used only in cases of high uncertainty. We evaluate our approach on three simulated robot tasks and achieve learning at least 40 percent faster in comparison to vanilla DDPG with multiple updates.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/kalweit17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/kalweit17a.html</guid>
        
        
      </item>
    
      <item>
        <title>End-to-End Learning of Semantic Grasping</title>
        <description>We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A “ventral stream” recognizes object class while a “dorsal stream” simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/jang17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/jang17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task</title>
        <description>End-to-end control for robot manipulation and grasping is emerging as an attractive alternative to traditional pipelined approaches. However, end-to-end methods tend to either be slow to train, exhibit little or no generalisability, or lack the ability to accomplish long-horizon or multi-stage tasks. In this paper, we show how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task, which is analogous to a simple tidying routine, without having seen a single real image. This involves locating, reaching for, and grasping a cube, then locating a basket and dropping the cube inside. To achieve this, robot trajectories are computed in a simulator, to collect a series of control velocities which accomplish the task. Then, a CNN is trained to map observed images to velocities, using domain randomisation to enable generalisation to real world images. Results show that we are able to successfully accomplish the task in the real world with the ability to generalise to novel environments, including those with dynamic lighting conditions, distractor objects, and moving objects, including the basket itself. We believe our approach to be simple, highly scalable, and capable of learning long-horizon tasks that have until now not been shown with the state-of-the-art in end-to-end robot control.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/james17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/james17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Principal Variety Analysis</title>
        <description>We introduce a novel computational framework, Principal Variety Analysis (PVA), for primarily nonlinear data modeling. PVA accommodates algebraic sets as the target subspace, thereby addressing limitations of other existing approaches. The power of PVA is demonstrated in this paper on an important application: learning the kinematics of objects. PVA takes recorded coordinates of some pre-specified features on the objects as input and outputs a lowest-dimensional variety on which the feature coordinates jointly lie. Unlike existing object modeling methods, which require entire trajectories of objects, PVA requires much less information and provides more flexible and generalizable models, namely an analytical algebraic kinematic model of the objects, even in unstructured, uncertain environments. Moreover, it is not restricted to predetermined model templates and is capable of extracting much more general types of models. Besides finding the kinematic model of objects, PVA can be a powerful tool to estimate their corresponding degrees of freedom. The computational success of PVA depends on exploiting sparsity, in particular algebraic dimension minimization through replacement of the intractable $\ell_0$ norm (rank) with the tractable $\ell_1$ norm (nuclear norm). Complete characterization of the assumptions under which $\ell_0$ and $\ell_1$ norm minimizations yield virtually the same outcome is posed as an important open problem in this paper.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/iraji17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/iraji17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Efficient Automatic Perception System Parameter Tuning On Site without Expert Supervision</title>
        <description>Many modern perception systems require human engineers to tune parameters in order to adapt to various environments and applications. This incurs a large startup cost when deploying a robotic system by relying on human expertise and ground truth instrumentation. To alleviate this, we propose a technique using empirical trials to automatically tune a perception system’s parameters on-site without expert supervision. Our approach extends upon recent work on introspecting perception performance and uses Bayesian optimization to efficiently search the parameter configuration space. We validate our technique by tuning the laser and visual odometry systems of a physical ground robot in a variety of environments, achieving estimation errors competitive with baseline approaches that use ground truth.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/hu17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/hu17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation</title>
        <description>How can a delivery robot navigate reliably to a destination in a new office building, with minimal prior information? To tackle this challenge, this paper introduces a two-level hierarchical approach, which integrates model-free deep learning and model-based path planning. At the low level, a neural-network motion controller, called the intention-net, is trained end-to-end to provide robust local navigation. The intention-net maps images from a single monocular camera and “intentions” directly to robot controls. At the high level, a path planner uses a crude map, e.g., a 2-D floor plan, to compute a path from the robot’s current location to the goal. The planned path provides intentions to the intention-net. Preliminary experiments suggest that the learned motion controller is robust against perceptual uncertainty and by integrating with a path planner, it generalizes effectively to new environments and goals.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/gao17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/gao17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Harvesting Common-sense Navigational Knowledge for Robotics from Uncurated Text Corpora</title>
        <description>As robotic systems are deployed into everyday situations, the need for abstract reasoning becomes more pronounced. The ideal robotic assistant should be able to understand verbal commands and work independently to fulfill human-prescribed goals, even if instructions are ambiguous or circumstances change. This paper presents a new algorithm for high-level reasoning based on Euclidean representations of words and their meanings. Rather than using ontologies or knowledge graphs, we model information about the world as a learned geometry of the contexts in which human beings tend to use each idea. Building on the analogy algorithms utilized by Mikolov et al., we perform mathematical operations on the vector space to infer responses to previously unseen problems, and apply our method to a sequence of semantic reasoning tasks in order to answer questions such as ‘Where can I find a dustpan?’, ‘Where do the crayons belong?’, and ‘What transportation method will bring me to the airport?’. Our Directional Scoring Method (DSM) returns a ranked list of possible responses, many of which are plausible answers to the query. Additionally, DSM’s top-ranked response is significantly more likely to be correct than the top-ranked responses of naive analogy estimations.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/fulda17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/fulda17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Self-Supervised Visual Planning with Temporal Skip Connections </title>
        <description>In order to autonomously learn wide repertoires of complex skills, robots must be able to learn from their own autonomously collected data, without human supervision. One learning signal that is always available for autonomously collected data is prediction. If a robot can learn to predict the future, it can use this predictive model to take actions to produce desired outcomes, such as moving an object to a particular location. However, in complex open-world scenarios, designing a representation for prediction is difficult. In this work, we instead aim to enable self-supervised robot learning through direct video prediction: instead of attempting to design a good representation, we directly predict what the robot will see next, and then use this model to achieve desired goals. A key challenge in video prediction for robotic manipulation is handling complex spatial arrangements such as occlusions. To that end, we introduce a video prediction model that can keep track of objects through occlusion by incorporating temporal skip-connections. Together with a novel planning criterion and action space formulation, we demonstrate that this model substantially outperforms prior work on video prediction-based control. Our results show manipulation of objects not seen during training, handling multiple objects, and pushing objects around obstructions. These results represent a significant advance in the range and complexity of skills that can be performed entirely with self-supervised robot learning. </description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/frederik-ebert17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/frederik-ebert17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Reverse Curriculum Generation for Reinforcement Learning</title>
        <description>Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required to reach the goal and receive some learning signal. Past approaches tackle these problems by exploiting expert demonstrations or by manually designing a task-specific reward shaping function to guide the learning agent. Instead, we propose a method to learn these tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved. The robot is trained in “reverse”, gradually learning to reach the goal from a set of starting positions increasingly far from the goal. Our method automatically generates a curriculum of starting positions that adapts to the agent’s performance, leading to efficient training on goal-oriented tasks. We demonstrate our approach on difficult simulated navigation and fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/florensa17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/florensa17a.html</guid>
        
        
      </item>
    
      <item>
        <title>One-Shot Visual Imitation Learning via Meta-Learning</title>
        <description>In order for a robot to be a generalist that can perform a wide range of jobs, it must be able to acquire a wide variety of skills quickly and efficiently in complex unstructured environments. High-capacity models such as deep neural networks can enable a robot to represent complex skills, but learning each skill from scratch then becomes infeasible. In this work, we present a meta-imitation learning method that enables a robot to learn how to learn more efficiently, allowing it to acquire new skills from just a single demonstration. Unlike prior methods for one-shot imitation, our method can scale to raw pixel inputs and requires data from significantly fewer prior tasks for effective learning of new skills. Our experiments on both simulated and real robot platforms demonstrate the ability to learn new tasks, end-to-end, from a single visual demonstration.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/finn17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/finn17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Data-Efficient Rigid-Body Contact Models: Case Study of Planar Impact</title>
        <description>In this paper we demonstrate the limitations of common rigid-body contact models used in the robotics community by comparing them to a collection of data-driven and data-reinforced models that exploit underlying structure inspired by the rigid contact paradigm. We evaluate and compare the analytical and data-driven contact models on an empirical planar impact dataset, and show that the learned models are able to outperform their analytical counterparts with a small training set.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/fazeli17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/fazeli17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks</title>
        <description>Material recognition enables robots to incorporate knowledge of material properties into their interactions with everyday objects. For example, material recognition opens up opportunities for clearer communication with a robot, such as &quot;bring me the metal coffee mug&quot;, and recognizing plastic versus metal is crucial when using a microwave or oven. However, collecting labeled training data with a robot is often more difficult than unlabeled data. We present a semi-supervised learning approach for material recognition that uses generative adversarial networks (GANs) with haptic features such as force, temperature, and vibration. Our approach achieves state-of-the-art results and enables a robot to estimate the material class of household objects with 90% accuracy when 92% of the training data are unlabeled. We explore how well this approach can recognize the material of new objects and we discuss challenges facing generalization. To motivate learning from unlabeled training data, we also compare results against several common supervised learning classifiers. In addition, we have released the dataset used for this work, which consists of time-series haptic measurements from a robot that conducted thousands of interactions with 72 household objects.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/erickson17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/erickson17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Gradient-free Policy Architecture Search and Adaptation</title>
        <description>We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent’s lifetime as it learns to drive in a realistic simulated environment.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/ebrahimi17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/ebrahimi17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Aggressive Deep Driving: Combining Convolutional Neural Networks and Model Predictive Control</title>
        <description>We present a framework for vision-based model predictive control (MPC) for the task of aggressive, high-speed autonomous driving. Our approach uses deep convolutional neural networks to predict cost functions from input video which are directly suitable for online trajectory optimization with MPC. We demonstrate the method in a high speed autonomous driving scenario, where we use a single monocular camera and a deep convolutional neural network to predict a cost map of the track in front of the vehicle. Results are demonstrated on a 1:5 scale autonomous vehicle given the task of high speed, aggressive driving.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/drews17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/drews17a.html</guid>
        
        
      </item>
    
      <item>
        <title>CARLA: An Open Urban Driving Simulator</title>
        <description>We introduce CARLA, an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites and environmental conditions. We use CARLA to study the performance of three approaches to autonomous driving: a classic modular pipeline, an end-to-end model trained via imitation learning, and an end-to-end model trained via reinforcement learning. The approaches are evaluated in controlled scenarios of increasing difficulty, and their performance is examined via metrics provided by CARLA, illustrating the platform’s utility for autonomous driving research.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/dosovitskiy17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/dosovitskiy17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Optimizing Long-term Predictions for Model-based Policy Search</title>
        <description>We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories, as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive compared to state-of-the-art model learning methods. In contrast to these more involved models, our model can directly be employed for policy search and outperforms a baseline method in the robot experiment.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/doerr17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/doerr17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Fastron: An Online Learning-Based Model and Active Learning Strategy for Proxy Collision Detection</title>
        <description>We introduce the Fastron, a configuration space (C-space) model to be used as a proxy to kinematic-based collision detection. The Fastron allows iterative updates to account for a changing environment through a combination of a novel formulation of the kernel perceptron learning algorithm and an active learning strategy. Our simulations on a 7 degree-of-freedom arm indicate that proxy collision checks may be performed at least 2 times faster than an efficient polyhedral collision checker and at least 8 times faster than an efficient high-precision collision checker. The Fastron model provides conservative collision status predictions by padding C-space obstacles, and proxy collision checking time does not scale poorly as the number of workspace obstacles increases. All results were achieved without GPU acceleration or parallel computing.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/das17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/das17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Bayesian Interaction Primitives: A SLAM Approach to Human-Robot Interaction</title>
        <description>This paper introduces a fully Bayesian reformulation of Interaction Primitives for human-robot interaction and collaboration. A key insight is that a subset of human-robot interaction is conceptually related to simultaneous localization and mapping techniques. Leveraging this insight we can significantly increase the accuracy of temporal estimation and inferred trajectories while simultaneously reducing the associated computational complexity. We show that this enables more complex human-robot interaction scenarios involving more degrees of freedom.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/campbell17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/campbell17a.html</guid>
        
        
      </item>
    
      <item>
        <title>The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?</title>
        <description>A successful grasp requires careful balancing of the contact forces.  Deducing whether a particular grasp will be successful from indirect measurements, such as vision, is therefore quite challenging, and direct sensing of contacts through touch sensing provides an appealing avenue toward more successful and consistent robotic grasping.  However, in order to fully evaluate the value of touch sensing for grasp outcome prediction, we must understand how touch sensing can influence outcome prediction accuracy when combined with other modalities.  Doing so using conventional model-based techniques is exceptionally difficult.  In this work, we investigate the question of whether touch sensing aids in predicting grasp outcomes within a multimodal sensing framework that combines vision and touch.  To that end, we collected more than 9,000 grasping trials using a two-finger gripper equipped with GelSight high-resolution tactile sensors on each finger, and evaluated visuo-tactile deep neural network models to directly predict grasp outcomes from either modality individually, and from both modalities together. Our experimental results indicate that incorporating tactile readings substantially improves grasping performance.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/calandra17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/calandra17a.html</guid>
        
        
      </item>
    
      <item>
        <title>The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously</title>
        <description>This paper introduces the Intentional Unintentional (IU) agent. This agent endows the deep deterministic policy gradients (DDPG) agent for continuous control with the ability to solve several tasks simultaneously. Learning to solve many tasks simultaneously has been a long-standing, core goal of artificial intelligence, inspired by infant development and motivated by the desire to build flexible robot manipulators capable of many diverse behaviours. We show that the IU agent not only learns to solve many tasks simultaneously but also learns faster than agents that target a single task at a time. In some cases, where the single-task DDPG method completely fails, the IU agent successfully solves the task. To demonstrate this, we build a playroom environment using the MuJoCo physics engine, and introduce a grounded formal language to automatically generate tasks.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/cabi17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/cabi17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Heuristic Search via Imitation</title>
        <description>Robotic motion planning problems are typically solved by constructing a search tree of valid maneuvers from a start to a goal configuration. Limited onboard computation and real-time planning constraints impose a limit on how large this search tree can grow. Heuristics play a crucial role in such situations by guiding the search towards potentially good directions and consequently minimizing search effort. Moreover, a heuristic must infer such directions in an efficient manner using only the information uncovered by the search up until that time. However, state-of-the-art methods do not address the problem of computing a heuristic that explicitly minimizes search effort. In this paper, we do so by training a heuristic policy that maps the partial information from the search to decide which node of the search tree to expand. Unfortunately, naively training such policies leads to slow convergence and poor local minima. We present SaIL, an efficient algorithm that trains heuristic policies by imitating clairvoyant oracles - oracles that have full information about the world and demonstrate decisions that minimize search effort. We leverage the fact that such oracles can be efficiently computed using dynamic programming and derive performance guarantees for the learnt heuristic. We validate the approach on a spectrum of environments which show that SaIL consistently outperforms state-of-the-art algorithms. Our approach paves the way forward for learning heuristics that demonstrate an anytime nature - finding feasible solutions quickly and incrementally refining them over time. Open-source code and details can be found here: https://goo.gl/YXkQAC.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/bhardwaj17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/bhardwaj17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Robot Objectives from Physical Human Interaction</title>
        <description>When humans and robots work in close proximity, physical interaction is inevitable.  Traditionally, robots treat physical interaction as a disturbance, and resume their original behavior after the interaction ends. In contrast, we argue that physical human interaction is informative: it provides useful information about how the robot should be doing its task.  We formalize learning from such interactions as a dynamical system in which the task objective has parameters that are part of the hidden state, and physical human interactions are observations about these parameters. We derive an online approximation of the robot’s optimal policy in this system, and test it in a user study. The results suggest that learning from physical interaction leads to better robot task performance with less human effort.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/bajcsy17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/bajcsy17a.html</guid>
        
        
      </item>
    
      <item>
        <title>Deep Kernels for Optimizing Locomotion Controllers</title>
        <description>Sample efficiency is important when optimizing parameters of locomotion controllers, since hardware experiments are time consuming and expensive. Bayesian Optimization, a sample-efficient optimization framework, has recently been widely applied to address this problem, but further improvements in sample efficiency are needed for practical applicability to real-world robots and high-dimensional controllers. To address this, prior work has proposed using domain expertise for constructing custom distance metrics for locomotion. In this work we show how to learn such a distance metric automatically. We use a neural network to learn an informed distance metric from data obtained in high-fidelity simulations. We conduct experiments on two different controllers and robot architectures. First, we demonstrate improvement in sample efficiency when optimizing a 5-dimensional controller on the ATRIAS robot hardware. We then conduct simulation experiments to optimize a 16-dimensional controller for a 7-link robot model and obtain significant improvements even when optimizing in perturbed environments. This demonstrates that our approach is able to enhance sample efficiency for two different controllers, hence is a fitting candidate for further experiments on hardware in the future.</description>
        <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v78/antonova17a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v78/antonova17a.html</guid>
        
        
      </item>
    
  </channel>
</rss>
