- title: 'Preface'
abstract: 'Some remarks from the organizers about L4DC2020.'
volume: 120
URL: https://proceedings.mlr.press/v120/bayen20a.html
PDF: http://proceedings.mlr.press/v120/bayen20a/bayen20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-bayen20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 1-4
id: bayen20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 1
lastpage: 4
published: 2020-07-31 00:00:00 +0000
- title: 'Actively Learning Gaussian Process Dynamics'
abstract: 'Despite the availability of ever more data enabled through modern sensor and computer technology, it still remains an open problem to learn dynamical systems in a sample-efficient way. We propose active learning strategies that leverage information-theoretical properties arising naturally during Gaussian process regression, while respecting constraints on the sampling process imposed by the system dynamics. Sample points are selected in regions with high uncertainty, leading to exploratory behavior and data-efficient training of the model. All results are verified in an extensive numerical benchmark.'
volume: 120
URL: https://proceedings.mlr.press/v120/buisson-fenet20a.html
PDF: http://proceedings.mlr.press/v120/buisson-fenet20a/buisson-fenet20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-buisson-fenet20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Mona
family: Buisson-Fenet
- given: Friedrich
family: Solowjow
- given: Sebastian
family: Trimpe
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 5-15
id: buisson-fenet20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 5
lastpage: 15
published: 2020-07-31 00:00:00 +0000
- title: 'Finite Sample System Identification: Optimal Rates and the Role of Regularization'
abstract: 'This paper studies the optimality of regularized regression for low-order linear system identification. The nuclear norm of the system’s Hankel matrix is added as a regularizer to the least squares cost function because of the following advantages: (1) an easily tuned regularization weight, (2) lower sample complexity, and (3) a returned Hankel matrix with a clear singular value gap, which robustly recovers a low-order linear system from noisy output observations. Recently, the performance of unregularized least squares formulations has been studied statistically in terms of finite sample complexity and recovery error; however, no such results are known for the regularized approach. In this work, we show that the regularized algorithm retains the sample complexity advantage while beating unregularized least squares in the Hankel spectral norm bound.'
volume: 120
URL: https://proceedings.mlr.press/v120/sun20a.html
PDF: http://proceedings.mlr.press/v120/sun20a/sun20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-sun20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Yue
family: Sun
- given: Samet
family: Oymak
- given: Maryam
family: Fazel
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 16-25
id: sun20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 16
lastpage: 25
published: 2020-07-31 00:00:00 +0000
- title: 'Finite-Time Performance of Distributed Two-Time-Scale Stochastic Approximation'
abstract: 'Two-time-scale stochastic approximation is a popular iterative method for finding the solution of a system of two equations. Such methods have found broad applications in many areas, especially in machine learning and reinforcement learning. In this paper, we propose a distributed variant of this method over a network of agents, where the agents use two graphs representing their communication at different speeds due to the nature of their two-time-scale updates. Our main contribution is to provide a finite-time analysis for the performance of the proposed method. In particular, we establish an upper bound for the convergence rates of the mean square errors at the agents to zero as a function of the step sizes and the network topology. We believe that the proposed method and analysis studied in this paper can be applicable to many other interesting applications. '
volume: 120
URL: https://proceedings.mlr.press/v120/doan20a.html
PDF: http://proceedings.mlr.press/v120/doan20a/doan20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-doan20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Thinh
family: Doan
- given: Justin
family: Romberg
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 26-36
id: doan20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 26
lastpage: 36
published: 2020-07-31 00:00:00 +0000
- title: 'Virtual Reference Feedback Tuning with data-driven reference model selection'
abstract: 'In control applications where finding a model of the plant is the most costly and time-consuming task, Virtual Reference Feedback Tuning (VRFT) represents a valid, purely data-driven alternative for the design of model reference controllers. However, the selection of a proper reference model within a model-free setting is known to be a critical task, with this model typically playing the role of a hyper-parameter. In this work, we extend the VRFT methodology to compute both a proper reference model and the corresponding optimal controller parameters from data by means of particle swarm optimization. The effectiveness of the proposed approach is illustrated on a benchmark simulation example.'
volume: 120
URL: https://proceedings.mlr.press/v120/breschi20a.html
PDF: http://proceedings.mlr.press/v120/breschi20a/breschi20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-breschi20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Valentina
family: Breschi
- given: Simone
family: Formentin
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 37-45
id: breschi20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 37
lastpage: 45
published: 2020-07-31 00:00:00 +0000
- title: 'Direct Data-Driven Control with Embedded Anti-Windup Compensation'
abstract: 'Input saturation is a ubiquitous nonlinearity in control systems and arises from the fact that all actuators are subject to a maximum power, resulting in a hard limit on the allowable magnitude of the input effort. In the scientific literature, anti-windup augmentation has been proposed to recover the desired linear closed-loop dynamics during transients, but the effectiveness of such compensation is strongly linked to the accuracy of the mathematical model of the plant. In this work, it is shown that a feedback controller with an embedded anti-windup compensator can be identified directly from data, by suitably extending the existing data-driven design theory. The effectiveness of the resulting method is illustrated on a benchmark simulation example.'
volume: 120
URL: https://proceedings.mlr.press/v120/breschi20b.html
PDF: http://proceedings.mlr.press/v120/breschi20b/breschi20b.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-breschi20b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Valentina
family: Breschi
- given: Simone
family: Formentin
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 46-54
id: breschi20b
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 46
lastpage: 54
published: 2020-07-31 00:00:00 +0000
- title: 'Sparse and Low-bias Estimation of High Dimensional Vector Autoregressive Models'
abstract: 'Vector autoregressive (VAR) models are widely used for causal discovery and forecasting in multivariate time series analysis. In the high-dimensional setting, which is increasingly common in fields such as neuroscience and econometrics, model parameters are inferred by $L_1$-regularized maximum likelihood (RML). A well-known feature of RML inference is that in general the technique produces a trade-off between sparsity and bias that depends on the choice of the regularization hyperparameter. In the context of multivariate time series analysis, sparse estimates are favorable for causal discovery and low-bias estimates are favorable for forecasting. However, owing to a paucity of research on hyperparameter selection methods, practitioners must rely on *ad-hoc* methods such as cross-validation (or manual tuning). The particular balance that such approaches achieve between the two goals — causal discovery and forecasting — is poorly understood. Our paper investigates this behavior and proposes a method (UoI-VAR) that achieves a better balance between sparsity and bias when the underlying causal influences are in fact sparse. We demonstrate through simulation that RML with a hyperparameter selected by cross-validation tends to overfit, producing relatively dense estimates. We further demonstrate that UoI-VAR much more effectively approximates the correct sparsity pattern with only a minor compromise in model fit, particularly so for larger data dimensions, and that the estimates produced by UoI-VAR exhibit less bias. We conclude that our method achieves improved performance and is especially well-suited to applications involving simultaneous causal discovery and forecasting in high-dimensional settings.'
volume: 120
URL: https://proceedings.mlr.press/v120/ruiz20a.html
PDF: http://proceedings.mlr.press/v120/ruiz20a/ruiz20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-ruiz20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Trevor
family: Ruiz
- given: Sharmodeep
family: Bhattacharyya
- given: Mahesh
family: Balasubramanian
- given: Kristofer
family: Bouchard
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 55-64
id: ruiz20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 55
lastpage: 64
published: 2020-07-31 00:00:00 +0000
- title: 'Robust Online Model Adaptation by Extended Kalman Filter with Exponential Moving Average and Dynamic Multi-Epoch Strategy'
abstract: 'High-fidelity behavior prediction of intelligent agents is critical in many applications. However, a prediction model trained on the training set may not generalize to the testing set due to domain shift and time variance. This challenge motivates the adoption of online adaptation algorithms that update prediction models in real time to improve prediction performance. Inspired by the Extended Kalman Filter (EKF), this paper introduces a series of online adaptation methods, which are applicable to neural network-based models. A base adaptation algorithm, Modified EKF with forgetting factor (MEKF_lambda), is introduced first, followed by exponential moving average filtering techniques. This paper then introduces a dynamic multi-epoch update strategy to effectively utilize samples received in real time. With all these extensions, we propose a robust online adaptation algorithm: MEKF with Exponential Moving Average and Dynamic Multi-Epoch strategy (MEKF_EMA-DME). The proposed algorithm outperforms existing methods, as demonstrated in experiments.'
volume: 120
URL: https://proceedings.mlr.press/v120/abuduweili20a.html
PDF: http://proceedings.mlr.press/v120/abuduweili20a/abuduweili20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-abuduweili20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Abulikemu
family: Abuduweili
- given: Changliu
family: Liu
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 65-74
id: abuduweili20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 65
lastpage: 74
published: 2020-07-31 00:00:00 +0000
- title: 'Estimating Reachable Sets with Scenario Optimization'
abstract: 'Many practical systems are not amenable to the reachability methods that give guarantees of correctness, since they have dynamics that are strongly nonlinear, uncertain, and possibly unknown. While reachable sets for these kinds of systems can still be estimated in a data-driven way, data-driven methods typically do not guarantee the validity of their results. However, certain data-driven approaches may be given a probabilistic guarantee of correctness, by reframing the problem as a chance-constrained optimization problem that is solved with scenario optimization. We apply this approach to the problem of approximating a reachable set by a norm ball from data. The method requires only O(n^2) sample trajectories and the solution of a convex problem. A variant of the method restricted to axis-aligned norm balls requires only O(n) samples.'
volume: 120
URL: https://proceedings.mlr.press/v120/devonport20a.html
PDF: http://proceedings.mlr.press/v120/devonport20a/devonport20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-devonport20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Alex
family: Devonport
- given: Murat
family: Arcak
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 75-84
id: devonport20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 75
lastpage: 84
published: 2020-07-31 00:00:00 +0000
- title: 'LSTM Neural Networks: Input to State Stability and Probabilistic Safety Verification'
abstract: 'The goal of this paper is to analyze Long Short Term Memory (LSTM) neural networks from a dynamical system perspective. The classical recursive equations describing the evolution of LSTM can be recast in state space form, resulting in a time-invariant nonlinear dynamical system. In this work, a sufficient condition guaranteeing the Input-to-State Stability (ISS) property of this system is provided. Then, the verification of LSTM networks is discussed; in particular, a dedicated approach based on the scenario algorithm is devised. The proposed method is finally tested on a pH neutralization process.'
volume: 120
URL: https://proceedings.mlr.press/v120/bonassi20a.html
PDF: http://proceedings.mlr.press/v120/bonassi20a/bonassi20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-bonassi20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Fabio
family: Bonassi
- given: Enrico
family: Terzi
- given: Marcello
family: Farina
- given: Riccardo
family: Scattolini
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 85-94
id: bonassi20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 85
lastpage: 94
published: 2020-07-31 00:00:00 +0000
- title: 'Bayesian joint state and parameter tracking in autoregressive models'
abstract: 'We address the problem of online Bayesian state and parameter tracking in autoregressive (AR) models with time-varying process noise variance. The involved marginalization and expectation integrals cannot be analytically solved. Moreover, the online tracking constraint makes sampling and batch learning methods unsuitable for this problem. We propose a hybrid variational message passing algorithm that robustly tracks the time-varying dynamics of the latent states, AR coefficients and process noise variance. Since message passing in a factor graph is a highly modular inference approach, the proposed methods easily extend to other non-stationary dynamic modeling problems.'
volume: 120
URL: https://proceedings.mlr.press/v120/senoz20a.html
PDF: http://proceedings.mlr.press/v120/senoz20a/senoz20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-senoz20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Ismail
family: Senoz
- given: Albert
family: Podusenko
- given: Wouter M.
family: Kouw
- given: Bert
family: Vries
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 95-104
id: senoz20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 95
lastpage: 104
published: 2020-07-31 00:00:00 +0000
- title: 'Learning to Correspond Dynamical Systems'
abstract: 'Many dynamical systems exhibit similar structure, as often captured by hand-designed simplified models that can be used for analysis and control. We develop a method for learning to correspond pairs of dynamical systems via a learned latent dynamical system. Given trajectory data from two dynamical systems, we learn a shared latent state space and a shared latent dynamics model, along with an encoder-decoder pair for each of the original systems. With the learned correspondences in place, we can use a simulation of one system to produce an imagined motion of its counterpart. We can also simulate in the learned latent dynamics and synthesize the motions of both corresponding systems, as a form of bisimulation. We demonstrate the approach using pairs of controlled bipedal walkers, as well as by pairing a walker with a controlled pendulum.'
volume: 120
URL: https://proceedings.mlr.press/v120/kim20a.html
PDF: http://proceedings.mlr.press/v120/kim20a/kim20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-kim20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Nam Hee
family: Kim
- given: Zhaoming
family: Xie
- given: Michiel
family: Panne
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 105-117
id: kim20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 105
lastpage: 117
published: 2020-07-31 00:00:00 +0000
- title: 'Learning solutions to hybrid control problems using Benders cuts'
abstract: 'Hybrid control problems are complicated by the need to make a suitable sequence of discrete decisions related to future modes of operation of the system. Model predictive control (MPC) encodes a finite-horizon truncation of such problems as a mixed-integer program, and then imposes a cost and/or constraints on the terminal state intended to reflect all post-horizon behaviour. However, these are often ad hoc choices tuned by hand after empirically observing performance. We present a learning method that sidesteps this problem, in which the so-called N-step Q-function of the problem is approximated from below, using Benders’ decomposition. The function takes a state and a sequence of N control decisions as arguments, and therefore extends the traditional notion of a Q-function from reinforcement learning. After learning it from a training process exploring the state-input space, we use it in place of the usual MPC objective. We take an example hybrid control task and show that it can be completed successfully with a shorter planning horizon than conventional hybrid MPC thanks to our proposed method. Furthermore, we report that Q-functions trained with long horizons can be truncated to a shorter horizon for online use, yielding simpler control laws with apparently little loss of performance.'
volume: 120
URL: https://proceedings.mlr.press/v120/menta20a.html
PDF: http://proceedings.mlr.press/v120/menta20a/menta20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-menta20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Sandeep
family: Menta
- given: Joseph
family: Warrington
- given: John
family: Lygeros
- given: Manfred
family: Morari
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 118-126
id: menta20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 118
lastpage: 126
published: 2020-07-31 00:00:00 +0000
- title: 'Feed-forward Neural Networks with Trainable Delay'
abstract: 'In this paper we build a bridge between feed-forward neural networks and delayed dynamical systems. As an initial demonstration, we capture the car-following behavior of a connected automated vehicle that includes time delay by using both simulation data and experimental data. We construct a delayed feed-forward neural network (DFNN) and introduce a training algorithm in order to learn the delay. We demonstrate that this algorithm works well on the proposed structures.'
volume: 120
URL: https://proceedings.mlr.press/v120/ji20a.html
PDF: http://proceedings.mlr.press/v120/ji20a/ji20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-ji20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Xunbi A.
family: Ji
- given: Tamás G.
family: Molnár
- given: Sergei S.
family: Avedisov
- given: Gábor
family: Orosz
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 127-136
id: ji20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 127
lastpage: 136
published: 2020-07-31 00:00:00 +0000
- title: 'Exploiting Model Sparsity in Adaptive MPC: A Compressed Sensing Viewpoint'
abstract: 'This paper proposes an Adaptive Stochastic Model Predictive Control (MPC) strategy for stable linear time-invariant systems in the presence of bounded disturbances. We consider multi-input, multi-output systems that can be expressed by a Finite Impulse Response (FIR) model. The parameters of the FIR model corresponding to each output are unknown but assumed sparse. We estimate these parameters using the Recursive Least Squares algorithm. The estimates are then improved using set-based bounds obtained by solving the Basis Pursuit Denoising problem. Our approach is able to handle hard input constraints and probabilistic output constraints. Using tools from distributionally robust optimization, we reformulate the probabilistic output constraints as tractable convex second-order cone constraints, which enables us to pose our MPC design task as a convex optimization problem. The efficacy of the developed algorithm is highlighted with a thorough numerical example, where we demonstrate performance gain over the counterpart algorithm of Bujarbaruah et al. (2018), which does not utilize the sparsity information of the system impulse response parameters during control design.'
volume: 120
URL: https://proceedings.mlr.press/v120/bujarbaruah20a.html
PDF: http://proceedings.mlr.press/v120/bujarbaruah20a/bujarbaruah20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-bujarbaruah20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Monimoy
family: Bujarbaruah
- given: Charlott
family: Vallon
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 137-146
id: bujarbaruah20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 137
lastpage: 146
published: 2020-07-31 00:00:00 +0000
- title: 'Structured Variational Inference in Partially Observable Unstable Gaussian Process State Space Models'
abstract: 'We propose a new variational inference algorithm for learning in Gaussian Process State-Space Models (GPSSMs). Our algorithm enables learning of unstable and partially observable systems, where previous algorithms fail. Our main algorithmic contribution is a novel approximate posterior that can be calculated efficiently using a single forward and backward pass along the training trajectories. The forward-backward pass is inspired by Kalman smoothing for linear dynamical systems but generalizes to GPSSMs. Our second contribution is a modification of the conditioning step that effectively lowers the Kalman gain. This modification is crucial to attaining good test performance where no measurements are available. Finally, we show experimentally that our learning algorithm performs well in stable and unstable real systems with hidden states.'
volume: 120
URL: https://proceedings.mlr.press/v120/curi20a.html
PDF: http://proceedings.mlr.press/v120/curi20a/curi20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-curi20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Sebastian
family: Curi
- given: Silvan
family: Melchior
- given: Felix
family: Berkenkamp
- given: Andreas
family: Krause
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 147-157
id: curi20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 147
lastpage: 157
published: 2020-07-31 00:00:00 +0000
- title: 'Regret Bound for Safe Gaussian Process Bandit Optimization'
abstract: 'Many applications require a learner to make sequential decisions under uncertainty regarding both the system’s payoff function and its safety constraints. When learning algorithms are used in safety-critical systems, it is paramount that the learner’s actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the system’s unknown payoff and constraint functions are sampled from Gaussian Processes (GPs). We develop SGP-UCB, a safe variant of the GP-UCB algorithm proposed by Srinivas et al. (2010), with the necessary modifications to respect safety constraints at every round. Our most important contribution is to derive the first sub-linear regret bounds for this problem.'
volume: 120
URL: https://proceedings.mlr.press/v120/amani20a.html
PDF: http://proceedings.mlr.press/v120/amani20a/amani20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-amani20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Sanae
family: Amani
- given: Mahnoosh
family: Alizadeh
- given: Christos
family: Thrampoulidis
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 158-159
id: amani20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 158
lastpage: 159
published: 2020-07-31 00:00:00 +0000
- title: 'Smart Forgetting for Safe Online Learning with Gaussian Processes'
abstract: 'The identification of unknown dynamical systems using supervised learning enables model-based control of systems that cannot be modeled based on first principles. While most control literature focuses on the analysis of a static dataset, online learning control, where data points are added while the controller is running, has rarely been studied in depth. In this paper, we present a novel approach for online learning control based on Gaussian process models. To avoid computational difficulties with growing datasets, we propose a safe forgetting mechanism. Using an entropy criterion, data points are evaluated with respect to the future trajectory of the closed-loop system and are “forgotten” if the stability of the system can still be guaranteed. The approach is evaluated in a simulation and in a robotic experiment to show its real-time capability.'
volume: 120
URL: https://proceedings.mlr.press/v120/umlauft20a.html
PDF: http://proceedings.mlr.press/v120/umlauft20a/umlauft20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-umlauft20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Jonas
family: Umlauft
- given: Thomas
family: Beckers
- given: Alexandre
family: Capone
- given: Armin
family: Lederer
- given: Sandra
family: Hirche
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 160-169
id: umlauft20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 160
lastpage: 169
published: 2020-07-31 00:00:00 +0000
- title: 'Linear Antisymmetric Recurrent Neural Networks'
abstract: 'Recurrent Neural Networks (RNNs) have a form of memory where the output from a node at one timestep is fed back as input at the next timestep, in addition to data from the previous layer. This makes them highly suitable for timeseries analysis. However, standard RNNs have known weaknesses such as exploding/vanishing gradients and thereby struggle with long-term memory. In this paper, we suggest a new recurrent network structure called Linear Antisymmetric RNN (LARNN). This structure is based on the numerical solution to an Ordinary Differential Equation (ODE) with stability properties resulting in a stable solution, which corresponds to long-term memory and trainability. Three different numerical methods are suggested to solve the ODE: Forward and Backward Euler and the midpoint method. The suggested structure has been implemented in Keras and several simulated datasets have been used to evaluate the performance. In the investigated cases, the LARNN performs better than or similarly to the Long Short Term Memory (LSTM) network, which is the current state of the art for RNNs.'
volume: 120
URL: https://proceedings.mlr.press/v120/moe20a.html
PDF: http://proceedings.mlr.press/v120/moe20a/moe20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-moe20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Signe
family: Moe
- given: Filippo
family: Remonato
- given: Esten Ingar
family: Grøtli
- given: Jan Tommy
family: Gravdahl
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 170-178
id: moe20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 170
lastpage: 178
published: 2020-07-31 00:00:00 +0000
- title: 'Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence'
abstract: 'Policy optimization (PO) is a key ingredient for modern reinforcement learning (RL). For control design, certain constraints are usually enforced on the policies to optimize, accounting for either the stability, robustness, or safety concerns on the system. Hence, PO is by nature a constrained (nonconvex) optimization in most cases, whose global convergence is challenging to analyze in general. More importantly, some constraints that are safety-critical, e.g., the closed-loop stability, or the $\mathcal{H}_{\infty}$-norm constraint that guarantees the system robustness, can be difficult to enforce on the controller being learned as the PO methods proceed. In this paper, we study the convergence theory of PO for $\mathcal{H}_{2}$ linear control with $\mathcal{H}_{\infty}$ robustness guarantee. This general framework includes risk-sensitive linear control as a special case. One significant new feature of this problem, in contrast to the standard $\mathcal{H}_{2}$ linear control, namely, linear quadratic regulator (LQR) problems, is the lack of coercivity of the cost function. This makes it challenging to guarantee the feasibility, namely, the $\mathcal{H}_{\infty}$ robustness, of the iterates. Interestingly, we propose two PO algorithms that enjoy the implicit regularization property, i.e., the iterates preserve the $\mathcal{H}_{\infty}$ robustness, as if they are regularized by the algorithms. Furthermore, convergence to the globally optimal policies with globally sublinear and locally (super-)linear rates is established under certain conditions, despite the nonconvexity of the problem. To the best of our knowledge, our work offers the first results on the implicit regularization property and global convergence of PO methods for robust/risk-sensitive control.'
volume: 120
URL: https://proceedings.mlr.press/v120/zhang20a.html
PDF: http://proceedings.mlr.press/v120/zhang20a/zhang20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-zhang20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Kaiqing
family: Zhang
- given: Bin
family: Hu
- given: Tamer
family: Basar
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 179-190
id: zhang20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 179
lastpage: 190
published: 2020-07-31 00:00:00 +0000
- title: 'A Finite-Sample Deviation Bound for Stable Autoregressive Processes'
abstract: 'In this paper, we study non-asymptotic deviation bounds of the least squares estimator in Gaussian AR($n$) processes. By relying on martingale concentration inequalities and a tail-bound for $\chi^2$ distributed variables, we provide a concentration bound for the sample covariance matrix of the process output. With this, we present a problem-dependent finite-time bound on the deviation probability of any fixed linear combination of the estimated parameters of the AR($n$) process. We discuss extensions and limitations of our approach.'
volume: 120
URL: https://proceedings.mlr.press/v120/gonzalez20a.html
PDF: http://proceedings.mlr.press/v120/gonzalez20a/gonzalez20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-gonzalez20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Rodrigo A.
family: González
- given: Cristian R.
family: Rojas
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 191-200
id: gonzalez20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 191
lastpage: 200
published: 2020-07-31 00:00:00 +0000
- title: 'Online Data Poisoning Attacks'
abstract: 'We study data poisoning attacks in the online learning setting, where training data arrive sequentially, and the attacker is eavesdropping on the data stream and has the ability to contaminate the current data point to affect the online learning process. We formulate the optimal online attack problem as a stochastic optimal control problem, and provide a systematic solution using tools from model predictive control and deep reinforcement learning. We further provide theoretical analysis of the regret suffered by the attacker for not knowing the true data sequence. Experiments validate our control approach in generating near-optimal attacks on both supervised and unsupervised learning tasks.'
volume: 120
URL: https://proceedings.mlr.press/v120/zhang20b.html
PDF: http://proceedings.mlr.press/v120/zhang20b/zhang20b.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-zhang20b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Xuezhou
family: Zhang
- given: Xiaojin
family: Zhu
- given: Laurent
family: Lessard
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 201-210
id: zhang20b
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 201
lastpage: 210
published: 2020-07-31 00:00:00 +0000
- title: 'Practical Reinforcement Learning For MPC: Learning from sparse objectives in under an hour on a real robot'
abstract: 'Model Predictive Control (MPC) is a powerful control technique that handles constraints, takes the system’s dynamics into account, and is optimal with respect to a given cost function. In practice, however, it often requires an expert to craft and tune this cost function and find trade-offs between different state penalties to satisfy simple high level objectives. In this paper, we use Reinforcement Learning and in particular value learning to approximate the value function given only high level objectives, which can be sparse and binary. Building upon previous works, we present improvements that allowed us to successfully deploy the method on a real world unmanned ground vehicle. Our experiments show that our method can learn the cost function from scratch and without human intervention, while reaching a performance level similar to that of an expert-tuned MPC. We perform a quantitative comparison of these methods with standard MPC approaches both in simulation and on the real robot.'
volume: 120
URL: https://proceedings.mlr.press/v120/karnchanachari20a.html
PDF: http://proceedings.mlr.press/v120/karnchanachari20a/karnchanachari20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-karnchanachari20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Napat
family: Karnchanachari
- given: Miguel
family: Iglesia Valls
- given: David
family: Hoeller
- given: Marco
family: Hutter
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 211-224
id: karnchanachari20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 211
lastpage: 224
published: 2020-07-31 00:00:00 +0000
- title: 'Learning Constrained Dynamics with Gauss’ Principle adhering Gaussian Processes'
abstract: 'The identification of the constrained dynamics of mechanical systems is often challenging. Learning methods promise to ease an analytical derivation, but require considerable amounts of data for training. We propose to combine insights from analytical mechanics with Gaussian process regression to improve the model’s data efficiency and constraint integrity. The result is a Gaussian process model that incorporates a priori constraint knowledge such that its predictions adhere to Gauss’ principle of least constraint. In return, predictions of the system’s acceleration naturally respect potentially non-ideal (non-)holonomic equality constraints. As corollary results, our model enables inferring the acceleration of the unconstrained system from data of the constrained system, and enables knowledge transfer between differing constraint configurations.'
volume: 120
URL: https://proceedings.mlr.press/v120/geist20a.html
PDF: http://proceedings.mlr.press/v120/geist20a/geist20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-geist20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Andreas
family: Geist
- given: Sebastian
family: Trimpe
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 225-234
id: geist20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 225
lastpage: 234
published: 2020-07-31 00:00:00 +0000
- title: 'Counterfactual Programming for Optimal Control'
abstract: 'In recent years, considerable work has been done to tackle the issue of designing control laws based on observations to allow unknown dynamical systems to perform pre-specified tasks. At least as important for autonomy, however, is the issue of learning which tasks can be performed in the first place. This is particularly critical in situations where multiple (possibly conflicting) tasks and requirements are demanded from the agent, resulting in infeasible specifications. Such situations arise due to over-specification or dynamic operating conditions and are only aggravated when the dynamical system model is learned through simulations. Often, these issues are tackled using regularization and penalties tuned based on application-specific expert knowledge. Nevertheless, this solution becomes impractical for large-scale systems, unknown operating conditions, and/or in online settings where expert input would be needed during the system operation. Instead, this work enables agents to autonomously pose, tune, and solve optimal control problems by compromising between performance and specification costs. Leveraging duality theory, it puts forward a counterfactual optimization algorithm that directly determines the specification trade-off while solving the optimal control problem. '
volume: 120
URL: https://proceedings.mlr.press/v120/chamon20a.html
PDF: http://proceedings.mlr.press/v120/chamon20a/chamon20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-chamon20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Luiz F. O.
family: Chamon
- given: Santiago
family: Paternain
- given: Alejandro
family: Ribeiro
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 235-244
id: chamon20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 235
lastpage: 244
published: 2020-07-31 00:00:00 +0000
- title: 'Learning Navigation Costs from Demonstrations with Semantic Observations'
abstract: 'This paper focuses on inverse reinforcement learning (IRL) for autonomous robot navigation using semantic observations. The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert’s observations and state-control trajectory. We develop a map encoder, which infers semantic class probabilities from the observation sequence, and a cost encoder, defined as a deep neural network over the semantic features. Since the expert cost is not directly observable, the representation parameters can only be optimized by differentiating the error between demonstrated controls and a control policy computed from the cost estimate. The error is optimized using a closed-form subgradient computed only over a subset of promising states via a motion planning algorithm. We show that our approach learns to follow traffic rules in the autonomous driving CARLA simulator by relying on semantic observations of cars, sidewalks and road lanes.'
volume: 120
URL: https://proceedings.mlr.press/v120/wang20a.html
PDF: http://proceedings.mlr.press/v120/wang20a/wang20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-wang20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Tianyu
family: Wang
- given: Vikas
family: Dhiman
- given: Nikolay
family: Atanasov
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 245-255
id: wang20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 245
lastpage: 255
published: 2020-07-31 00:00:00 +0000
- title: 'Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems'
abstract: 'We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor Critic (SAC) framework that exploits the network structure and finds a localized policy that is an $O(\rho^\kappa)$-approximation of a stationary point of the objective for some $\rho\in(0,1)$, with complexity that scales with the local state-action space size of the largest $\kappa$-hop neighborhood of the network. '
volume: 120
URL: https://proceedings.mlr.press/v120/qu20a.html
PDF: http://proceedings.mlr.press/v120/qu20a/qu20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-qu20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Guannan
family: Qu
- given: Adam
family: Wierman
- given: Na
family: Li
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 256-266
id: qu20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 256
lastpage: 266
published: 2020-07-31 00:00:00 +0000
- title: 'Black-box continuous-time transfer function estimation with stability guarantees: a kernel-based approach'
abstract: 'Continuous-time parametric models of dynamical systems are usually preferred given their physical interpretation. When there is a lack of prior physical knowledge, the user is faced with the model selection issue. In this paper, we propose a non-parametric approach to estimate a continuous-time stable linear model from data, while automatically selecting a proper structure of the transfer function and guaranteeing to preserve the system stability properties. Results show how the proposed approach outperforms the state of the art.'
volume: 120
URL: https://proceedings.mlr.press/v120/mazzoleni20a.html
PDF: http://proceedings.mlr.press/v120/mazzoleni20a/mazzoleni20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-mazzoleni20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Mirko
family: Mazzoleni
- given: Matteo
family: Scandella
- given: Simone
family: Formentin
- given: Fabio
family: Previdi
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 267-276
id: mazzoleni20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 267
lastpage: 276
published: 2020-07-31 00:00:00 +0000
- title: 'Model-Predictive Control via Cross-Entropy and Gradient-Based Optimization'
abstract: 'Recent works in high-dimensional model-predictive control and model-based reinforcement learning with learned dynamics and reward models have resorted to population-based optimization methods, such as the Cross-Entropy Method (CEM), for planning a sequence of actions. To decide on an action to take, CEM conducts a search for the action sequence with the highest return according to the learned dynamics model and reward. Action sequences are typically randomly sampled from an unconditional Gaussian distribution and evaluated. This distribution is iteratively updated towards action sequences with higher returns. However, sampling and simulating unconditional action sequences can be very inefficient (especially from a diagonal Gaussian distribution and for high-dimensional action spaces). An alternative line of approaches optimizes action sequences directly via gradient descent, but is prone to local optima. We propose a method to solve this planning problem by interleaving CEM and gradient descent steps in optimizing the action sequence.'
volume: 120
URL: https://proceedings.mlr.press/v120/bharadhwaj20a.html
PDF: http://proceedings.mlr.press/v120/bharadhwaj20a/bharadhwaj20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-bharadhwaj20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Homanga
family: Bharadhwaj
- given: Kevin
family: Xie
- given: Florian
family: Shkurti
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 277-286
id: bharadhwaj20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 277
lastpage: 286
published: 2020-07-31 00:00:00 +0000
- title: 'Learning the Globally Optimal Distributed LQ Regulator'
abstract: 'We study model-free learning methods for the finite-horizon output-feedback Linear Quadratic (LQ) control problem subject to subspace constraints on the control policy. Subspace constraints naturally arise in the field of distributed control and present a significant challenge in the sense that standard model-based optimization and learning leads to intractable numerical programs in general. Building upon recent results in zeroth-order optimization, we establish model-free sample-complexity bounds for the class of distributed LQ problems where a local gradient dominance constant exists on any sublevel set of the cost function. We prove that a fundamental class of distributed control problems - commonly referred to as Quadratically Invariant (QI) problems - as well as others possess this property. To the best of our knowledge, our result is the first sample-complexity bound guarantee on learning globally optimal distributed output-feedback control policies.'
volume: 120
URL: https://proceedings.mlr.press/v120/furieri20a.html
PDF: http://proceedings.mlr.press/v120/furieri20a/furieri20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-furieri20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Luca
family: Furieri
- given: Yang
family: Zheng
- given: Maryam
family: Kamgarpour
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 287-297
id: furieri20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 287
lastpage: 297
published: 2020-07-31 00:00:00 +0000
- title: 'VarNet: Variational Neural Networks for the Solution of Partial Differential Equations'
abstract: 'We propose a new model-based unsupervised learning method, called VarNet, for the solution of partial differential equations (PDEs) using deep neural networks. Particularly, we propose a novel loss function that relies on the variational (integral) form of PDEs as opposed to their differential form, which is commonly used in the literature. Our loss function is discretization-free, highly parallelizable, and more effective in capturing the solution of PDEs since it employs lower-order derivatives and trains over regions of space-time of non-zero measure. The models obtained using VarNet are smooth and do not require interpolation. They are also easily differentiable and can directly be used for control and optimization of PDEs. Finally, VarNet can straightforwardly incorporate parametric PDE models, making it a natural tool for model order reduction of PDEs.'
volume: 120
URL: https://proceedings.mlr.press/v120/khodayi-mehr20a.html
PDF: http://proceedings.mlr.press/v120/khodayi-mehr20a/khodayi-mehr20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-khodayi-mehr20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Reza
family: Khodayi-Mehr
- given: Michael
family: Zavlanos
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 298-307
id: khodayi-mehr20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 298
lastpage: 307
published: 2020-07-31 00:00:00 +0000
- title: 'Tractable Reinforcement Learning of Signal Temporal Logic Objectives'
abstract: 'Signal temporal logic (STL) is an expressive language to specify time-bound real-world robotic tasks and safety specifications. Recently, there has been an interest in learning optimal policies to satisfy STL specifications via reinforcement learning (RL). Learning to satisfy STL specifications often needs a sufficient length of state history to compute reward and the next action. The need for history results in exponential state-space growth for the learning problem. Thus the learning problem becomes computationally intractable for most real-world applications. In this paper, we propose a compact means to capture state history in a new augmented state-space representation. An approximation to the objective (maximizing probability of satisfaction) is proposed and solved for in the new augmented state-space. We show the performance bound of the approximate solution and compare it with the solution of an existing technique via simulations.'
volume: 120
URL: https://proceedings.mlr.press/v120/venkataraman20a.html
PDF: http://proceedings.mlr.press/v120/venkataraman20a/venkataraman20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-venkataraman20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Harish
family: Venkataraman
- given: Derya
family: Aksaray
- given: Peter
family: Seiler
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 308-317
id: venkataraman20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 308
lastpage: 317
published: 2020-07-31 00:00:00 +0000
- title: 'A Spatially and Temporally Attentive Joint Trajectory Prediction Framework for Modeling Vessel Intent'
abstract: 'Ships, or vessels, often sail in and out of cluttered environments over the course of their trajectories. Safe navigation in such cluttered scenarios requires an accurate estimation of the intent of neighboring vessels and their effect on the self and vice-versa well into the future. In manned vessels, this is achieved by constant communication between people on board, nautical experience, and audio and visual signals. In this paper we propose a deep neural network based architecture to predict intent of neighboring vessels into the future for an unmanned vessel solely based on positional data.'
volume: 120
URL: https://proceedings.mlr.press/v120/sekhon20a.html
PDF: http://proceedings.mlr.press/v120/sekhon20a/sekhon20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-sekhon20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Jasmine
family: Sekhon
- given: Cody
family: Fleming
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 318-327
id: sekhon20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 318
lastpage: 327
published: 2020-07-31 00:00:00 +0000
- title: 'Structured Mechanical Models for Robot Learning and Control'
abstract: 'Model-based methods are the dominant paradigm for controlling robotic systems, though their efficacy depends heavily on the accuracy of the model used. Deep neural networks have been used to learn models of robot dynamics from data, but they suffer from data inefficiency and the difficulty of incorporating prior knowledge. We introduce Structured Mechanical Models, a flexible model class for mechanical systems that is data-efficient, easily amenable to prior knowledge, and easily usable with model-based control techniques. The goal of this work is to demonstrate the benefits of using Structured Mechanical Models in lieu of black-box neural networks when modeling robot dynamics. We demonstrate that they generalize better from limited data and yield more reliable model-based controllers on a variety of simulated robotic domains.'
volume: 120
URL: https://proceedings.mlr.press/v120/gupta20a.html
PDF: http://proceedings.mlr.press/v120/gupta20a/gupta20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-gupta20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Jayesh K.
family: Gupta
- given: Kunal
family: Menda
- given: Zachary
family: Manchester
- given: Mykel
family: Kochenderfer
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 328-337
id: gupta20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 328
lastpage: 337
published: 2020-07-31 00:00:00 +0000
- title: 'Data-driven Identification of Approximate Passive Linear Models for Nonlinear Systems'
abstract: 'In model-based learning, it is desirable for the learned model to preserve structural properties of the system that may facilitate easier control design or provide performance, stability or safety guarantees. Here, we consider an unknown nonlinear system possessing such a structural property - passivity, that can be used to ensure robust stability with a learned controller. We present an algorithm to learn a passive linear model of this nonlinear system from time domain input-output data. We first learn an approximate linear model of this system using any standard system identification technique. We then enforce passivity by perturbing the system matrices of the linear model, while ensuring that the perturbed model closely approximates the input-output behavior of the nonlinear system. Finally, we derive a trade-off between the perturbation size and the radius of the region in which the passivity of the linear model guarantees local passivity of the unknown nonlinear system. '
volume: 120
URL: https://proceedings.mlr.press/v120/sivaranjani20a.html
PDF: http://proceedings.mlr.press/v120/sivaranjani20a/sivaranjani20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-sivaranjani20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: S.
family: Sivaranjani
- given: Etika
family: Agarwal
- given: Vijay
family: Gupta
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 338-339
id: sivaranjani20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 338
lastpage: 339
published: 2020-07-31 00:00:00 +0000
- title: 'Constraint Management for Batch Processes Using Iterative Learning Control and Reference Governors'
abstract: 'This paper provides a novel combination of Reference Governors (RG) and Iterative Learning Control (ILC) to address the issue of simultaneous learning and constraint management in systems that perform a task repeatedly. The proposed control strategy leverages the measured output from the previous iterations to improve tracking, while guaranteeing constraint satisfaction during the learning process. To achieve this, the system is modeled by a linear system with polytopic uncertainties. An RG solution based on a robust Maximal Admissible Set (MAS) is proposed that endows the ILC algorithm with constraint management capabilities. An update law on the MAS is proposed to further improve performance.'
volume: 120
URL: https://proceedings.mlr.press/v120/laracy20a.html
PDF: http://proceedings.mlr.press/v120/laracy20a/laracy20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-laracy20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Aidan
family: Laracy
- given: Hamid
family: Ossareh
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 340-349
id: laracy20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 340
lastpage: 349
published: 2020-07-31 00:00:00 +0000
- title: 'Robust Guarantees for Perception-Based Control'
abstract: 'Motivated by vision-based control of autonomous vehicles, we consider the problem of controlling a known linear dynamical system for which partial state information, such as vehicle position, is extracted from complex and nonlinear data, such as a camera image. Our approach is to use a learned perception map that predicts some linear function of the state and to design a corresponding safe set and robust controller for the closed loop system with this sensing scheme. We show that under suitable smoothness assumptions on both the perception map and the generative model relating state to complex and nonlinear data, parameters of the safe set can be learned via appropriately dense sampling of the state space. We then prove that the resulting perception-control loop has favorable generalization properties. We illustrate the usefulness of our approach on a synthetic example and on the self-driving car simulation platform CARLA.'
volume: 120
URL: https://proceedings.mlr.press/v120/dean20a.html
PDF: http://proceedings.mlr.press/v120/dean20a/dean20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-dean20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Sarah
family: Dean
- given: Nikolai
family: Matni
- given: Benjamin
family: Recht
- given: Vickie
family: Ye
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 350-360
id: dean20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 350
lastpage: 360
published: 2020-07-31 00:00:00 +0000
- title: 'Learning Convex Optimization Control Policies'
abstract: 'Many control policies used in applications compute the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex program with respect to its parameters. A longer version of this paper, which illustrates our method on many examples, is available at https://web.stanford.edu/~boyd/papers/learning_cocps.html.'
volume: 120
URL: https://proceedings.mlr.press/v120/agrawal20a.html
PDF: http://proceedings.mlr.press/v120/agrawal20a/agrawal20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-agrawal20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Akshay
family: Agrawal
- given: Shane
family: Barratt
- given: Stephen
family: Boyd
- given: Bartolomeo
family: Stellato
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 361-373
id: agrawal20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 361
lastpage: 373
published: 2020-07-31 00:00:00 +0000
- title: 'Fitting a Linear Control Policy to Demonstrations with a Kalman Constraint'
abstract: 'We consider the problem of learning a linear control policy for a linear dynamical system, from demonstrations of an expert regulating the system. The standard approach to this problem is (linear) policy fitting, which fits a linear policy by minimizing a loss function between the demonstrations and the policy’s outputs plus a regularization function that encodes prior knowledge. Despite its simplicity, this method fails to learn policies with low or even finite cost when there are few demonstrations. We propose to add an additional constraint to the regularization function in policy fitting, that the policy is the solution to some LQR problem, i.e., optimal in the stochastic control sense for some choice of quadratic cost. We refer to this constraint as a Kalman constraint. Policy fitting with a Kalman constraint requires solving an optimization problem with convex cost and bilinear constraints. We propose a heuristic method, based on the alternating direction method of multipliers (ADMM), to approximately solve this problem. An illustrative numerical experiment demonstrates that adding the Kalman constraint allows us to learn good, i.e., low cost, policies even when very few data are available.'
volume: 120
URL: https://proceedings.mlr.press/v120/palan20a.html
PDF: http://proceedings.mlr.press/v120/palan20a/palan20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-palan20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Malayandi
family: Palan
- given: Shane
family: Barratt
- given: Alex
family: McCauley
- given: Dorsa
family: Sadigh
- given: Vikas
family: Sindhwani
- given: Stephen
family: Boyd
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 374-383
id: palan20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 374
lastpage: 383
published: 2020-07-31 00:00:00 +0000
- title: 'Universal Simulation of Stable Dynamical Systems by Recurrent Neural Nets'
abstract: 'It is well known that continuous-time recurrent neural nets are universal approximators for continuous-time dynamical systems. However, existing results provide approximation guarantees only for finite-time trajectories. In this work, we show that infinite-time trajectories generated by dynamical systems that are stable in a certain sense can be reproduced arbitrarily accurately by recurrent neural nets. For a subclass of these stable systems, we provide quantitative estimates on the sufficient number of neurons needed to achieve a specified error tolerance.'
volume: 120
URL: https://proceedings.mlr.press/v120/hanson20a.html
PDF: http://proceedings.mlr.press/v120/hanson20a/hanson20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-hanson20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Joshua
family: Hanson
- given: Maxim
family: Raginsky
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 384-392
id: hanson20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 384
lastpage: 392
published: 2020-07-31 00:00:00 +0000
- title: 'Contracting Implicit Recurrent Neural Networks: Stable Models with Improved Trainability'
abstract: 'Stability of recurrent models is closely linked with trainability, generalizability and, in some applications, safety. Methods that train stable recurrent neural networks, however, do so at a significant cost to expressibility. We propose an implicit model structure that allows for a convex parametrization of stable models using contraction analysis of non-linear systems. Using these stability conditions we propose a new approach to model initialization and then provide a number of empirical results comparing the performance of our proposed model set to previous stable RNNs and vanilla RNNs. By carefully controlling stability in the model, we observe a significant increase in the speed of training and model performance.'
volume: 120
URL: https://proceedings.mlr.press/v120/revay20a.html
PDF: http://proceedings.mlr.press/v120/revay20a/revay20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-revay20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Max
family: Revay
- given: Ian
family: Manchester
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 393-403
id: revay20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 393
lastpage: 403
published: 2020-07-31 00:00:00 +0000
- title: 'On the Robustness of Data-Driven Controllers for Linear Systems'
abstract: 'This paper proposes a new framework and several results to quantify the performance of data-driven state-feedback controllers for linear systems against targeted perturbations of the training data. We focus on the case where subsets of the training data are randomly corrupted by an adversary, and derive lower and upper bounds for the stability of the closed-loop system with compromised controller as a function of the perturbation statistics, size of the training data, sensitivity of the data-driven algorithm to perturbation of the training data, and properties of the nominal closed-loop system. Our stability and convergence bounds are probabilistic in nature, and rely on a first-order approximation of the data-driven procedure that designs the state-feedback controller, which can be computed directly using the training data. We illustrate our findings via multiple numerical studies.'
volume: 120
URL: https://proceedings.mlr.press/v120/anguluri20a.html
PDF: http://proceedings.mlr.press/v120/anguluri20a/anguluri20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-anguluri20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Rajasekhar
family: Anguluri
- given: Abed Alrahman Al
family: Makdah
- given: Vaibhav
family: Katewa
- given: Fabio
family: Pasqualetti
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 404-412
id: anguluri20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 404
lastpage: 412
published: 2020-07-31 00:00:00 +0000
- title: 'Faster saddle-point optimization for solving large-scale Markov decision processes'
abstract: 'We consider the problem of computing optimal policies in average-reward Markov decision processes. This classical problem can be formulated as a linear program directly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the number of states. To address this issue, recent work has considered a linearly relaxed version of the resulting saddle-point problem. Our work aims at achieving a better understanding of this relaxed optimization problem by characterizing the conditions necessary for convergence to the optimal policy, and designing an optimization algorithm enjoying fast convergence rates that are independent of the size of the state space. Notably, our characterization points out some potential issues with previous work.'
volume: 120
URL: https://proceedings.mlr.press/v120/serrano20a.html
PDF: http://proceedings.mlr.press/v120/serrano20a/serrano20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-serrano20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Joan Bas
family: Serrano
- given: Gergely
family: Neu
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 413-423
id: serrano20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 413
lastpage: 423
published: 2020-07-31 00:00:00 +0000
- title: 'On Simulation and Trajectory Prediction with Gaussian Process Dynamics'
abstract: 'Established techniques for simulation and prediction with Gaussian process (GP) dynamics implicitly make use of an independence assumption on successive function evaluations of the dynamics model. This can result in significant error and underestimation of the prediction uncertainty, potentially leading to failures in safety-critical applications. This paper proposes methods that explicitly take the correlation of successive function evaluations into account. We first describe two sampling-based techniques; one approach provides samples of the true trajectory distribution, suitable for ‘ground truth’ simulations, while the other draws function samples from basis function approximations of the GP. Second, we present a linearization-based technique that directly provides approximations of the trajectory distribution, taking correlations explicitly into account. We demonstrate the procedures in simple numerical examples, contrasting the results with established methods.'
volume: 120
URL: https://proceedings.mlr.press/v120/hewing20a.html
PDF: http://proceedings.mlr.press/v120/hewing20a/hewing20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-hewing20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Lukas
family: Hewing
- given: Elena
family: Arcari
- given: Lukas P.
family: Fröhlich
- given: Melanie N.
family: Zeilinger
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 424-434
id: hewing20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 424
lastpage: 434
published: 2020-07-31 00:00:00 +0000
- title: 'Sample Complexity of Kalman Filtering for Unknown Systems'
abstract: 'In this paper, we consider the task of designing a Kalman Filter (KF) for an unknown and partially observed autonomous linear time invariant system driven by process and sensor noise. To do so, we propose studying the following two step process: first, using system identification tools rooted in subspace methods, we obtain coarse finite-data estimates of the state-space parameters, and Kalman gain describing the autonomous system; and second, we use these approximate parameters to design a filter which produces estimates of the system state. We show that when the system identification step produces sufficiently accurate estimates, or when the underlying true KF is sufficiently robust, a Certainty Equivalent (CE) KF, i.e., one designed using the estimated parameters directly, enjoys provable sub-optimality guarantees. We further show that when these conditions fail, and in particular, when the CE KF is marginally stable (i.e., has eigenvalues very close to the unit circle), imposing additional robustness constraints on the filter leads to similar sub-optimality guarantees. We further show that with high probability, both the CE and robust filters have mean prediction error bounded by the order of inverse square root of N, where N is the number of data points collected in the system identification step. To the best of our knowledge, these are the first end-to-end sample complexity bounds for the Kalman Filtering of an unknown system.'
volume: 120
URL: https://proceedings.mlr.press/v120/tsiamis20a.html
PDF: http://proceedings.mlr.press/v120/tsiamis20a/tsiamis20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-tsiamis20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Anastasios
family: Tsiamis
- given: Nikolai
family: Matni
- given: George
family: Pappas
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 435-444
id: tsiamis20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 435
lastpage: 444
published: 2020-07-31 00:00:00 +0000
- title: 'NeurOpt: Neural network based optimization for building energy management and climate control'
abstract: 'Model predictive control (MPC) can provide significant energy cost savings in building operations in the form of energy-efficient control with better occupant comfort, lower peak demand charges, and risk-free participation in demand response. However, the engineering effort required to obtain physics-based models of buildings for MPC is considered to be the biggest bottleneck in making MPC scalable to real buildings. In this paper, we propose a data-driven control algorithm based on neural networks to reduce this cost of model identification. Our approach does not require building domain expertise or retrofitting of the existing heating and cooling systems. We validate our learning and control algorithms on a two-story building with 10 independently controlled zones, located in Italy. We learn dynamical models of energy consumption and zone temperatures with high accuracy and demonstrate energy savings and better occupant comfort compared to the default system controller.'
volume: 120
URL: https://proceedings.mlr.press/v120/jain20a.html
PDF: http://proceedings.mlr.press/v120/jain20a/jain20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-jain20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Achin
family: Jain
- given: Francesco
family: Smarra
- given: Enrico
family: Reticcioli
- given: Alessandro
family: D’Innocenzo
- given: Manfred
family: Morari
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 445-454
id: jain20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 445
lastpage: 454
published: 2020-07-31 00:00:00 +0000
- title: 'Bayesian model predictive control: Efficient model exploration and regret bounds using posterior sampling'
abstract: 'Tight performance specifications in combination with operational constraints make model predictive control (MPC) the method of choice in various industries. As the performance of an MPC controller depends on a sufficiently accurate objective and prediction model of the process, a significant effort in the MPC design procedure is dedicated to modeling and identification. Driven by the increasing amount of available system data and advances in the field of machine learning, data-driven MPC techniques have been developed to facilitate the MPC controller design. While these methods are able to leverage available data, they typically do not provide principled mechanisms to automatically trade off exploitation of available data and exploration to improve and update the objective and prediction model. To this end, we present a learning-based MPC formulation using posterior sampling techniques, which provides finite-time regret bounds on the learning performance while being simple to implement using off-the-shelf MPC software and algorithms. The performance analysis of the method is based on posterior sampling theory and its practical efficiency is illustrated using a numerical example of a highly nonlinear dynamical car-trailer system.'
volume: 120
URL: https://proceedings.mlr.press/v120/wabersich20a.html
PDF: http://proceedings.mlr.press/v120/wabersich20a/wabersich20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-wabersich20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Kim Peter
family: Wabersich
- given: Melanie
family: Zeilinger
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 455-464
id: wabersich20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 455
lastpage: 464
published: 2020-07-31 00:00:00 +0000
- title: 'Parameter Optimization for Learning-based Control of Control-Affine Systems'
abstract: 'Supervised machine learning is often applied to identify system dynamics where first principle methods fail. When combining learning with control methods, probabilistic regression is typically applied to increase robustness against learning errors and analyze the stability of the closed-loop system. Although this approach allows performance guarantees to be formulated for many control techniques, the obtained bounds are usually conservative and cannot be employed for efficient control parameter tuning. Therefore, we reformulate the parameter tuning problem using robust optimization with performance constraints based on Lyapunov theory. By relaxing the problem through scenario optimization we derive a provably optimal method for control parameter tuning. We demonstrate its flexibility and efficiency on parameter tuning problems for a feedback linearizing and a computed torque controller.'
volume: 120
URL: https://proceedings.mlr.press/v120/lederer20a.html
PDF: http://proceedings.mlr.press/v120/lederer20a/lederer20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-lederer20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Armin
family: Lederer
- given: Alexandre
family: Capone
- given: Sandra
family: Hirche
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 465-475
id: lederer20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 465
lastpage: 475
published: 2020-07-31 00:00:00 +0000
- title: 'Riccati updates for online linear quadratic control'
abstract: 'We study an online setting of the linear quadratic Gaussian optimal control problem on a sequence of cost functions, where, similar to classical online optimization, the future decisions are made by only knowing the cost in hindsight. We introduce a modified online Riccati update that, under some boundedness assumptions, leads to logarithmic regret bounds, improving the best known square-root bound. In particular, for the scalar case we achieve the logarithmic regret without any boundedness assumption. As opposed to earlier work, the proposed method does not rely on solving semi-definite programs at each stage.'
volume: 120
URL: https://proceedings.mlr.press/v120/akbari20a.html
PDF: http://proceedings.mlr.press/v120/akbari20a/akbari20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-akbari20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Mohammad
family: Akbari
- given: Bahman
family: Gharesifard
- given: Tamas
family: Linder
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 476-485
id: akbari20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 476
lastpage: 485
published: 2020-07-31 00:00:00 +0000
- title: 'A Theoretical Analysis of Deep Q-Learning'
abstract: 'Despite the great empirical success of deep reinforcement learning, its theoretical foundation is less well understood. In this work, we make the first attempt to theoretically understand the deep Q-network (DQN) algorithm (Mnih et al., 2015) from both algorithmic and statistical perspectives. Specifically, we focus on the fitted Q iteration (FQI) algorithm with deep neural networks, which is a slight simplification of DQN that captures the tricks of experience replay and target network used in DQN. Under mild assumptions, we establish the algorithmic and statistical rates of convergence for the action-value functions of the iterative policy sequence obtained by FQI. In particular, the statistical error characterizes the bias and variance that arise from approximating the action-value function using a deep neural network, while the algorithmic error converges to zero at a geometric rate. As a byproduct, our analysis provides justifications for the techniques of experience replay and target network, which are crucial to the empirical success of DQN. Furthermore, as a simple extension of DQN, we propose the Minimax-DQN algorithm for the zero-sum Markov game with two players, which is deferred to the appendix due to space limitations.'
volume: 120
URL: https://proceedings.mlr.press/v120/yang20a.html
PDF: http://proceedings.mlr.press/v120/yang20a/yang20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-yang20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Jianqing
family: Fan
- given: Zhaoran
family: Wang
- given: Yuchen
family: Xie
- given: Zhuoran
family: Yang
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 486-489
id: yang20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 486
lastpage: 489
published: 2020-07-31 00:00:00 +0000
- title: 'Localized active learning of Gaussian process state space models'
abstract: 'In learning based methods for dynamical systems, exploration plays a crucial role, as accurate models of the dynamics need to be learned. Most of the tools developed so far focus on a proper exploration-exploitation trade-off to solve the given task, or actively strive for unknown areas of the task space. However, in the latter case, the exploration is performed greedily, and fails to capture the effect that learning in the near future will have on model uncertainty in the distant future, effectively steering the system towards exploratory trajectories that yield little information. In this paper, we provide an information theory-based model predictive control method that anticipates the learning effect when exploring dynamical systems, and steers the system towards the most informative points. We employ a Gaussian process to model the system dynamics, which enables us to quantify the model uncertainty and estimate future information gains. We include a numerical example that illustrates the effectiveness of the proposed approach.'
volume: 120
URL: https://proceedings.mlr.press/v120/capone20a.html
PDF: http://proceedings.mlr.press/v120/capone20a/capone20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-capone20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Alexandre
family: Capone
- given: Gerrit
family: Noske
- given: Jonas
family: Umlauft
- given: Thomas
family: Beckers
- given: Armin
family: Lederer
- given: Sandra
family: Hirche
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 490-499
id: capone20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 490
lastpage: 499
published: 2020-07-31 00:00:00 +0000
- title: 'Generating Robust Supervision for Learning-Based Visual Navigation Using Hamilton-Jacobi Reachability'
abstract: 'In Bansal et al. (2019), a novel visual navigation framework that combines learning-based and model-based approaches has been proposed. Specifically, a Convolutional Neural Network (CNN) predicts a waypoint that is used by the dynamics model for planning and tracking a trajectory to the waypoint. However, the CNN inevitably makes prediction errors, ultimately leading to collisions, especially when the robot is navigating through cluttered and tight spaces. In this paper, we present a novel Hamilton-Jacobi (HJ) reachability-based method to generate supervision for the CNN for waypoint prediction. By modeling the prediction error of the CNN as disturbances in dynamics, the proposed method generates waypoints that are robust to these disturbances, and consequently to the prediction errors. Moreover, using globally optimal HJ reachability analysis leads to predicting waypoints that are time-efficient and do not exhibit greedy behavior. Through simulations and experiments on a hardware testbed, we demonstrate the advantages of the proposed approach for navigation tasks where the robot needs to navigate through cluttered, narrow indoor environments.'
volume: 120
URL: https://proceedings.mlr.press/v120/li20a.html
PDF: http://proceedings.mlr.press/v120/li20a/li20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-li20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Anjian
family: Li
- given: Somil
family: Bansal
- given: Georgios
family: Giovanis
- given: Varun
family: Tolani
- given: Claire
family: Tomlin
- given: Mo
family: Chen
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 500-510
id: li20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 500
lastpage: 510
published: 2020-07-31 00:00:00 +0000
- title: 'Learning supported Model Predictive Control for Tracking of Periodic References'
abstract: 'Increased autonomy of controllers in tasks with uncertainties stemming from interaction with the environment can be achieved by incorporating learning. Examples are control tasks in which the system should follow a reference that depends on measurement data from surrounding systems, e.g. humans or other control systems. We propose a learning strategy based on Gaussian processes to model, filter and predict references for control systems under model predictive control. Constraints are included in the learning to achieve safety guarantees such as trackability and recursive feasibility. An illustrative simulation example for motion compensation shows the performance improvements of combined constrained learning and predictive control, in addition to the provided guarantees.'
volume: 120
URL: https://proceedings.mlr.press/v120/matschek20a.html
PDF: http://proceedings.mlr.press/v120/matschek20a/matschek20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-matschek20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Janine
family: Matschek
- given: Rolf
family: Findeisen
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 511-520
id: matschek20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 511
lastpage: 520
published: 2020-07-31 00:00:00 +0000
- title: 'Data-driven distributionally robust LQR with multiplicative noise'
abstract: 'We present a data-driven method for solving the linear quadratic regulator problem for systems with multiplicative disturbances, the distribution of which is only known through sample estimates. We adopt a distributionally robust approach to cast the controller synthesis problem as a semidefinite program. Using results from high-dimensional statistics, the proposed methodology ensures that its solution provides mean-square stabilizing controllers with high probability even for low sample sizes. As the sample size increases, the closed-loop cost approaches that of the optimal controller produced when the distribution is known. We demonstrate the practical applicability and performance of the method through a numerical experiment.'
volume: 120
URL: https://proceedings.mlr.press/v120/coppens20a.html
PDF: http://proceedings.mlr.press/v120/coppens20a/coppens20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-coppens20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Peter
family: Coppens
- given: Mathijs
family: Schuurmans
- given: Panagiotis
family: Patrinos
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 521-530
id: coppens20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 521
lastpage: 530
published: 2020-07-31 00:00:00 +0000
- title: 'Learning the model-free linear quadratic regulator via random search'
abstract: 'Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems as well as the lack of exact gradient computation. In this paper, we examine the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We provide theoretical bounds on the convergence rate and sample complexity of a random search method. Our results demonstrate that the required simulation time for achieving $\epsilon$-accuracy in a model-free setup and the total number of function evaluations are both of $O(\log(1/\epsilon))$.'
volume: 120
URL: https://proceedings.mlr.press/v120/mohammadi20a.html
PDF: http://proceedings.mlr.press/v120/mohammadi20a/mohammadi20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-mohammadi20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Hesameddin
family: Mohammadi
- given: Mihailo R.
family: Jovanović
- given: Mahdi
family: Soltanolkotabi
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 531-539
id: mohammadi20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 531
lastpage: 539
published: 2020-07-31 00:00:00 +0000
- title: 'Lambda-Policy Iteration with Randomization for Contractive Models with Infinite Policies: Well-Posedness and Convergence'
abstract: 'Abstract dynamic programming models are used to analyze $\lambda$-policy iteration with randomization algorithms. Particularly, contractive models with infinite policies are considered and it is shown that well-posedness of the $\lambda$-operator plays a central role in the algorithm. The operator is known to be well-posed for problems with finite states, but our analysis shows that it is also well-defined for the contractive models with infinite states studied. Similarly, the algorithm we analyze is known to converge for problems with finite policies, but we identify the conditions required to guarantee convergence with probability one when the policy space is infinite, regardless of the number of states. Guided by the analysis, we exemplify a data-driven approximated implementation of the algorithm for estimation of optimal costs of constrained linear and nonlinear control problems. Numerical results indicate the potential of this method in practice.'
volume: 120
URL: https://proceedings.mlr.press/v120/li20b.html
PDF: http://proceedings.mlr.press/v120/li20b/li20b.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-li20b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Yuchao
family: Li
- given: Karl Henrik
family: Johansson
- given: Jonas
family: Mårtensson
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 540-549
id: li20b
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 540
lastpage: 549
published: 2020-07-31 00:00:00 +0000
- title: 'Optimistic robust linear quadratic dual control'
abstract: 'Recent work by Mania et al. has proved that certainty equivalent control achieves optimal regret for linear systems and quadratic costs. However, when parameter uncertainty is large, certainty equivalence cannot be relied upon to stabilize the true, unknown system. In this paper, we present a dual control strategy that attempts to combine the performance of certainty equivalence with the practical utility of robustness. The formulation preserves structure in the representation of parametric uncertainty, which allows the controller to target reduction of uncertainty in the parameters that ‘matter most’ for the control task, while robustly stabilizing the uncertain system. Control synthesis proceeds via convex optimization, and the method is illustrated on a numerical example.'
volume: 120
URL: https://proceedings.mlr.press/v120/umenberger20a.html
PDF: http://proceedings.mlr.press/v120/umenberger20a/umenberger20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-umenberger20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Jack
family: Umenberger
- given: Thomas B.
family: Schön
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 550-560
id: umenberger20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 550
lastpage: 560
published: 2020-07-31 00:00:00 +0000
- title: 'Bayesian Learning with Adaptive Load Allocation Strategies'
abstract: 'We study a Bayesian learning dynamics induced by agents who repeatedly allocate loads on a set of resources based on their belief of an unknown parameter that affects the cost distributions of resources. In each step, belief update is performed according to Bayes’ rule using the agents’ current load and a realization of costs on resources that they utilized. Then, agents choose a new load using an adaptive strategy update rule that accounts for their preferred allocation based on the updated belief. We prove that beliefs and loads generated by this learning dynamics converge almost surely. The convergent belief accurately estimates cost distributions of resources that are utilized by the convergent load. We establish conditions on the initial load and strategy updates under which the cost estimation is accurate on all resources. These results apply to Bayesian learning in congestion games with unknown latency functions. Particularly, we provide conditions under which the load converges to an equilibrium or socially optimal load with complete information of the cost parameter. We also design an adaptive tolling mechanism that eventually induces the socially optimal outcome.'
volume: 120
URL: https://proceedings.mlr.press/v120/wu20a.html
PDF: http://proceedings.mlr.press/v120/wu20a/wu20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-wu20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Manxi
family: Wu
- given: Saurabh
family: Amin
- given: Asuman
family: Ozdaglar
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 561-570
id: wu20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 561
lastpage: 570
published: 2020-07-31 00:00:00 +0000
- title: 'Learning-based Stochastic Model Predictive Control with State-Dependent Uncertainty'
abstract: 'The increasing complexity of modern systems can introduce significant uncertainties to the models that describe them, which poses a great challenge to safe model-based control. This paper presents a learning-based stochastic model predictive control (LB-SMPC) strategy with chance constraints for offset-free trajectory tracking. The LB-SMPC strategy systematically handles plant-model mismatch between the actual system dynamics and a system model via a state-dependent uncertainty term that is intended to correct model predictions at each sampling time. A chance constraint handling method is presented to ensure state constraint satisfaction to a desired level for the case of state-dependent model uncertainty. Closed-loop simulations demonstrate the usefulness of LB-SMPC for predictive control of safety-critical systems with hard-to-model and/or time-varying dynamics.'
volume: 120
URL: https://proceedings.mlr.press/v120/bonzanini20a.html
PDF: http://proceedings.mlr.press/v120/bonzanini20a/bonzanini20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-bonzanini20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Angelo Domenico
family: Bonzanini
- given: Ali
family: Mesbah
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 571-580
id: bonzanini20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 571
lastpage: 580
published: 2020-07-31 00:00:00 +0000
- title: 'Stable Reinforcement Learning with Unbounded State Space'
abstract: 'We consider the problem of reinforcement learning (RL) with unbounded state space motivated by the classical problem of scheduling in a queueing network. We argue that a reasonable RL policy for such settings must be based on online training, since any policy based only on finite samples cannot perform well in the entire unbounded state space. We introduce such an online RL policy using a Sparse-Sampling-based Monte Carlo Oracle. To analyze this policy, we propose an appropriate notion of desirable performance in terms of stability: the state dynamics under the policy should remain in a bounded region with high probability. We show that if the system dynamics under the optimal policy respect a Lyapunov function, then our policy is stable. Our policy does not need to know the Lyapunov function. Moreover, the assumption that a Lyapunov function exists is not restrictive, as it is equivalent to the positive recurrence or stability property of any Markov chain; that is, if there is any policy that can stabilize the system, then it must possess a Lyapunov function.'
volume: 120
URL: https://proceedings.mlr.press/v120/shah20a.html
PDF: http://proceedings.mlr.press/v120/shah20a/shah20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-shah20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Devavrat
family: Shah
- given: Qiaomin
family: Xie
- given: Zhi
family: Xu
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 581-581
id: shah20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 581
lastpage: 581
published: 2020-07-31 00:00:00 +0000
- title: 'Periodic Q-Learning'
abstract: 'The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning algorithm (PQ-learning for short), which resembles the technique used in deep Q-learning for solving infinite-horizon discounted Markov decision processes (DMDP) in the tabular setting. PQ-learning maintains two separate Q-value estimates – the online estimate and target estimate. The online estimate follows the standard Q-learning update, while the target estimate is updated periodically. In contrast to the standard Q-learning, PQ-learning enjoys a simple finite time analysis and achieves better sample complexity for finding an epsilon-optimal policy. Our result provides a preliminary justification of the effectiveness of utilizing target estimates or networks in Q-learning algorithms.'
volume: 120
URL: https://proceedings.mlr.press/v120/lee20a.html
PDF: http://proceedings.mlr.press/v120/lee20a/lee20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-lee20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Donghwan
family: Lee
- given: Niao
family: He
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 582-598
id: lee20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 582
lastpage: 598
published: 2020-07-31 00:00:00 +0000
- title: 'Robust Learning-Based Control via Bootstrapped Multiplicative Noise'
abstract: 'Despite decades of research and recent progress in adaptive control and reinforcement learning, there remains a fundamental lack of understanding in designing controllers that provide robustness to inherent non-asymptotic uncertainties arising from models estimated with finite, noisy data. We propose a robust adaptive control algorithm that explicitly incorporates such non-asymptotic uncertainties into the control design. The algorithm has three components: (1) a least-squares nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method using an optimal linear quadratic regulator (LQR) with multiplicative noise. A key advantage of the proposed approach is that the system identification and robust control design procedures both use stochastic uncertainty representations, so that the actual inherent statistical estimation uncertainty directly aligns with the uncertainty the robust controller is being designed against. We show through numerical experiments that the proposed robust adaptive controller can significantly outperform the certainty equivalent controller on both expected regret and measures of regret risk.'
volume: 120
URL: https://proceedings.mlr.press/v120/gravell20a.html
PDF: http://proceedings.mlr.press/v120/gravell20a/gravell20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-gravell20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Benjamin
family: Gravell
- given: Tyler
family: Summers
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 599-607
id: gravell20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 599
lastpage: 607
published: 2020-07-31 00:00:00 +0000
- title: 'Robust Regression for Safe Exploration in Control'
abstract: 'We study the problem of safe learning and exploration in sequential control problems. The goal is to safely collect data samples from operating in an environment, in order to learn to achieve a challenging control goal (e.g., an agile maneuver close to a boundary). A central challenge in this setting is how to quantify uncertainty in order to choose provably-safe actions that allow us to collect informative data and reduce uncertainty, thereby achieving both improved controller safety and optimality. To address this challenge, we present a deep robust regression model that is trained to directly predict the uncertainty bounds for safe exploration. We derive generalization bounds for learning and connect them with safety and stability bounds in control. We demonstrate empirically that our robust regression approach can outperform the conventional Gaussian process (GP) based safe exploration in settings where it is difficult to specify a good GP prior.'
volume: 120
URL: https://proceedings.mlr.press/v120/liu20a.html
PDF: http://proceedings.mlr.press/v120/liu20a/liu20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-liu20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Anqi
family: Liu
- given: Guanya
family: Shi
- given: Soon-Jo
family: Chung
- given: Anima
family: Anandkumar
- given: Yisong
family: Yue
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 608-619
id: liu20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 608
lastpage: 619
published: 2020-07-31 00:00:00 +0000
- title: 'Constrained Upper Confidence Reinforcement Learning'
abstract: 'Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret with respect to the reward while satisfying the constraints with high probability, even while learning. An illustrative example is provided.'
volume: 120
URL: https://proceedings.mlr.press/v120/zheng20a.html
PDF: http://proceedings.mlr.press/v120/zheng20a/zheng20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-zheng20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Liyuan
family: Zheng
- given: Lillian
family: Ratliff
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 620-629
id: zheng20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 620
lastpage: 629
published: 2020-07-31 00:00:00 +0000
- title: 'Euclideanizing Flows: Diffeomorphic Reduction for Learning Stable Dynamical Systems'
abstract: 'Execution of complex tasks in robotics requires motions that have complex geometric structure. We present an approach which allows robots to learn such motions from a few human demonstrations. The motions are encoded as rollouts of a dynamical system on a Riemannian manifold. Additional structure is imposed which guarantees smooth convergent motions to a goal location. The aforementioned structure involves viewing motions on an observed Riemannian manifold as deformations of straight lines on a latent Euclidean space. The observed and latent spaces are related through a diffeomorphism. Thus, this paper presents an approach for learning flexible diffeomorphisms, resulting in a stable dynamical system. The efficacy of this approach is demonstrated through validation on an established benchmark as well as on demonstrations collected on a real-world robotic system.'
volume: 120
URL: https://proceedings.mlr.press/v120/rana20a.html
PDF: http://proceedings.mlr.press/v120/rana20a/rana20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-rana20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Muhammad Asif
family: Rana
- given: Anqi
family: Li
- given: Dieter
family: Fox
- given: Byron
family: Boots
- given: Fabio
family: Ramos
- given: Nathan
family: Ratliff
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 630-639
id: rana20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 630
lastpage: 639
published: 2020-07-31 00:00:00 +0000
- title: 'Planning from Images with Deep Latent Gaussian Process Dynamics'
abstract: 'Planning is a powerful approach to control problems with known environment dynamics. In unknown environments the agent needs to learn a model of the system dynamics to make planning applicable. This is particularly challenging when the underlying states are only indirectly observable through high-dimensional observations such as images. We propose to learn a deep latent Gaussian process dynamics (DLGPD) model that learns low-dimensional system dynamics from environment interactions with visual observations. The method infers latent state representations from observations using neural networks and models the system dynamics in the learned latent space with Gaussian processes. All parts of the model can be trained jointly by optimizing a lower bound on the likelihood of transitions in image space. We evaluate the proposed approach on the pendulum swing-up task while using the learned dynamics model for planning in latent space in order to solve the control problem. We also demonstrate that our method can quickly adapt a trained agent to changes in the system dynamics from just a few rollouts. We compare our approach to a state-of-the-art purely deep learning based method and demonstrate the advantages of combining Gaussian processes with deep learning for data efficiency and transfer learning.'
volume: 120
URL: https://proceedings.mlr.press/v120/bosch20a.html
PDF: http://proceedings.mlr.press/v120/bosch20a/bosch20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-bosch20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Nathanael
family: Bosch
- given: Jan
family: Achterhold
- given: Laura
family: Leal-Taixé
- given: Jörg
family: Stückler
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 640-650
id: bosch20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 640
lastpage: 650
published: 2020-07-31 00:00:00 +0000
- title: 'A First Principles Approach for Data-Efficient System Identification of Spring-Rod Systems via Differentiable Physics Engines'
abstract: 'We propose a novel differentiable physics engine for system identification of complex spring-rod assemblies. Unlike black-box data-driven methods for learning the evolution of a dynamical system and its parameters, we modularize the design of our engine using a discrete form of the governing equations of motion, similar to a traditional physics engine. We further reduce the dimension from 3D to 1D for each module, which allows efficient learning of system parameters using linear regression. As a side benefit, the regression parameters correspond to physical quantities, such as spring stiffness or the mass of the rod, making the pipeline explainable. The approach significantly reduces the amount of training data required, and also avoids iterative identification of data sampling and model training. We compare the performance of the proposed engine with previous solutions, and demonstrate its efficacy on tensegrity systems, such as NASA’s icosahedron.'
volume: 120
URL: https://proceedings.mlr.press/v120/wang20b.html
PDF: http://proceedings.mlr.press/v120/wang20b/wang20b.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-wang20b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Kun
family: Wang
- given: Mridul
family: Aanjaneya
- given: Kostas
family: Bekris
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 651-665
id: wang20b
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 651
lastpage: 665
published: 2020-07-31 00:00:00 +0000
- title: 'Model-Based Reinforcement Learning with Value-Targeted Regression'
abstract: 'Reinforcement learning (RL) applies to control problems with large state and action spaces, hence it is natural to consider RL with a parametric model. In this paper we focus on finite-horizon episodic RL where the transition model admits the linear parametrization: $P = \sum_{i=1}^{d} (\theta)_{i}P_{i}$. This parametrization provides a universal function approximation and captures several useful models and applications. We propose an upper confidence model-based RL algorithm with value-targeted model parameter estimation. The algorithm updates the estimate of $\theta$ by recursively solving a regression problem using the latest value estimate as the target. We demonstrate the efficiency of our algorithm by proving its expected regret bound $\tilde{\mathcal{O}}(d\sqrt{H^{3}T})$, where $H, T, d$ are the horizon, total number of steps, and dimension of $\theta$. This regret bound is independent of the total number of states or actions, and is close to a lower bound $\Omega(\sqrt{HdT})$.'
volume: 120
URL: https://proceedings.mlr.press/v120/jia20a.html
PDF: http://proceedings.mlr.press/v120/jia20a/jia20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-jia20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Zeyu
family: Jia
- given: Lin
family: Yang
- given: Csaba
family: Szepesvari
- given: Mengdi
family: Wang
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 666-686
id: jia20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 666
lastpage: 686
published: 2020-07-31 00:00:00 +0000
- title: 'Localized Learning of Robust Controllers for Networked Systems with Dynamic Topology'
abstract: 'Our previous work proposed an approach to localized adaptive and robust control over a large-scale network of systems subject to a single topological modification. In this paper, we develop this approach into an iterative scheme to handle multiple topological modifications over time, which switch between configurations in a finite-state Markov chain. Each system in the network uses its local information to robustly control its own state while also learning the current state of the network topology (i.e. which state of the Markov chain it is currently in). Additionally, each system maintains an estimate of certain parameters for the overall network, for instance, the transition probabilities of the Markov chain, and each system uses standard average consensus methods to update its estimate. We simulate a simple centered hexagon network with 7 systems and 4 different topological states, and show that each system in the network manages to stabilize under a control law that uses only local information, and adapts to the current topology within a reasonable amount of time after a switch is made.'
volume: 120
URL: https://proceedings.mlr.press/v120/han20a.html
PDF: http://proceedings.mlr.press/v120/han20a/han20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-han20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Soojean
family: Han
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 687-696
id: han20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 687
lastpage: 696
published: 2020-07-31 00:00:00 +0000
- title: 'NeuralExplorer: State Space Exploration of Closed Loop Control Systems Using Neural Networks'
abstract: 'In this paper, we propose a framework for performing state space exploration of closed loop control systems. For closed loop control systems, we introduce the notion of inverse sensitivity function and present a mechanism for approximating inverse sensitivity by a neural network. This neural network can be used for generating trajectories that reach a destination (or a neighborhood around it). We demonstrate the effectiveness of our approach by applying it to standard nonlinear dynamical systems, nonlinear hybrid systems, and also neural network based feedback control systems.'
volume: 120
URL: https://proceedings.mlr.press/v120/goyal20a.html
PDF: http://proceedings.mlr.press/v120/goyal20a/goyal20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-goyal20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Manish
family: Goyal
- given: Parasara Sridhar
family: Duggirala
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 697-697
id: goyal20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 697
lastpage: 697
published: 2020-07-31 00:00:00 +0000
- title: 'Toward fusion plasma scenario planning for NSTX-U using machine-learning-accelerated models'
abstract: 'One of the most promising devices for realizing power production through nuclear fusion is the tokamak. To maximize performance, it is preferable that tokamak reactors achieve advanced operating scenarios characterized by good plasma confinement, improved magnetohydrodynamic stability, and a largely non-inductively driven plasma current. Such scenarios could enable steady-state reactor operation with high fusion gain — the ratio of produced fusion power to the external power provided through the plasma boundary. Precise and robust control of the evolution of the plasma boundary shape as well as the spatial distribution of the plasma current, density, temperature, and rotation will be essential to achieving and maintaining such scenarios. The complexity of the evolution of tokamak plasmas, arising due to nonlinearities and coupling between various parameters, motivates the use of model-based control algorithms that can account for the system dynamics. In this work, a learning-based accelerated model trained on data from the National Spherical Torus Experiment Upgrade (NSTX-U) is employed to develop planning and control strategies for regulating the density and temperature profile evolution around desired trajectories. The proposed model combines empirical scaling laws developed across multiple devices with neural networks trained on empirical data from NSTX-U and a database of first-principles-based computationally intensive simulations. The reduced execution time of the accelerated model will enable practical application of optimization algorithms and reinforcement learning approaches for scenario planning and control development. An initial demonstration of applying optimization approaches to the learning-based model is presented, including a strategy for mitigating the effect of leaving the finite validity range of the accelerated model. The approach shows promise for actuator planning between experiments and in real-time.'
volume: 120
URL: https://proceedings.mlr.press/v120/boyer20a.html
PDF: http://proceedings.mlr.press/v120/boyer20a/boyer20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-boyer20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Mark
family: Boyer
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 698-707
id: boyer20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 698
lastpage: 707
published: 2020-07-31 00:00:00 +0000
- title: 'Learning for Safety-Critical Control with Control Barrier Functions'
abstract: 'Modern nonlinear control theory seeks to endow systems with properties of stability and safety, and has been deployed successfully in multiple domains. Despite this success, model uncertainty remains a significant challenge in synthesizing safe controllers, leading to degradation in the properties provided by the controllers. This paper develops a machine learning framework utilizing Control Barrier Functions (CBFs) to reduce model uncertainty as it impacts the safe behavior of a system. This approach iteratively collects data and updates a controller, ultimately achieving safe behavior. We validate this method in simulation and experimentally on a Segway platform.'
volume: 120
URL: https://proceedings.mlr.press/v120/taylor20a.html
PDF: http://proceedings.mlr.press/v120/taylor20a/taylor20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-taylor20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Andrew
family: Taylor
- given: Andrew
family: Singletary
- given: Yisong
family: Yue
- given: Aaron
family: Ames
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 708-717
id: taylor20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 708
lastpage: 717
published: 2020-07-31 00:00:00 +0000
- title: 'Learning Dynamical Systems with Side Information'
abstract: 'We present a mathematical formalism and a computational framework for the problem of learning a dynamical system from noisy observations of a few trajectories and subject to side information (e.g., physical laws or contextual knowledge). We identify six classes of side information which can be imposed by semidefinite programming and that arise naturally in many applications. We demonstrate their value on two examples from epidemiology and physics. Some density results on polynomial dynamical systems that either exactly or approximately satisfy side information are also presented.'
volume: 120
URL: https://proceedings.mlr.press/v120/ahmadi20a.html
PDF: http://proceedings.mlr.press/v120/ahmadi20a/ahmadi20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-ahmadi20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Amir Ali
family: Ahmadi
- given: Bachir El
family: Khadir
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 718-727
id: ahmadi20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 718
lastpage: 727
published: 2020-07-31 00:00:00 +0000
- title: 'Feynman-Kac Neural Network Architectures for Stochastic Control Using Second-Order FBSDE Theory'
abstract: 'We present a deep recurrent neural network architecture to solve a class of stochastic optimal control problems described by fully nonlinear Hamilton Jacobi Bellman partial differential equations. Such PDEs arise when considering stochastic dynamics characterized by uncertainties that are additive, state dependent, and control multiplicative. Stochastic models with these characteristics are important in computational neuroscience, biology, finance, and aerospace systems and provide a more accurate representation of actuation than models with only additive uncertainty. Previous literature has established the inadequacy of the linear HJB theory for such problems, so instead, methods relying on the generalized version of the Feynman-Kac lemma have been proposed resulting in a system of second-order Forward-Backward SDEs. However, so far, these methods suffer from compounding errors resulting in lack of scalability. In this paper, we propose a deep learning based algorithm that leverages the second-order FBSDE representation and LSTM-based recurrent neural networks to not only solve such stochastic optimal control problems but also overcome the problems faced by traditional approaches, including scalability. The resulting control algorithm is tested on a high-dimensional linear system and three nonlinear systems from robotics and biomechanics in simulation to demonstrate feasibility and out-performance against previous methods.'
volume: 120
URL: https://proceedings.mlr.press/v120/pereira20a.html
PDF: http://proceedings.mlr.press/v120/pereira20a/pereira20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-pereira20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Marcus
family: Pereira
- given: Ziyi
family: Wang
- given: Tianrong
family: Chen
- given: Emily
family: Reed
- given: Evangelos
family: Theodorou
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 728-738
id: pereira20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 728
lastpage: 738
published: 2020-07-31 00:00:00 +0000
- title: 'Hamilton-Jacobi-Bellman Equations for Q-Learning in Continuous Time'
abstract: 'In this paper, we introduce Hamilton-Jacobi-Bellman (HJB) equations for Q-functions in continuous time optimal control problems with Lipschitz continuous controls. The standard Q-function used in reinforcement learning is shown to be the unique viscosity solution of the HJB equation. A necessary and sufficient condition for optimality is provided using the viscosity solution framework. By using the HJB equation, we develop a Q-learning method for continuous-time dynamical systems. A DQN-like algorithm is also proposed for high-dimensional state and control spaces. The performance of the proposed Q-learning algorithm is demonstrated using 1-, 10- and 20-dimensional dynamical systems.'
volume: 120
URL: https://proceedings.mlr.press/v120/kim20b.html
PDF: http://proceedings.mlr.press/v120/kim20b/kim20b.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-kim20b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Jeongho
family: Kim
- given: Insoon
family: Yang
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 739-748
id: kim20b
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 739
lastpage: 748
published: 2020-07-31 00:00:00 +0000
- title: 'Identifying Mechanical Models of Unknown Objects with Differentiable Physics Simulations'
abstract: 'This paper proposes a new method for manipulating unknown objects through a sequence of non-prehensile actions that displace an object from its initial configuration to a given goal configuration on a flat surface. The proposed method leverages recent progress in differentiable physics models to identify unknown mechanical properties of manipulated objects, such as inertia matrix, friction coefficients and external forces acting on the object. To this end, a recently proposed differentiable physics engine for two-dimensional objects is adopted in this work and extended to deal with forces in three-dimensional space. The proposed model identification technique analytically computes the gradient of the distance between forecasted poses of objects and their actual observed poses, and utilizes that gradient to search for values of the mechanical properties that reduce the reality gap.'
volume: 120
URL: https://proceedings.mlr.press/v120/song20a.html
PDF: http://proceedings.mlr.press/v120/song20a/song20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-song20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Changkyu
family: Song
- given: Abdeslam
family: Boularias
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 749-760
id: song20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 749
lastpage: 760
published: 2020-07-31 00:00:00 +0000
- title: 'Objective Mismatch in Model-based Reinforcement Learning'
abstract: 'Model-based reinforcement learning (MBRL) has been shown to be a powerful framework for data-efficiently learning control of continuous tasks. Recent work in MBRL has mostly focused on using more advanced function approximators and planning schemes, with little development of the general framework. In this paper, we identify a fundamental issue of the standard MBRL framework – what we call the objective mismatch issue. Objective mismatch arises when one objective is optimized in the hope that a second, often uncorrelated, metric will also be optimized. In the context of MBRL, we characterize the objective mismatch between training the forward dynamics model w.r.t. the likelihood of the one-step ahead prediction, and the overall goal of improving performance on a downstream control task. For example, this issue can emerge with the realization that dynamics models effective for a specific task do not necessarily need to be globally accurate, and, vice versa, globally accurate models might not be sufficiently accurate locally to obtain good control performance on a specific task. In our experiments, we study this objective mismatch issue and demonstrate that the likelihood of one-step ahead predictions is not always correlated with control performance. This observation highlights a critical limitation in the MBRL framework which will require further research to be fully understood and addressed. We propose an initial method to mitigate the mismatch issue by re-weighting dynamics model training. Building on it, we conclude with a discussion about other potential directions of research for addressing this issue.'
volume: 120
URL: https://proceedings.mlr.press/v120/lambert20a.html
PDF: http://proceedings.mlr.press/v120/lambert20a/lambert20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-lambert20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Nathan
family: Lambert
- given: Brandon
family: Amos
- given: Omry
family: Yadan
- given: Roberto
family: Calandra
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 761-770
id: lambert20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 761
lastpage: 770
published: 2020-07-31 00:00:00 +0000
- title: 'Tools for Data-driven Modeling of Within-Hand Manipulation with Underactuated Adaptive Hands'
abstract: 'Precise in-hand manipulation is an important skill for a robot to perform tasks in human environments. Practical robotic hands must be low-cost, easy to control and capable. 3D-printed underactuated adaptive hands provide such properties as they are cheap to fabricate and adapt to objects of uncertain geometry with stable grasps. Challenges still remain, however, before such hands can attain human-like performance due to complex dynamics and contacts. In particular, useful models for planning, control or model-based reinforcement learning are still lacking. Recently, data-driven approaches for such models have shown promise. This work provides the first large public dataset of real within-hand manipulation that facilitates building such models, along with baseline data-driven modeling results. Furthermore, it contributes a ROS-based physics-engine model of such hands for independent data collection, experimentation and sim-to-reality transfer work.'
volume: 120
URL: https://proceedings.mlr.press/v120/sintov20a.html
PDF: http://proceedings.mlr.press/v120/sintov20a/sintov20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-sintov20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Avishai
family: Sintov
- given: Andrew
family: Kimmel
- given: Bowen
family: Wen
- given: Abdeslam
family: Boularias
- given: Kostas
family: Bekris
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 771-780
id: sintov20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 771
lastpage: 780
published: 2020-07-31 00:00:00 +0000
- title: 'Probabilistic Safety Constraints for Learned High Relative Degree System Dynamics'
abstract: 'This paper focuses on learning a model of system dynamics online while satisfying safety constraints. Our motivation is to avoid offline system identification or hand-specified dynamics models and allow a system to safely and autonomously estimate and adapt its own model during online operation. Given streaming observations of the system state, we use Bayesian learning to obtain a distribution over the system dynamics. In turn, the distribution is used to optimize the system behavior and ensure safety with high probability, by specifying a chance constraint over a control barrier function.'
volume: 120
URL: https://proceedings.mlr.press/v120/khojasteh20a.html
PDF: http://proceedings.mlr.press/v120/khojasteh20a/khojasteh20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-khojasteh20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Mohammad Javad
family: Khojasteh
- given: Vikas
family: Dhiman
- given: Massimo
family: Franceschetti
- given: Nikolay
family: Atanasov
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 781-792
id: khojasteh20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 781
lastpage: 792
published: 2020-07-31 00:00:00 +0000
- title: 'Lyceum: An efficient and scalable ecosystem for robot learning'
abstract: 'We introduce Lyceum, a high-performance computational ecosystem for robot learning. Lyceum is built on top of the Julia programming language and the MuJoCo physics simulator, combining the ease-of-use of a high-level programming language with the performance of native C. In addition, Lyceum has a straightforward API to support parallel computation across multiple cores and machines. Overall, depending on the complexity of the environment, Lyceum is 5-30X faster compared to other popular abstractions like OpenAI’s Gym and DeepMind’s dm-control. This substantially reduces training time for various reinforcement learning algorithms; and is also fast enough to support real-time model predictive control through MuJoCo. The code, tutorials, and demonstration videos can be found at: www.lyceum.ml.'
volume: 120
URL: https://proceedings.mlr.press/v120/summers20a.html
PDF: http://proceedings.mlr.press/v120/summers20a/summers20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-summers20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Colin
family: Summers
- given: Kendall
family: Lowrey
- given: Aravind
family: Rajeswaran
- given: Siddhartha
family: Srinivasa
- given: Emanuel
family: Todorov
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 793-803
id: summers20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 793
lastpage: 803
published: 2020-07-31 00:00:00 +0000
- title: 'Encoding Physical Constraints in Differentiable Newton-Euler Algorithm'
abstract: 'The recursive Newton-Euler Algorithm (RNEA) is a popular technique in robotics for computing the dynamics of robots. The computed dynamics can then be used for torque control with inverse dynamics, or for forward dynamics computations. RNEA can be framed as a differentiable computational graph, enabling the dynamics parameters of the robot to be learned from data. However, the dynamics parameters learned in this manner can be physically implausible. In this work, we incorporate physical constraints in the learning by adding structure to the learned parameters. This results in a framework that can learn physically plausible dynamics, improving the training speed as well as generalization of the learned dynamics models. We evaluate our method on real-time inverse dynamics predictions of a 7 degree of freedom robot arm, both in simulation and on the real robot. Our experiments study a spectrum of structure added to learned dynamics, and compare their performance and generalization.'
volume: 120
URL: https://proceedings.mlr.press/v120/sutanto20a.html
PDF: http://proceedings.mlr.press/v120/sutanto20a/sutanto20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-sutanto20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Giovanni
family: Sutanto
- given: Austin
family: Wang
- given: Yixin
family: Lin
- given: Mustafa
family: Mukadam
- given: Gaurav
family: Sukhatme
- given: Akshara
family: Rai
- given: Franziska
family: Meier
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 804-813
id: sutanto20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 804
lastpage: 813
published: 2020-07-31 00:00:00 +0000
- title: 'Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach'
abstract: 'This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then conducts local policy gradient in parallel based on zero-order gradient estimation. ZODPO only requires limited communication and storage even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial with the error tolerance’s inverse and the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing controllers with high probability. Lastly, we numerically test ZODPO on a multi-zone HVAC system. '
volume: 120
URL: https://proceedings.mlr.press/v120/li20c.html
PDF: http://proceedings.mlr.press/v120/li20c/li20c.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-li20c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Yingying
family: Li
- given: Yujie
family: Tang
- given: Runyu
family: Zhang
- given: Na
family: Li
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 814-814
id: li20c
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 814
lastpage: 814
published: 2020-07-31 00:00:00 +0000
- title: 'Learning to Plan via Deep Optimistic Value Exploration'
abstract: 'Deep exploration requires coordinated long-term planning. We present a model-based reinforcement learning algorithm that guides policy learning through a value function that exhibits optimism in the face of uncertainty. We capture uncertainty over values by combining predictions from an ensemble of models and formulate an upper confidence bound (UCB) objective to recover optimistic estimates. Training the policy on ensemble rollouts with the learned value function as the terminal cost allows for projecting long-term interactions into a limited planning horizon, thus enabling deep optimistic exploration. We do not assume a priori knowledge of either the dynamics or reward function. We demonstrate that our approach can accommodate both dense and sparse reward signals, while improving sample complexity on a variety of benchmarking tasks.'
volume: 120
URL: https://proceedings.mlr.press/v120/seyde20a.html
PDF: http://proceedings.mlr.press/v120/seyde20a/seyde20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-seyde20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Tim
family: Seyde
- given: Wilko
family: Schwarting
- given: Sertac
family: Karaman
- given: Daniela
family: Rus
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 815-825
id: seyde20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 815
lastpage: 825
published: 2020-07-31 00:00:00 +0000
- title: 'L1-GP: L1 Adaptive Control with Bayesian Learning'
abstract: 'We present L1-GP, an architecture based on L1 adaptive control and Gaussian Process Regression (GPR) for safe simultaneous control and learning. On one hand, the L1 adaptive control provides stability and transient performance guarantees, which allows for GPR to efficiently and safely learn the uncertain dynamics. On the other hand, the learned dynamics can be conveniently incorporated into the L1 control architecture without sacrificing robustness and tracking performance. Subsequently, the learned dynamics can lead to less conservative designs for performance/robustness tradeoff. We illustrate the efficacy of the proposed architecture via numerical simulations.'
volume: 120
URL: https://proceedings.mlr.press/v120/gahlawat20a.html
PDF: http://proceedings.mlr.press/v120/gahlawat20a/gahlawat20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-gahlawat20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Aditya
family: Gahlawat
- given: Pan
family: Zhao
- given: Andrew
family: Patterson
- given: Naira
family: Hovakimyan
- given: Evangelos
family: Theodorou
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 826-837
id: gahlawat20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 826
lastpage: 837
published: 2020-07-31 00:00:00 +0000
- title: 'Data-Driven Distributed Predictive Control via Network Optimization'
abstract: 'We consider a networked linear system where system matrices are unknown to the individual agents but sampled data is available to them. We propose a data-driven method for designing a distributed linear-quadratic controller where agents learn a non-parametric system model from a single sample trajectory in which nodes can predict future trajectories using only data available to themselves and their neighbors. Based on this system representation, we propose a control scheme where a network optimization problem is solved in a receding horizon manner. We show that the proposed control scheme is stabilizing and validate our results through numerical experiments.'
volume: 120
URL: https://proceedings.mlr.press/v120/allibhoy20a.html
PDF: http://proceedings.mlr.press/v120/allibhoy20a/allibhoy20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-allibhoy20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Ahmed
family: Allibhoy
- given: Jorge
family: Cortes
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 838-839
id: allibhoy20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 838
lastpage: 839
published: 2020-07-31 00:00:00 +0000
- title: 'Information Theoretic Model Predictive Q-Learning'
abstract: 'Model-free Reinforcement Learning (RL) works well when experience can be collected cheaply and model-based RL is effective when system dynamics can be modeled accurately. However, both assumptions can be violated in real world problems such as robotics, where querying the system can be expensive and real-world dynamics can be difficult to model. In contrast to RL, Model Predictive Control (MPC) algorithms use a simulator to optimize a simple policy class online, constructing a closed-loop controller that can effectively contend with real-world dynamics. MPC performance is usually limited by factors such as model bias and the limited horizon of optimization. In this work, we present a novel theoretical connection between information theoretic MPC and entropy regularized RL and develop a Q-learning algorithm that can leverage biased models. We validate the proposed algorithm on sim-to-sim control tasks to demonstrate the improvements over optimal control and reinforcement learning from scratch. Our approach paves the way for deploying reinforcement learning algorithms on real systems in a systematic manner.'
volume: 120
URL: https://proceedings.mlr.press/v120/bhardwaj20a.html
PDF: http://proceedings.mlr.press/v120/bhardwaj20a/bhardwaj20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-bhardwaj20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Mohak
family: Bhardwaj
- given: Ankur
family: Handa
- given: Dieter
family: Fox
- given: Byron
family: Boots
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 840-850
id: bhardwaj20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 840
lastpage: 850
published: 2020-07-31 00:00:00 +0000
- title: 'Learning nonlinear dynamical systems from a single trajectory'
abstract: 'We introduce algorithms for learning nonlinear dynamical systems of the form $x_{t+1}=\sigma(\Theta{}x_t)+\varepsilon_t$, where $\Theta$ is a weight matrix, $\sigma$ is a nonlinear monotonic link function, and $\varepsilon_t$ is a mean-zero noise process. When the link function is known, we give an algorithm that recovers the weight matrix $\Theta$ from a single trajectory with optimal sample complexity and linear running time. The algorithm succeeds under weaker statistical assumptions than in previous work, and in particular i) does not require a bound on the spectral norm of the weight matrix $\Theta$ (rather, it depends on a generalization of the spectral radius) and ii) works when the link function is the ReLU. Our analysis has three key components: i) We show how \emph{sequential Rademacher complexities} can be used to provide generalization guarantees for general dynamical systems, ii) we give a general recipe whereby global stability for nonlinear dynamical systems can be used to certify that the state-vector covariance is well-conditioned, and iii) using these tools, we extend well-known algorithms for efficiently learning generalized linear models to the dependent setting.'
volume: 120
URL: https://proceedings.mlr.press/v120/foster20a.html
PDF: http://proceedings.mlr.press/v120/foster20a/foster20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-foster20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Dylan
family: Foster
- given: Tuhin
family: Sarkar
- given: Alexander
family: Rakhlin
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 851-861
id: foster20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 851
lastpage: 861
published: 2020-07-31 00:00:00 +0000
- title: 'A Duality Approach for Regret Minimization in Average-Reward Ergodic Markov Decision Processes'
abstract: 'In light of the Bellman duality, we propose a novel value-policy gradient algorithm to explore and act in infinite-horizon Average-reward Markov Decision Process (AMDP) and show that it has sublinear regret. The algorithm is motivated by the Bellman saddle point formulation. It learns the optimal state-action distribution, which encodes a randomized policy, by interacting with the environment along a single trajectory and making primal-dual updates. The key to the analysis is to establish a connection between the min-max duality gap of Bellman saddle point and the cumulative regret of the learning agent. We show that, for ergodic AMDPs with finite state space $\mathcal{S}$ and action space $\mathcal{A}$ and uniformly bounded mixing times, the algorithm’s $T$-time step regret is $$ R(T)=\tilde{\mathcal{O}}\left( \left(t_{mix}^*\right)^2 \tau^{\frac{3}{2}} \sqrt{(\tau^3 + |\mathcal{A}|) |\mathcal{S}| T} \right), $$ where $t_{mix}^*$ is the worst-case mixing time, $\tau$ is an ergodicity parameter, $T$ is the number of time steps and $\tilde{\mathcal{O}}$ hides polylog factors.'
volume: 120
URL: https://proceedings.mlr.press/v120/gong20a.html
PDF: http://proceedings.mlr.press/v120/gong20a/gong20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-gong20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Hao
family: Gong
- given: Mengdi
family: Wang
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 862-883
id: gong20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 862
lastpage: 883
published: 2020-07-31 00:00:00 +0000
- title: 'Robust Deep Learning as Optimal Control: Insights and Convergence Guarantees'
abstract: 'The fragility of deep neural networks to adversarially-chosen inputs has motivated the need to revisit deep learning algorithms. Including adversarial examples during training is a popular defense mechanism against adversarial attacks. This mechanism can be formulated as a min-max optimization problem, where the adversary seeks to maximize the loss function using an iterative first-order algorithm while the learner attempts to minimize it. However, finding adversarial examples in this way causes excessive computational overhead during training. By interpreting the min-max problem as an optimal control problem, it has recently been shown that one can exploit the compositional structure of neural networks in the optimization problem to improve the training time significantly. In this paper, we provide the first convergence analysis of this adversarial training algorithm by combining techniques from robust optimal control and inexact oracle methods in optimization. Our analysis sheds light on how the hyperparameters of the algorithm affect its stability and convergence. We support our insights with experiments on a robust classification problem.'
volume: 120
URL: https://proceedings.mlr.press/v120/seidman20a.html
PDF: http://proceedings.mlr.press/v120/seidman20a/seidman20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-seidman20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Jacob H.
family: Seidman
- given: Mahyar
family: Fazlyab
- given: Victor M.
family: Preciado
- given: George J.
family: Pappas
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 884-893
id: seidman20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 884
lastpage: 893
published: 2020-07-31 00:00:00 +0000
- title: 'Dual Stochastic MPC for Systems with Parametric and Structural Uncertainty'
abstract: 'Designing controllers for systems affected by model uncertainty can prove to be a challenge, especially when seeking the optimal compromise between the conflicting goals of identification and control. This trade-off is explicitly taken into account in the dual control problem, for which the exact solution is provided by stochastic dynamic programming. Due to its computational intractability, we propose a sampling-based approximation for systems affected by both parametric and structural model uncertainty. The approach proposed in this paper separates the prediction horizon into a dual part and an exploitation part. The dual part is formulated as a scenario tree that actively discriminates among a set of potential models while learning unknown parameters. In the exploitation part, achieved information is fixed for each scenario, and open-loop control sequences are computed for the remainder of the horizon. As a result, we solve one optimization problem over a collection of control sequences for the entire horizon, explicitly considering the knowledge gained in each scenario, leading to a dual model predictive control formulation.'
volume: 120
URL: https://proceedings.mlr.press/v120/arcari20a.html
PDF: http://proceedings.mlr.press/v120/arcari20a/arcari20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-arcari20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Elena
family: Arcari
- given: Lukas
family: Hewing
- given: Max
family: Schlichting
- given: Melanie
family: Zeilinger
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 894-903
id: arcari20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 894
lastpage: 903
published: 2020-07-31 00:00:00 +0000
- title: 'Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation'
abstract: 'Control of nonlinear systems with unknown dynamics is a major challenge on the road to fully autonomous agents. Current trends in reinforcement learning (RL) focus on complex representations of dynamics and policies. Such approaches have yielded impressive results in solving a variety of hard control tasks. However, this new sophistication has come at the cost of an overall reduction in our ability to interpret the resulting policies from a classical perspective, and the need for extremely over-parameterized controllers. In this paper, we take inspiration from the control community and apply the principles of hybrid switching systems, in order to break down complex representations into simpler components. We exploit the rich representational power of probabilistic graphical models and derive a new expectation-maximization (EM) algorithm for learning a generative model and automatically decomposing nonlinear dynamics into stochastic switching linear dynamical systems. Moreover, we show how this framework of probabilistic switching models enables extracting hierarchies of Markovian and auto-regressive locally linear controllers from nonlinear experts in an imitation learning scenario.'
volume: 120
URL: https://proceedings.mlr.press/v120/abdulsamad20a.html
PDF: http://proceedings.mlr.press/v120/abdulsamad20a/abdulsamad20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-abdulsamad20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Hany
family: Abdulsamad
- given: Jan
family: Peters
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 904-914
id: abdulsamad20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 904
lastpage: 914
published: 2020-07-31 00:00:00 +0000
- title: 'A Kernel Mean Embedding Approach to Reducing Conservativeness in Stochastic Programming and Control'
abstract: 'In this paper, we apply kernel mean embedding methods to sample-based stochastic optimization and control. Specifically, we use the reduced-set expansion method as a way to discard sampled scenarios. The effect of such constraint removal is improved optimality and decreased conservativeness. This is achieved by solving a distributional-distance-regularized optimization problem. We demonstrate that this optimization formulation is well-motivated in theory, computationally tractable, and effective in numerical algorithms.'
volume: 120
URL: https://proceedings.mlr.press/v120/zhu20a.html
PDF: http://proceedings.mlr.press/v120/zhu20a/zhu20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-zhu20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Jia-Jie
family: Zhu
- given: Bernhard
family: Schoelkopf
- given: Moritz
family: Diehl
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 915-923
id: zhu20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 915
lastpage: 923
published: 2020-07-31 00:00:00 +0000
- title: 'Efficient Large-Scale Gaussian Process Bandits by Believing only Informative Actions'
abstract: 'Bayesian optimization is a framework for global search via maximum a posteriori updates rather than simulated annealing, and has gained prominence in tuning the hyper-parameters of machine learning algorithms and more broadly, in decision-making under uncertainty. In this work, we cast Bayesian optimization as a multi-armed bandit problem, where the payoff function is sampled from a Gaussian process (GP). Further, we focus on action selections via the GP upper confidence bound (UCB). While numerous prior works use GPs in bandit settings, they do not apply to settings where the total number of iterations $T$ may be large-scale, as the complexity of computing the posterior parameters scales cubically with the number of past observations. To circumvent this computational burden, we propose a simple statistical test: only incorporate an action into the GP posterior when its conditional entropy exceeds an $\epsilon$ threshold. Doing so permits us to derive sublinear regret bounds of GP bandit algorithms up to factors depending on the compression parameter $\epsilon$ for both discrete and continuous action sets. Moreover, the complexity of the GP posterior remains provably finite. Experimentally, we observe state-of-the-art accuracy and complexity tradeoffs for GP bandit algorithms on various hyper-parameter tuning tasks, suggesting the merits of managing the complexity of GPs in bandit settings.'
volume: 120
URL: https://proceedings.mlr.press/v120/bedi20a.html
PDF: http://proceedings.mlr.press/v120/bedi20a/bedi20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-bedi20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Amrit Singh
family: Bedi
- given: Dheeraj
family: Peddireddy
- given: Vaneet
family: Aggarwal
- given: Alec
family: Koppel
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 924-934
id: bedi20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 924
lastpage: 934
published: 2020-07-31 00:00:00 +0000
- title: 'Plan2Vec: Unsupervised Representation Learning by Latent Plans'
abstract: 'In this paper, we introduce Plan2Vec, a model-based method to learn state representation from sequences of off-policy observation data via planning. In contrast to prior methods, Plan2Vec does not require grounding via expert trajectories or actions, opening it up to many unsupervised learning scenarios. When applied to control, Plan2Vec learns a representation that amortizes the planning cost, enabling test time planning complexity that is linear in planning depth rather than exhaustive over the entire state space. We demonstrate the effectiveness of Plan2Vec on one simulated and two real-world image datasets, showing that Plan2Vec can effectively acquire representations that carry long-range structure to accelerate planning. Additional results and videos can be found at https://sites.google.com/view/plan2vec'
volume: 120
URL: https://proceedings.mlr.press/v120/yang20b.html
PDF: http://proceedings.mlr.press/v120/yang20b/yang20b.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-yang20b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Ge
family: Yang
- given: Amy
family: Zhang
- given: Ari
family: Morcos
- given: Joelle
family: Pineau
- given: Pieter
family: Abbeel
- given: Roberto
family: Calandra
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 935-946
id: yang20b
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 935
lastpage: 946
published: 2020-07-31 00:00:00 +0000
- title: 'Policy Learning of MDPs with Mixed Continuous/Discrete Variables: A Case Study on Model-Free Control of Markovian Jump Systems'
abstract: 'Markovian jump linear systems (MJLS) are an important class of dynamical systems that arise in many control applications. In this paper, we introduce the problem of controlling unknown MJLS as a new reinforcement learning benchmark for Markov decision processes with mixed continuous/discrete state variables. Compared with the traditional linear quadratic regulator (LQR), our proposed problem leads to a special hybrid MDP (with mixed continuous and discrete variables) and poses significant new challenges due to the appearance of an underlying Markov jump parameter governing the mode of the system dynamics. Specifically, the state of an MJLS does not form a Markov chain and hence one cannot study the MJLS control problem as an MDP with solely continuous state variables. However, one can augment the state and the jump parameter to obtain an MDP with a mixed continuous-discrete state space. We discuss how control theory sheds light on the policy parameterization of such hybrid MDPs. Using recently developed policy gradient results for MJLS, we show that we can use data-driven methods to solve the discounted cost version of the LQR problem. We modify the widely used natural policy gradient method to directly learn the optimal state feedback control policy for MJLS without identifying either the system dynamics or the transition probability of the switching parameter. We implement the (data-driven) natural policy gradient method on different MJLS examples. Our simulation results suggest that the natural gradient method can efficiently learn the optimal controller for MJLS with unknown dynamics.'
volume: 120
URL: https://proceedings.mlr.press/v120/jansch-porto20a.html
PDF: http://proceedings.mlr.press/v120/jansch-porto20a/jansch-porto20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-jansch-porto20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Joao Paulo
family: Jansch-Porto
- given: Bin
family: Hu
- given: Geir
family: Dullerud
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 947-957
id: jansch-porto20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 947
lastpage: 957
published: 2020-07-31 00:00:00 +0000
- title: 'Improving Robustness via Risk Averse Distributional Reinforcement Learning'
abstract: 'One major obstacle that precludes the success of reinforcement learning in real-world applications is the lack of robustness, either to model uncertainties or external disturbances, of the trained policies. Robustness is critical when the policies are trained in simulations instead of the real-world environment. In this work, we propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation. Our algorithm is based on the recently introduced distributional RL framework. We incorporate the CVaR risk measure in sample-based distributional policy gradients (SDPG) for learning risk-averse policies to achieve robustness against a range of system disturbances. We validate the robustness of risk-aware SDPG on multiple environments.'
volume: 120
URL: https://proceedings.mlr.press/v120/singh20a.html
PDF: http://proceedings.mlr.press/v120/singh20a/singh20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-singh20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Rahul
family: Singh
- given: Qinsheng
family: Zhang
- given: Yongxin
family: Chen
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 958-968
id: singh20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 958
lastpage: 968
published: 2020-07-31 00:00:00 +0000
- title: 'Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning'
abstract: 'To flexibly and efficiently reason about dynamics of temporal sequences, abstract representations that compactly represent the important information in the sequence are needed. One way of constructing such representations is by focusing on the important events in a sequence. In this paper, we propose a model that learns both to discover such key events (or keyframes) as well as to represent the sequence in terms of them. We do so using a hierarchical Keyframe-Inpainter (KeyIn) model that first generates keyframes and their temporal placement and then inpaints the sequences between keyframes. We propose a fully differentiable formulation for efficiently learning the keyframe placement. We show that KeyIn finds informative keyframes in several datasets with diverse dynamics. When evaluated on a planning task, KeyIn outperforms other recent proposals for learning hierarchical representations.'
volume: 120
URL: https://proceedings.mlr.press/v120/pertsch20a.html
PDF: http://proceedings.mlr.press/v120/pertsch20a/pertsch20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-pertsch20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Karl
family: Pertsch
- given: Oleh
family: Rybkin
- given: Jingyun
family: Yang
- given: Shenghao
family: Zhou
- given: Konstantinos
family: Derpanis
- given: Kostas
family: Daniilidis
- given: Joseph
family: Lim
- given: Andrew
family: Jaegle
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 969-979
id: pertsch20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 969
lastpage: 979
published: 2020-07-31 00:00:00 +0000
- title: 'Safe non-smooth black-box optimization with application to policy search'
abstract: 'For safety-critical black-box optimization tasks, observations of the constraints and the objective are often noisy and available only for the feasible points. We propose an approach based on log barriers to find a local solution of a non-convex non-smooth black-box optimization problem $\min f^0(x)$ subject to $f^i(x)\leq 0, i = 1,\ldots, m$, at the same time, guaranteeing constraint satisfaction while learning with high probability. Our proposed algorithm exploits noisy observations to iteratively improve on an initial safe point until convergence. We derive the convergence rate and prove safety of our algorithm. We demonstrate its performance in an application to an iterative control design problem.'
volume: 120
URL: https://proceedings.mlr.press/v120/usmanova20a.html
PDF: http://proceedings.mlr.press/v120/usmanova20a/usmanova20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-usmanova20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Ilnura
family: Usmanova
- given: Andreas
family: Krause
- given: Maryam
family: Kamgarpour
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 980-989
id: usmanova20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 980
lastpage: 989
published: 2020-07-31 00:00:00 +0000
- title: 'Improving Input-Output Linearizing Controllers for Bipedal Robots via Reinforcement Learning'
abstract: 'The need for precise dynamics models and the inability to account for input constraints are two of the main drawbacks of input-output linearizing controllers. Model uncertainty is common in almost every robotic application, and input saturation is present in every real-world system. In this paper, we address both challenges for the specific case of bipedal robot control by the use of reinforcement learning techniques. We demonstrate the performance of the designed controller for different uncertain scenarios on the five-link planar robot RABBIT. The advantages of the designed controller are highlighted and a comparison with a known effective adaptive controller is presented.'
volume: 120
URL: https://proceedings.mlr.press/v120/castaneda20a.html
PDF: http://proceedings.mlr.press/v120/castaneda20a/castaneda20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-castaneda20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Fernando
family: Castañeda
- given: Mathias
family: Wulfman
- given: Ayush
family: Agrawal
- given: Tyler
family: Westenbroek
- given: Shankar
family: Sastry
- given: Claire
family: Tomlin
- given: Koushil
family: Sreenath
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 990-999
id: castaneda20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 990
lastpage: 999
published: 2020-07-31 00:00:00 +0000
- title: 'Uncertain multi-agent MILPs: A data-driven decentralized solution with probabilistic feasibility guarantees'
abstract: 'We consider uncertain multi-agent optimization problems that are formulated as Mixed Integer Linear Programs (MILPs) with an almost separable structure. Specifically, agents have their own cost function and constraints, and need to set their local decision vector subject to coupling constraints due to shared resources. The problem is affected by uncertainty that is only known from data. A scalable decentralized approach to tackle the combinatorial complexity of constraint-coupled multi-agent MILPs has been recently introduced in the literature. However, the presence of uncertainty has been addressed only in a distributed convex optimization framework, i.e., without integer decision variables. This work fills in this gap by proposing a data-driven decentralized scheme to determine a solution with probabilistic feasibility guarantees that depend on the size of the data-set.'
volume: 120
URL: https://proceedings.mlr.press/v120/falsone20a.html
PDF: http://proceedings.mlr.press/v120/falsone20a/falsone20a.pdf
edit: https://github.com/mlresearch//v120/edit/gh-pages/_posts/2020-07-31-falsone20a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 2nd Conference on Learning for Dynamics and Control'
publisher: 'PMLR'
author:
- given: Alessandro
family: Falsone
- given: Federico
family: Molinari
- given: Maria
family: Prandini
editor:
- given: Alexandre M.
family: Bayen
- given: Ali
family: Jadbabaie
- given: George
family: Pappas
- given: Pablo A.
family: Parrilo
- given: Benjamin
family: Recht
- given: Claire
family: Tomlin
- given: Melanie
family: Zeilinger
page: 1000-1009
id: falsone20a
issued:
date-parts:
- 2020
- 7
- 31
firstpage: 1000
lastpage: 1009
published: 2020-07-31 00:00:00 +0000