- title: 'Leveraging Hamilton-Jacobi PDEs with time-dependent Hamiltonians for continual scientific machine learning'
abstract: 'We address two major challenges in scientific machine learning (SciML): interpretability and computational efficiency. We increase the interpretability of certain learning processes by establishing a new theoretical connection between optimization problems arising from SciML and a generalized Hopf formula, which represents the viscosity solution to a Hamilton-Jacobi partial differential equation (HJ PDE) with time-dependent Hamiltonian. Namely, we show that when we solve certain regularized learning problems with integral-type losses, we actually solve an optimal control problem and its associated HJ PDE with time-dependent Hamiltonian. This connection allows us to reinterpret incremental updates to learned models as the evolution of an associated HJ PDE and optimal control problem in time, where all of the previous information is intrinsically encoded in the solution to the HJ PDE. As a result, existing HJ PDE solvers and optimal control algorithms can be reused to design new efficient training approaches for SciML that naturally coincide with the continual learning framework, while avoiding catastrophic forgetting. As a first exploration of this connection, we consider the special case of linear regression and leverage our connection to develop a new Riccati-based methodology for solving these learning problems that is amenable to continual learning applications. We also provide some corresponding numerical examples that demonstrate the potential computational and memory advantages our Riccati-based approach can provide.'
volume: 242
URL: https://proceedings.mlr.press/v242/chen24a.html
PDF: https://proceedings.mlr.press/v242/chen24a/chen24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-chen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Paula
family: Chen
- given: Tingwei
family: Meng
- given: Zongren
family: Zou
- given: Jérôme
family: Darbon
- given: George Em
family: Karniadakis
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1-12
id: chen24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1
lastpage: 12
published: 2024-06-11 00:00:00 +0000
- title: 'Data-efficient, explainable and safe box manipulation: Illustrating the advantages of physical priors in model-predictive control'
abstract: 'Model-based RL/control have gained significant traction in robotics. Yet, these approaches often remain data-inefficient and lack the explainability of hand-engineered solutions. This makes them difficult to debug/integrate in safety-critical settings. However, in many systems, prior knowledge of environment kinematics/dynamics is available. Incorporating such priors can help address the aforementioned problems by reducing problem complexity and the need for exploration, while also facilitating the expression of the decisions taken by the agent in terms of physically meaningful entities. Our aim with this paper is to illustrate and support this point of view via a case-study. We model a payload manipulation problem based on a real robotic system, and show that leveraging prior knowledge about the dynamics of the environment in an MPC framework can lead to improvements in explainability, safety and data-efficiency, leading to satisfying generalization properties with less data.'
volume: 242
URL: https://proceedings.mlr.press/v242/salehi24a.html
PDF: https://proceedings.mlr.press/v242/salehi24a/salehi24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-salehi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Achkan
family: Salehi
- given: Stephane
family: Doncieux
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 13-24
id: salehi24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 13
lastpage: 24
published: 2024-06-11 00:00:00 +0000
- title: 'Gradient shaping for multi-constraint safe reinforcement learning'
abstract: 'Online safe reinforcement learning (RL) involves training a policy that maximizes task efficiency while satisfying constraints via interacting with the environments. In this paper, our focus lies in addressing the complex challenges associated with solving multi-constraint (MC) safe RL problems. We approach the safe RL problem from the perspective of Multi-Objective Optimization (MOO) and propose a unified framework designed for MC safe RL algorithms. This framework highlights the manipulation of gradients derived from constraints. Leveraging insights from this framework and recognizing the significance of redundant and conflicting constraint conditions, we introduce the Gradient Shaping (GradS) method for general Lagrangian-based safe RL algorithms to improve the training efficiency in terms of both reward and constraint satisfaction. Our extensive experimentation demonstrates the effectiveness of our proposed method in encouraging exploration and learning a policy that improves both safety and reward performance across various challenging MC safe RL tasks, as well as good scalability with respect to the constraint dimension.'
volume: 242
URL: https://proceedings.mlr.press/v242/yao24a.html
PDF: https://proceedings.mlr.press/v242/yao24a/yao24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-yao24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Yihang
family: Yao
- given: Zuxin
family: Liu
- given: Zhepeng
family: Cen
- given: Peide
family: Huang
- given: Tingnan
family: Zhang
- given: Wenhao
family: Yu
- given: Ding
family: Zhao
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 25-39
id: yao24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 25
lastpage: 39
published: 2024-06-11 00:00:00 +0000
- title: 'Continual learning of multi-modal dynamics with external memory'
abstract: 'We study the problem of fitting a model to a dynamical environment when new modes of behavior emerge sequentially. The learning model is aware when a new mode appears, but it cannot access the true modes of individual training sequences. State-of-the-art continual learning approaches cannot handle this setup, because parameter transfer suffers from catastrophic interference and episodic memory design requires the knowledge of the ground-truth modes of sequences. We devise a novel continual learning method that overcomes both limitations by maintaining a descriptor of the mode of an encountered sequence in a neural episodic memory. We employ a Dirichlet Process prior on the attention weights of the memory to foster efficient storage of the mode descriptors. Our method performs continual learning by transferring knowledge across tasks: for a current sequence, it retrieves the descriptors of similar modes from past tasks and feeds the retrieved descriptor into its transition kernel as a control input. We observe the continual learning performance of our method to compare favorably to the mainstream parameter transfer approach.'
volume: 242
URL: https://proceedings.mlr.press/v242/akgul24a.html
PDF: https://proceedings.mlr.press/v242/akgul24a/akgul24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-akgul24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Abdullah
family: Akgül
- given: Gozde
family: Unal
- given: Melih
family: Kandemir
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 40-51
id: akgul24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 40
lastpage: 51
published: 2024-06-11 00:00:00 +0000
- title: 'Learning to stabilize high-dimensional unknown systems using Lyapunov-guided exploration'
abstract: 'Designing stabilizing controllers is a fundamental challenge in autonomous systems, particularly for high-dimensional, nonlinear systems that can hardly be accurately modeled with differential equations. Lyapunov theory offers a solution for stabilizing control systems; still, current methods relying on Lyapunov functions require access to complete dynamics or samples of system executions throughout the entire state space. Consequently, they are impractical for high-dimensional systems. This paper introduces a novel framework, LYapunov-Guided Exploration (LYGE), for learning stabilizing controllers tailored to high-dimensional, unknown systems. LYGE employs Lyapunov theory to iteratively guide the search for samples during exploration while simultaneously learning the local system dynamics, control policy, and Lyapunov functions. We demonstrate its scalability on highly complex systems, including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input space. Experiments indicate that, compared to prior works in reinforcement learning, imitation learning, and neural certificates, LYGE reduces the distance to the goal by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate that our algorithm can be extended to learn controllers guided by other certificate functions for unknown systems.'
volume: 242
URL: https://proceedings.mlr.press/v242/zhang24a.html
PDF: https://proceedings.mlr.press/v242/zhang24a/zhang24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zhang24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Songyuan
family: Zhang
- given: Chuchu
family: Fan
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 52-67
id: zhang24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 52
lastpage: 67
published: 2024-06-11 00:00:00 +0000
- title: 'An investigation of time reversal symmetry in reinforcement learning'
abstract: 'One of the fundamental challenges associated with reinforcement learning (RL) is that collecting sufficient data can be both time-consuming and expensive. In this paper, we formalize a concept of time reversal symmetry in a Markov decision process (MDP), which builds upon the established structure of dynamically reversible Markov chains (DRMCs) and time-reversibility in classical physics. Specifically, we investigate the utility of this concept in reducing the sample complexity of reinforcement learning. We observe that utilizing the structure of time reversal in an MDP allows every environment transition experienced by an agent to be transformed into a feasible reverse-time transition, effectively doubling the number of experiences in the environment. To test the usefulness of this newly synthesized data, we develop a novel approach called time symmetric data augmentation (TSDA) and investigate its application in both proprioceptive and pixel-based states within the realm of off-policy, model-free RL. Empirical evaluations showcase how these synthetic transitions can enhance the sample efficiency of RL agents in time-reversible scenarios without friction or contact. We also test this method in more realistic environments where these assumptions are not globally satisfied. We find that TSDA can significantly degrade sample efficiency and policy performance, but can also improve sample efficiency under the right conditions. Ultimately we conclude that time symmetry shows promise in enhancing the sample efficiency of reinforcement learning and provide guidance on when the environment and reward structures are of an appropriate form for TSDA to be employed effectively.'
volume: 242
URL: https://proceedings.mlr.press/v242/barkley24a.html
PDF: https://proceedings.mlr.press/v242/barkley24a/barkley24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-barkley24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Brett
family: Barkley
- given: Amy
family: Zhang
- given: David
family: Fridovich-Keil
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 68-79
id: barkley24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 68
lastpage: 79
published: 2024-06-11 00:00:00 +0000
- title: 'HSVI-based online minimax strategies for partially observable stochastic games with neural perception mechanisms'
abstract: 'We consider a variant of continuous-state partially-observable stochastic games with neural perception mechanisms and an asymmetric information structure. One agent has partial information, with the observation function implemented as a neural network, while the other agent is assumed to have full knowledge of the state. We present, for the first time, an efficient online method to compute an $\varepsilon$-minimax strategy profile, which requires only one linear program to be solved for each agent at every stage, instead of a complex estimation of opponent counterfactual values. For the partially-informed agent, we propose a continual resolving approach which uses lower bounds, pre-computed offline with heuristic search value iteration (HSVI), instead of opponent counterfactual values. This inherits the soundness of continual resolving at the cost of pre-computing the bound. For the fully-informed agent, we propose an inferred-belief strategy, where the agent maintains an inferred belief about the belief of the partially-informed agent based on (offline) upper bounds from HSVI, guaranteeing $\varepsilon$-distance to the value of the game at the initial belief known to both agents.'
volume: 242
URL: https://proceedings.mlr.press/v242/yan24a.html
PDF: https://proceedings.mlr.press/v242/yan24a/yan24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-yan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Rui
family: Yan
- given: Gabriel
family: Santos
- given: Gethin
family: Norman
- given: David
family: Parker
- given: Marta
family: Kwiatkowska
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 80-91
id: yan24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 80
lastpage: 91
published: 2024-06-11 00:00:00 +0000
- title: 'Real-time safe control of neural network dynamic models with sound approximation'
abstract: 'Safe control of neural network dynamic models (NNDMs) is important to robotics and many applications. However, it remains challenging to compute an optimal safe control in real time for NNDMs. To enable real-time computation, we propose to use a sound approximation of the NNDM in the control synthesis. In particular, we propose Bernstein over-approximated neural dynamics (BOND) based on the Bernstein polynomial over-approximation (BPO) of ReLU activation functions in NNDM. To mitigate the errors introduced by the approximation and to ensure persistent feasibility of the safe control problems, we synthesize a worst-case safety index using the most unsafe approximated state within the BPO relaxation of NNDM offline. For the online real-time optimization, we formulate the first-order Taylor approximation of the nonlinear worst-case safety constraint as an additional linear layer of NNDM with the $\ell_2$-bounded bias term for the higher-order remainder. Comprehensive experiments with different neural dynamics and safety constraints show that with safety guaranteed, our NNDMs with sound approximation are 10-100 times faster than the safe control baseline that uses mixed integer programming (MIP), validating the effectiveness of the worst-case safety index and scalability of the proposed BOND in real-time large-scale settings.'
volume: 242
URL: https://proceedings.mlr.press/v242/hu24a.html
PDF: https://proceedings.mlr.press/v242/hu24a/hu24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Hanjiang
family: Hu
- given: Jianglin
family: Lan
- given: Changliu
family: Liu
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 92-103
id: hu24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 92
lastpage: 103
published: 2024-06-11 00:00:00 +0000
- title: 'Tracking object positions in reinforcement learning: A metric for keypoint detection'
abstract: 'Reinforcement learning (RL) for robot control typically requires a detailed representation of the environment state, including information about task-relevant objects not directly measurable. Keypoint detectors, such as spatial autoencoders (SAEs), are a common approach to extracting a low-dimensional representation from high-dimensional image data. SAEs aim at spatial features such as object positions, which are often useful representations in robotic RL. However, whether an SAE is actually able to track objects in the scene and thus yields a spatial state representation well suited for RL tasks has rarely been examined due to a lack of established metrics. In this paper, we propose to assess the performance of an SAE instance by measuring how well keypoints track ground truth objects in images. We present a computationally lightweight metric and use it to evaluate common baseline SAE architectures on image data from a simulated robot task. We find that common SAEs differ substantially in their spatial extraction capability. Furthermore, we validate that SAEs that perform well in our metric achieve superior performance when used in downstream RL. Thus, our metric is an effective and lightweight indicator of RL performance before executing expensive RL training. Building on these insights, we identify three key modifications of SAE architectures to improve tracking performance. We make our code available at https://anonymous.4open.science/r/sae-rl.'
volume: 242
URL: https://proceedings.mlr.press/v242/cramer24a.html
PDF: https://proceedings.mlr.press/v242/cramer24a/cramer24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-cramer24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Emma
family: Cramer
- given: Jonas
family: Reiher
- given: Sebastian
family: Trimpe
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 104-116
id: cramer24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 104
lastpage: 116
published: 2024-06-11 00:00:00 +0000
- title: 'Linearised data-driven LSTM-based control of multi-input HVAC systems'
abstract: 'The pursuit of sustainability has paved the way for building management systems (BMSs) that can steer buildings in an energy-efficient way. In this article, a deep learning approach is proposed to control multi-input HVAC systems in order to minimize both thermal discomfort and operational cost. More particularly, an LSTM-based encoder-decoder process model, trained on historical weather data and control sequences generated while the building was steered by a modern rule-based controller (RBC), is fed into an optimisation problem, to which a change of variables is applied to efficiently model the effect of interdependent control inputs. Both the nonlinear LSTM process model and the cost function of the optimisation problem are linearised to formulate the control problem as a mixed integer linear programming (MILP) problem, which ensures that the controller can operate in near real-time and with limited computational power. Moreover, to avoid resorting to model extrapolation and to improve the model’s predictive performance, the set of allowed control signal values is restricted using a quantile-based approach. In addition to the purely data-driven controller (DDC), a hybrid controller is designed to leverage the strengths of the RBC and the DDC. The performance of both controllers is benchmarked against the RBC’s performance using the BOPTEST simulation environment under various experiment settings, highlighting how the hyperparameters affect the controller’s performance. Compared to the RBC, we show that the proposed controllers realise substantial improvements in terms of both thermal comfort and operational cost while controlling a single zone or two zones simultaneously.'
volume: 242
URL: https://proceedings.mlr.press/v242/hinderyckx24a.html
PDF: https://proceedings.mlr.press/v242/hinderyckx24a/hinderyckx24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hinderyckx24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Andreas
family: Hinderyckx
- given: Florence
family: Guillaume
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 117-129
id: hinderyckx24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 117
lastpage: 129
published: 2024-06-11 00:00:00 +0000
- title: 'The behavioral toolbox'
abstract: 'The Behavioral Toolbox is a collection of Matlab functions for modeling, analysis, and design of dynamical systems using the behavioral approach to systems theory and control. It implements newly emerged direct data-driven methods as well as classical parametric representations of linear time-invariant systems. At the core of the toolbox is a nonparametric representation of the finite-horizon behavior by an orthonormal basis. The current version has education and research goals and isn’t intended for handling “big data”. The paper presents five problems — checking systems equality, interconnection of systems, errors-in-variables least-squares smoothing, missing input estimation, and data-driven forecasting — and describes their solution by the methods in the toolbox.'
volume: 242
URL: https://proceedings.mlr.press/v242/markovsky24a.html
PDF: https://proceedings.mlr.press/v242/markovsky24a/markovsky24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-markovsky24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Ivan
family: Markovsky
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 130-141
id: markovsky24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 130
lastpage: 141
published: 2024-06-11 00:00:00 +0000
- title: 'Learning “look-ahead” nonlocal traffic dynamics in a ring road'
abstract: 'The macroscopic traffic flow model is widely used for traffic control and management. To incorporate drivers’ anticipative behaviors and to remove impractical speed discontinuity inherent in the classic Lighthill–Whitham–Richards (LWR) traffic model, nonlocal partial differential equation (PDE) models with “look-ahead” dynamics have been proposed, which assume that the speed is a function of weighted downstream traffic density. However, these models lack data validation on two important questions: whether there exist nonlocal dynamics, and how the length and weight of the “look-ahead” window affect the spatiotemporal propagation of traffic densities. In this paper, we adopt traffic trajectory data from a ring-road experiment and design a physics-informed neural network to learn the fundamental diagram and look-ahead kernel that best fit the data, and reinvent a data-enhanced nonlocal LWR model via minimizing the loss function combining the data discrepancy and the nonlocal model discrepancy. Results show that the learned nonlocal LWR yields a more accurate prediction of traffic wave propagation in three different scenarios: stop-and-go oscillations, congested, and free traffic. We first demonstrate the existence of the “look-ahead” effect with real traffic data. The optimal nonlocal kernel is found to take a length of around 35 to 50 meters, and the kernel weight within 5 meters accounts for the majority of the nonlocal effect. Our results also underscore the importance of choosing a priori physics in machine learning models.'
volume: 242
URL: https://proceedings.mlr.press/v242/zhao24a.html
PDF: https://proceedings.mlr.press/v242/zhao24a/zhao24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zhao24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Chenguang
family: Zhao
- given: Huan
family: Yu
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 142-154
id: zhao24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 142
lastpage: 154
published: 2024-06-11 00:00:00 +0000
- title: 'Safe dynamic pricing for nonstationary network resource allocation'
abstract: 'This paper introduces the Safe Pricing for NUM with Gradual Variations (SPNUM-GV) algorithm, addressing challenges in pricing-based distributed resource allocation for safety-critical systems with non-stationary utility functions. Focusing on domains where 1) users’ optimal demand can only be induced through posted prices, 2) real-time two-way communication with the users is not available, 3) the induced demand must always belong to an arbitrarily shaped convex and compact feasible set in spite of price response uncertainty, and 4) the users’ response to prices are evolving over time, we design SPNUM-GV to generate prices that ensure stage-wise safety of the induced demand while achieving sublinear regret. SPNUM-GV ensures safety by determining a “desired demand” within a shrunk feasible set using a projected gradient method and updating the prices to induce a demand close to the desired demand by leveraging an estimate of the users’ price response function. By tuning the amount of shrinkage to account for the error between the desired and the induced demand, we prove that the induced demand always belongs to the feasible set. In addition, we prove that the regret incurred by the induced demand is $O(\sqrt{T(1+V_T)})$ after $T$ iterations, where $V_T$ is an upper bound on the total gradual variations of the users’ utility functions. Numerical simulations demonstrate the efficacy of SPNUM-GV and support our theoretical findings.'
volume: 242
URL: https://proceedings.mlr.press/v242/turan24a.html
PDF: https://proceedings.mlr.press/v242/turan24a/turan24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-turan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Berkay
family: Turan
- given: Spencer
family: Hutchinson
- given: Mahnoosh
family: Alizadeh
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 155-167
id: turan24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 155
lastpage: 167
published: 2024-06-11 00:00:00 +0000
- title: 'Safe online convex optimization with multi-point feedback'
abstract: 'Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $O(d \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the impacts of the unknown constraint and zero-order feedback on empirical performance.'
volume: 242
URL: https://proceedings.mlr.press/v242/hutchinson24a.html
PDF: https://proceedings.mlr.press/v242/hutchinson24a/hutchinson24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hutchinson24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Spencer
family: Hutchinson
- given: Mahnoosh
family: Alizadeh
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 168-180
id: hutchinson24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 168
lastpage: 180
published: 2024-06-11 00:00:00 +0000
- title: 'Controlgym: Large-scale control environments for benchmarking reinforcement learning algorithms'
abstract: 'We introduce controlgym, a library of thirty-six industrial control settings and ten infinite-dimensional partial differential equation (PDE)-based control problems. Integrated within the OpenAI Gym/Gymnasium (Gym) framework, controlgym allows direct applications of standard reinforcement learning (RL) algorithms like stable-baselines3. Our control environments complement those in Gym with continuous, unbounded action and observation spaces, motivated by real-world control applications. Moreover, the PDE control environments uniquely allow the users to extend the state dimensionality of the system to infinity while preserving the intrinsic dynamics. This feature is crucial for evaluating the scalability of RL algorithms for control. This project serves the learning for dynamics & control (L4DC) community, aiming to explore key questions: the convergence of RL algorithms in learning control policies; the stability and robustness issues of learning-based controllers; and the scalability of RL algorithms to high- and potentially infinite-dimensional systems. We open-source the controlgym project at https://github.com/xiangyuan-zhang/controlgym.'
volume: 242
URL: https://proceedings.mlr.press/v242/zhang24b.html
PDF: https://proceedings.mlr.press/v242/zhang24b/zhang24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zhang24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Xiangyuan
family: Zhang
- given: Weichao
family: Mao
- given: Saviz
family: Mowlavi
- given: Mouhacine
family: Benosman
- given: Tamer
family: Başar
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 181-196
id: zhang24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 181
lastpage: 196
published: 2024-06-11 00:00:00 +0000
- title: 'On the convergence of adaptive first order methods: Proximal gradient and alternating minimization algorithms'
abstract: 'Building upon recent works on linesearch-free adaptive proximal gradient methods, this paper proposes AdaPG, a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds. Different choices of the parameters are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations. In an attempt to better understand the underlying theory, its convergence is established in a more general setting that allows for time-varying parameters. Finally, an adaptive alternating minimization algorithm is presented by exploring the dual setting. This algorithm not only incorporates additional adaptivity but also expands its applicability beyond standard strongly convex settings.'
volume: 242
URL: https://proceedings.mlr.press/v242/latafat24a.html
PDF: https://proceedings.mlr.press/v242/latafat24a/latafat24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-latafat24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Puya
family: Latafat
- given: Andreas
family: Themelis
- given: Panagiotis
family: Patrinos
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 197-208
id: latafat24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 197
lastpage: 208
published: 2024-06-11 00:00:00 +0000
- title: 'Strengthened stability analysis of discrete-time Lurie systems involving ReLU neural networks'
abstract: 'This paper addresses the stability analysis of a discrete-time (DT) Lurie system featuring a static repeated ReLU nonlinearity. Such systems often arise in the analysis of recurrent neural networks and other neural feedback loops. Custom quadratic constraints, satisfied by the repeated ReLU, are employed to strengthen the standard Circle and Popov Criteria for this specific Lurie system. The criteria can be expressed as a set of linear matrix inequalities (LMIs) with less restrictive conditions on the matrix variables. It is further shown that if the Lurie system under consideration has a unique equilibrium point at the origin, then this equilibrium point is in fact globally stable or unstable, meaning that local stability analysis will provide no additional benefit. Numerical examples demonstrate that the strengthened criteria achieve a desirable balance between reduced conservatism and complexity when compared to existing criteria.'
volume: 242
URL: https://proceedings.mlr.press/v242/richardson24a.html
PDF: https://proceedings.mlr.press/v242/richardson24a/richardson24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-richardson24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Carl
family: Richardson
- given: Matthew
family: Turner
- given: Steve
family: Gunn
- given: Ross
family: Drummond
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 209-221
id: richardson24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 209
lastpage: 221
published: 2024-06-11 00:00:00 +0000
- title: 'Interpretable data-driven model predictive control of building energy systems using SHAP'
abstract: 'Advanced building energy system controls, such as model predictive control, rely on accurate system models. To reduce the modelling effort in the building sector, data-driven models are becoming increasingly popular in research. Despite their promising performance, data-driven models are considered black boxes. This black box nature is an obstacle to widespread application, as it is difficult for building operators to understand how predictions are made. Concepts known as Explainable Artificial Intelligence are being developed to improve the interpretability of black box models. This work combines the popular Explainable Artificial Intelligence method Shapley Additive Explanations (SHAP) with data-driven model predictive control to increase the interpretability of artificial neural networks used as process models during model creation. Using a standardised residential building energy system for controller testing, an in-depth analysis of how the models make predictions is carried out. In addition, the influence of different model setups on the control performance is evaluated. The results show that the different control performances can be justified by analysing the underlying models with SHAP. SHAP shows how the characteristics of a feature affect the prediction and reveals weaknesses in the model. In addition, the features can be sorted according to their influence on the prediction, which is utilized for feature selection.'
volume: 242
URL: https://proceedings.mlr.press/v242/henkel24a.html
PDF: https://proceedings.mlr.press/v242/henkel24a/henkel24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-henkel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Patrick
family: Henkel
- given: Tobias
family: Kasperski
- given: Phillip
family: Stoffel
- given: Dirk
family: Müller
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 222-234
id: henkel24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 222
lastpage: 234
published: 2024-06-11 00:00:00 +0000
- title: 'Physics-informed Neural Networks with Unknown Measurement Noise'
abstract: 'Physics-informed neural networks (PINNs) constitute a flexible approach to both finding solutions and identifying parameters of partial differential equations. Most works on the topic assume noiseless data, or data contaminated with weak Gaussian noise. We show that the standard PINN framework breaks down in case of non-Gaussian noise. We give a way of resolving this fundamental issue and we propose to jointly train an energy-based model (EBM) to learn the correct noise distribution. We illustrate the improved performance of our approach using multiple examples.'
volume: 242
URL: https://proceedings.mlr.press/v242/pilar24a.html
PDF: https://proceedings.mlr.press/v242/pilar24a/pilar24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-pilar24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Philipp
family: Pilar
- given: Niklas
family: Wahlström
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 235-247
id: pilar24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 235
lastpage: 247
published: 2024-06-11 00:00:00 +0000
- title: 'Adaptive online non-stochastic control'
abstract: 'We tackle the problem of Non-stochastic Control (NSC) with the aim of obtaining algorithms whose policy regret is proportional to the difficulty of the controlled environment. Namely, we tailor the Follow The Regularized Leader (FTRL) framework to dynamical systems by using regularizers that are proportional to the actual witnessed costs. The main challenge arises from using the proposed adaptive regularizers in the presence of a state, or equivalently, a memory, which couples the effect of the online decisions and requires new tools for bounding the regret. Via new analysis techniques for NSC and FTRL integration, we obtain novel disturbance action controllers (DAC) with sub-linear data adaptive policy regret bounds that shrink when the trajectory of costs has small gradients, while staying sub-linear even in the worst case.'
volume: 242
URL: https://proceedings.mlr.press/v242/mhaisen24a.html
PDF: https://proceedings.mlr.press/v242/mhaisen24a/mhaisen24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-mhaisen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Naram
family: Mhaisen
- given: George
family: Iosifidis
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 248-259
id: mhaisen24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 248
lastpage: 259
published: 2024-06-11 00:00:00 +0000
- title: 'Global rewards in multi-agent deep reinforcement learning for autonomous mobility on demand systems'
abstract: 'We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. An extended version of our paper, including an appendix, can be found at https://arxiv.org/abs/2312.08884. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.'
volume: 242
URL: https://proceedings.mlr.press/v242/hoppe24a.html
PDF: https://proceedings.mlr.press/v242/hoppe24a/hoppe24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hoppe24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Heiko
family: Hoppe
- given: Tobias
family: Enders
- given: Quentin
family: Cappart
- given: Maximilian
family: Schiffer
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 260-272
id: hoppe24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 260
lastpage: 272
published: 2024-06-11 00:00:00 +0000
- title: 'Soft convex quantization: revisiting Vector Quantization with convex optimization'
abstract: 'Vector Quantization (VQ) is a well-known technique in deep learning for extracting informative discrete latent representations. VQ-embedded models have shown impressive results in a range of applications including image and speech generation. VQ operates as a parametric K-means algorithm that quantizes inputs using a single codebook vector in the forward pass. While powerful, this technique faces practical challenges including codebook collapse, non-differentiability and lossy compression. To mitigate the aforementioned issues, we propose Soft Convex Quantization (SCQ) as a direct substitute for VQ. SCQ works like a differentiable convex optimization (DCO) layer: in the forward pass, we solve for the optimal convex combination of codebook vectors to quantize the inputs. In the backward pass, we leverage differentiability through the optimality conditions of the forward solution. We then introduce a scalable relaxation of the SCQ optimization and demonstrate its efficacy on the CIFAR-10, GTSRB and LSUN datasets. We train powerful SCQ autoencoder models that significantly outperform matched VQ architectures, observing an order of magnitude better image reconstruction and codebook usage with comparable quantization runtime.'
volume: 242
URL: https://proceedings.mlr.press/v242/gautam24a.html
PDF: https://proceedings.mlr.press/v242/gautam24a/gautam24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-gautam24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Tanmay
family: Gautam
- given: Reid
family: Pryzant
- given: Ziyi
family: Yang
- given: Chenguang
family: Zhu
- given: Somayeh
family: Sojoudi
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 273-285
id: gautam24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 273
lastpage: 285
published: 2024-06-11 00:00:00 +0000
- title: 'Uncertainty quantification of set-membership estimation in control and perception: Revisiting the minimum enclosing ellipsoid'
abstract: 'Set-membership estimation (SME) outputs a set estimator that guarantees to cover the groundtruth. Such sets are, however, defined by (many) abstract (and potentially nonconvex) constraints and therefore difficult to manipulate. We present tractable algorithms to compute simple and tight overapproximations of SME in the form of minimum enclosing ellipsoids (MEE). We first introduce the hierarchy of enclosing ellipsoids proposed by Nie and Demmel (2005), based on sums-of-squares relaxations, that asymptotically converge to the MEE of a basic semialgebraic set. This framework, however, struggles in modern control and perception problems due to computational challenges. We contribute three computational enhancements to make this framework practical, namely constraints pruning, generalized relaxed Chebyshev center, and handling non-Euclidean geometry. We showcase numerical examples on system identification and object pose estimation.'
volume: 242
URL: https://proceedings.mlr.press/v242/tang24a.html
PDF: https://proceedings.mlr.press/v242/tang24a/tang24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-tang24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Yukai
family: Tang
- given: Jean-Bernard
family: Lasserre
- given: Heng
family: Yang
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 286-298
id: tang24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 286
lastpage: 298
published: 2024-06-11 00:00:00 +0000
- title: 'Minimax dual control with finite-dimensional information state'
abstract: 'This article considers output-feedback control of systems where the function mapping states to measurements has a set-valued inverse. We show that if the set has a bounded number of elements, then minimax dual control of such systems admits finite-dimensional information states. We specialize our results to a discrete-time integrator with magnitude measurements and derive a surprisingly simple sub-optimal control policy that ensures finite gain of the closed loop. The sub-optimal policy is a proportional controller where the magnitude of the gain is computed offline, but the sign is learned, forgotten, and relearned online. The discrete-time integrator with magnitude measurements captures real-world applications such as antenna alignment, and despite its simplicity, it defies established control-design methods. For example, whether a stabilizing linear time-invariant controller exists for this system is unknown, and we conjecture that none exists.'
volume: 242
URL: https://proceedings.mlr.press/v242/kjellqvist24a.html
PDF: https://proceedings.mlr.press/v242/kjellqvist24a/kjellqvist24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-kjellqvist24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Olle
family: Kjellqvist
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 299-311
id: kjellqvist24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 299
lastpage: 311
published: 2024-06-11 00:00:00 +0000
- title: 'An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems'
abstract: 'In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm only relies on using persistently exciting input-output data, measured offline. No model knowledge or state measurements are needed and the obtained optimal policy only uses past input-output information. Moreover, our formulation of the proposed algorithm renders it computationally efficient. We provide conditions that guarantee the convergence of the algorithm to the optimal solution. Finally, the performance of our method is compared to existing algorithms in the literature.'
volume: 242
URL: https://proceedings.mlr.press/v242/alsalti24a.html
PDF: https://proceedings.mlr.press/v242/alsalti24a/alsalti24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-alsalti24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Mohammad
family: Alsalti
- given: Victor G.
family: Lopez
- given: Matthias A.
family: Müller
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 312-323
id: alsalti24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 312
lastpage: 323
published: 2024-06-11 00:00:00 +0000
- title: 'Adapting image-based RL policies via predicted rewards'
abstract: 'Image-based reinforcement learning (RL) faces significant challenges in generalization when the visual environment undergoes substantial changes between training and deployment. Under such circumstances, learned policies may not perform well, leading to degraded results. Previous approaches to this problem have largely focused on broadening the training observation distribution, employing techniques like data augmentation and domain randomization. However, given the sequential nature of the RL decision-making problem, it is often the case that residual errors are propagated by the learned policy model and accumulate throughout the trajectory, resulting in highly degraded performance. In this paper, we leverage the observation that predicted rewards under domain shift, even though imperfect, can still be a useful signal to guide fine-tuning. We exploit this property to fine-tune a policy using reward prediction in the target domain. We have found that, even under significant domain shift, the predicted reward can still provide meaningful signal and fine-tuning substantially improves the original policy. Our approach, termed Predicted Reward Fine-tuning (PRFT), improves performance across diverse tasks in both simulated benchmarks and real-world experiments. More information is available at the project web page: https://sites.google.com/view/prft.'
volume: 242
URL: https://proceedings.mlr.press/v242/wang24a.html
PDF: https://proceedings.mlr.press/v242/wang24a/wang24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-wang24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Weiyao
family: Wang
- given: Xinyuan
family: Fang
- given: Gregory
family: Hager
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 324-336
id: wang24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 324
lastpage: 336
published: 2024-06-11 00:00:00 +0000
- title: 'Piecewise regression via mixed-integer programming for MPC'
abstract: 'Piecewise regression is a versatile approach used in various disciplines to approximate complex functions from limited, potentially noisy data points. In control, piecewise regression is, e.g., used to approximate the optimal control law of model predictive control (MPC), the optimal value function, or unknown system dynamics. Neural networks are a common choice to solve the piecewise regression problem. However, due to their nonlinear structure, training is often based on gradient-based methods, which may fail to find a global optimum or even a solution that leads to a small approximation error. To overcome this problem and to find a global optimal solution, methods based on mixed-integer programming (MIP) can be used. However, the known MIP-based methods are either limited to a special class of functions, e.g., convex piecewise affine functions, or they lead to complex approximations in terms of the number of regions of the piecewise defined function. Both complicate a usage in the framework of control. We propose a new MIP-based method that is not restricted to a particular class of piecewise defined functions and leads to functions that are fast to evaluate and can be used within an optimization problem, making them well suited for use in control.'
volume: 242
URL: https://proceedings.mlr.press/v242/teichrib24a.html
PDF: https://proceedings.mlr.press/v242/teichrib24a/teichrib24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-teichrib24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Dieter
family: Teichrib
- given: Moritz Schulze
family: Darup
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 337-348
id: teichrib24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 337
lastpage: 348
published: 2024-06-11 00:00:00 +0000
- title: 'Parameter-adaptive approximate MPC: Tuning neural-network controllers without retraining'
abstract: 'Model Predictive Control (MPC) is a method to control nonlinear systems with guaranteed stability and constraint satisfaction but suffers from high computation times. Approximate MPC (AMPC) with neural networks (NNs) has emerged to address this limitation, enabling deployment on resource-constrained embedded systems. However, when tuning AMPCs for real-world systems, large datasets need to be regenerated and the NN needs to be retrained at every tuning step. This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining. By incorporating local sensitivities of nonlinear programs, the proposed method not only mimics optimal MPC inputs but also adjusts to known changes in physical parameters of the model using linear predictions while still guaranteeing stability. We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU). We use the same NN across both system instances that have different parameters. This work not only represents the first experimental demonstration of AMPC for fast-moving systems on low-cost MCUs to the best of our knowledge, but also showcases generalization across system instances and variations through our parameter-adaptation method. Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.'
volume: 242
URL: https://proceedings.mlr.press/v242/hose24a.html
PDF: https://proceedings.mlr.press/v242/hose24a/hose24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hose24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Henrik
family: Hose
- given: Alexander
family: Gräfe
- given: Sebastian
family: Trimpe
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 349-360
id: hose24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 349
lastpage: 360
published: 2024-06-11 00:00:00 +0000
- title: '$\widetilde{O}(T^{-1})$ Convergence to (coarse) correlated equilibria in full-information general-sum Markov games'
abstract: 'No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.'
volume: 242
URL: https://proceedings.mlr.press/v242/mao24a.html
PDF: https://proceedings.mlr.press/v242/mao24a/mao24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-mao24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Weichao
family: Mao
- given: Haoran
family: Qiu
- given: Chen
family: Wang
- given: Hubertus
family: Franke
- given: Zbigniew
family: Kalbarczyk
- given: Tamer
family: Başar
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 361-374
id: mao24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 361
lastpage: 374
published: 2024-06-11 00:00:00 +0000
- title: 'Inverse optimal control as an errors-in-variables problem'
abstract: 'Inverse optimal control (IOC) is about estimating an unknown objective of interest given its optimal control sequence. However, truly optimal demonstrations are often difficult to obtain, e.g., due to human errors or inaccurate measurements. This paper presents an IOC framework for objective estimation from multiple sub-optimal demonstrations in constrained environments. It builds upon the Karush-Kuhn-Tucker optimality conditions, and addresses the Errors-In-Variables problem that emerges from the use of sub-optimal data. The approach presented is applied to various systems in simulation, and consistency guarantees are provided for linear systems with zero mean additive noise, polytopic constraints, and objectives with quadratic features.'
volume: 242
URL: https://proceedings.mlr.press/v242/rickenbach24a.html
PDF: https://proceedings.mlr.press/v242/rickenbach24a/rickenbach24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-rickenbach24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Rahel
family: Rickenbach
- given: Anna
family: Scampicchio
- given: Melanie N.
family: Zeilinger
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 375-386
id: rickenbach24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 375
lastpage: 386
published: 2024-06-11 00:00:00 +0000
- title: 'Learning soft constrained MPC value functions: Efficient MPC design and implementation providing stability and safety guarantees'
abstract: 'Model Predictive Control (MPC) can be applied to safety-critical control problems, providing closed-loop safety and performance guarantees. Application of MPC requires solving an optimization problem at every sampling instant, making it challenging to implement on embedded hardware. To address this challenge, we propose a framework that combines a tightened soft constrained MPC formulation with a supervised learning framework to approximate the MPC value function. This combination enables us to obtain a corresponding optimal control law, which can be implemented efficiently on embedded platforms. The proposed framework ensures stability and constraint satisfaction for various nonlinear systems. While the design effort is similar to the design of nominal MPC formulations, we can establish input-to-state stability (ISS) with respect to the approximation error of the value function. Moreover, we prove that, while the optimal control law may be discontinuous, the value function corresponding to the soft constrained MPC problem is Lipschitz continuous for Lipschitz continuous systems. This serves two purposes: First, it allows us to relate approximation errors to a sufficiently large constraint tightening to obtain constraint satisfaction guarantees. Second, it enables a very efficient supervised learning procedure for obtaining the approximation using continuous function approximator classes. We showcase the effectiveness of the method through a nonlinear numerical example.'
volume: 242
URL: https://proceedings.mlr.press/v242/chatzikiriakos24a.html
PDF: https://proceedings.mlr.press/v242/chatzikiriakos24a/chatzikiriakos24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-chatzikiriakos24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Nicolas
family: Chatzikiriakos
- given: Kim Peter
family: Wabersich
- given: Felix
family: Berkel
- given: Patricia
family: Pauli
- given: Andrea
family: Iannelli
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 387-398
id: chatzikiriakos24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 387
lastpage: 398
published: 2024-06-11 00:00:00 +0000
- title: 'MPC-inspired reinforcement learning for verifiable model-free control'
abstract: 'In this paper, we introduce a new class of parameterized controllers, drawing inspiration from Model Predictive Control (MPC). These controllers adopt an unrolled Quadratic Programming (QP) solver, structured similarly to a deep neural network, with parameters from a QP problem that is similar to linear MPC. The parameters are learned rather than derived from models. This approach addresses the limitations of commonly learned controllers with Multi-Layer Perceptron (MLP) or other general neural network architectures in deep reinforcement learning, in terms of explainability and performance guarantees. The learned controllers not only possess verifiable properties like persistent feasibility and asymptotic stability akin to MPC, but they also empirically match MPC and MLP controllers in control performance. Moreover, they are more computationally efficient in implementation compared to MPC and require significantly fewer learnable policy parameters than MLP controllers. Practical application is demonstrated through a vehicle drift maneuvering task, showcasing the potential of these controllers in real-world scenarios.'
volume: 242
URL: https://proceedings.mlr.press/v242/lu24a.html
PDF: https://proceedings.mlr.press/v242/lu24a/lu24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-lu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Yiwen
family: Lu
- given: Zishuo
family: Li
- given: Yihan
family: Zhou
- given: Na
family: Li
- given: Yilin
family: Mo
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 399-413
id: lu24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 399
lastpage: 413
published: 2024-06-11 00:00:00 +0000
- title: 'Real-world fluid directed rigid body control via deep reinforcement learning'
abstract: 'Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard to simulate at high integration rates, limiting the direct application of modern deep RL algorithms to often expensive or safety critical hardware. In this work, we introduce “Box o’ Flows”, a novel benchtop experimental control system for systematically evaluating RL algorithms in dynamic real-world scenarios. We describe the key components of the Box o’ Flows, and through a series of experiments demonstrate how state-of-the-art model-free RL algorithms can synthesize a variety of complex behaviors via simple reward specifications. Furthermore, we explore the role of offline RL in data-efficient hypothesis testing by reusing past experiences. We believe that the insights gained from this preliminary study and the availability of systems like the Box o’ Flows support the way forward for developing systematic RL algorithms that can be generally applied to complex, dynamical systems. Supplementary material and videos of experiments are available at https://sites.google.com/view/box-o-flows/home.'
volume: 242
URL: https://proceedings.mlr.press/v242/bhardwaj24a.html
PDF: https://proceedings.mlr.press/v242/bhardwaj24a/bhardwaj24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-bhardwaj24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Mohak
family: Bhardwaj
- given: Thomas
family: Lampe
- given: Michael
family: Neunert
- given: Francesco
family: Romano
- given: Abbas
family: Abdolmaleki
- given: Arunkumar
family: Byravan
- given: Markus
family: Wulfmeier
- given: Martin
family: Riedmiller
- given: Jonas
family: Buchli
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 414-427
id: bhardwaj24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 414
lastpage: 427
published: 2024-06-11 00:00:00 +0000
- title: 'On the uniqueness of solution for the Bellman equation of LTL objectives'
abstract: 'Surrogate rewards for linear temporal logic (LTL) objectives are commonly utilized in planning problems for LTL objectives. In a widely-adopted surrogate reward approach, two discount factors are used to ensure that the expected return approximates the satisfaction probability of the LTL objective. The expected return then can be estimated by methods using the Bellman updates such as reinforcement learning. However, the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed. We demonstrate with an example that when one of the discount factors is set to one, as allowed in many previous works, the Bellman equation may have multiple solutions, leading to inaccurate evaluation of the expected return. We then propose a condition for the Bellman equation to have the expected return as the unique solution, requiring the solutions for states inside a rejecting bottom strongly connected component (BSCC) to be $0$. We prove this condition is sufficient by showing that the solutions for the states with discounting can be separated from those for the states without discounting under this condition.'
volume: 242
URL: https://proceedings.mlr.press/v242/xuan24a.html
PDF: https://proceedings.mlr.press/v242/xuan24a/xuan24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-xuan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Zetong
family: Xuan
- given: Alper
family: Bozkurt
- given: Miroslav
family: Pajic
- given: Yu
family: Wang
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 428-439
id: xuan24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 428
lastpage: 439
published: 2024-06-11 00:00:00 +0000
- title: 'Decision boundary learning for safe vision-based navigation via Hamilton-Jacobi reachability analysis and support vector machine'
abstract: 'We develop a self-supervised learning method that can predict safe and unsafe high-level waypoints for robot navigation in the form of a decision boundary given solely an RGB image, without knowledge of a prior map. To provide the theoretical basis for such prediction, we use Hamilton-Jacobi reachability analysis, a formal verification method, as the oracle for labeling training datasets. Given the labeled data, our neural network learns the coefficients of a decision boundary via a soft-margin Support Vector Machine loss function to classify safe and unsafe system states. We experimentally show that our method is generalizable and generates safety decision boundaries in unseen indoor environments. Our method''s advantages are its explainability and accurate safety prediction, which are important for safety-critical systems. Finally, we demonstrate our method via experiments where we showcase the learning-based safe decision boundary estimation that employs monocular RGB images and current linear speed.'
volume: 242
URL: https://proceedings.mlr.press/v242/toufighi24a.html
PDF: https://proceedings.mlr.press/v242/toufighi24a/toufighi24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-toufighi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Tara
family: Toufighi
- given: Minh
family: Bui
- given: Rakesh
family: Shrestha
- given: Mo
family: Chen
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 440-452
id: toufighi24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 440
lastpage: 452
published: 2024-06-11 00:00:00 +0000
- title: 'Understanding the difficulty of solving Cauchy problems with PINNs'
abstract: 'Physics-Informed Neural Networks (PINNs) have gained popularity in scientific computing in recent years. However, they often fail to achieve the same level of accuracy as classical methods in solving differential equations. In this paper, we aim to understand this issue from two perspectives in the case of Cauchy problems: the use of $L^2$ residuals as objective functions and the approximation gap of neural networks. We show that minimizing the sum of $L^2$ residual and initial condition error is not sufficient to guarantee the true solution, as this loss function does not capture the underlying dynamics. Additionally, neural networks are not capable of capturing singularities in the solutions due to the non-compactness of their image sets. This, in turn, influences the existence of global minima and the regularity of the network. We demonstrate that when the global minimum does not exist, machine precision becomes the predominant source of achievable error in practice. We also present numerical experiments in support of our theoretical claims.'
volume: 242
URL: https://proceedings.mlr.press/v242/wang24b.html
PDF: https://proceedings.mlr.press/v242/wang24b/wang24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-wang24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Tao
family: Wang
- given: Bo
family: Zhao
- given: Sicun
family: Gao
- given: Rose
family: Yu
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 453-465
id: wang24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 453
lastpage: 465
published: 2024-06-11 00:00:00 +0000
- title: 'Signatures meet dynamic programming: Generalizing Bellman equations for trajectory following'
abstract: 'Path signatures have been proposed as a powerful representation of paths that efficiently captures the path’s analytic and geometric characteristics, having useful algebraic properties including fast concatenation of paths through tensor products. Signatures have recently been widely adopted in machine learning problems for time series analysis. In this work we establish connections between value functions typically used in optimal control and intriguing properties of path signatures. These connections motivate our novel control framework with signature transforms that efficiently generalizes the Bellman equation to the space of trajectories. We analyze the properties and advantages of the framework, termed signature control. In particular, we demonstrate that (i) it can naturally deal with varying/adaptive time steps; (ii) it propagates higher-level information more efficiently than value function updates; (iii) it is robust to dynamical system misspecification over long rollouts. As a specific case of our framework, we devise a model predictive control method for path tracking. This method generalizes integral control, being suitable for problems with unknown disturbances. The proposed algorithms are tested in simulation, with differentiable physics models including typical control and robotics tasks such as point-mass, curve following for an ant model, and a robotic manipulator.'
volume: 242
URL: https://proceedings.mlr.press/v242/ohnishi24a.html
PDF: https://proceedings.mlr.press/v242/ohnishi24a/ohnishi24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-ohnishi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Motoya
family: Ohnishi
- given: Iretiayo
family: Akinola
- given: Jie
family: Xu
- given: Ajay
family: Mandlekar
- given: Fabio
family: Ramos
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 466-479
id: ohnishi24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 466
lastpage: 479
published: 2024-06-11 00:00:00 +0000
- title: 'Online decision making with history-average dependent costs'
abstract: 'In many online sequential decision-making scenarios, a learner''s choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history-dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of the history horizon on the design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.'
volume: 242
URL: https://proceedings.mlr.press/v242/hebbar24a.html
PDF: https://proceedings.mlr.press/v242/hebbar24a/hebbar24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hebbar24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Vijeth
family: Hebbar
- given: Cedric
family: Langbort
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 480-491
id: hebbar24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 480
lastpage: 491
published: 2024-06-11 00:00:00 +0000
- title: 'Learning-based rigid tube model predictive control'
abstract: 'This paper is concerned with model predictive control (MPC) of discrete-time linear systems subject to bounded additive disturbance and mixed constraints on the state and input, whereas the true disturbance set is unknown. Unlike most existing work on robust MPC, we propose an algorithm incorporating online learning that builds on prior knowledge of the disturbance, i.e., a known but conservative disturbance set. We approximate the true disturbance set at each time step with a parameterised set, which is referred to as a quantified disturbance set, using disturbance realisations. A key novelty is that the parameterisation of these quantified disturbance sets enjoys desirable properties such that the quantified disturbance set and its corresponding rigid tube bounding disturbance propagation can be efficiently updated online. We provide statistical gaps between the true and quantified disturbance sets, based on which, probabilistic recursive feasibility of MPC optimisation problems is discussed. Numerical simulations are provided to demonstrate the efficacy and computational advantages of our proposed algorithm and compare with conventional robust MPC algorithms.'
volume: 242
URL: https://proceedings.mlr.press/v242/gao24a.html
PDF: https://proceedings.mlr.press/v242/gao24a/gao24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-gao24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Yulong
family: Gao
- given: Shuhao
family: Yan
- given: Jian
family: Zhou
- given: Mark
family: Cannon
- given: Alessandro
family: Abate
- given: Karl Henrik
family: Johansson
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 492-503
id: gao24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 492
lastpage: 503
published: 2024-06-11 00:00:00 +0000
- title: 'A data-driven Riccati equation'
abstract: 'Certainty equivalence adaptive controllers are analysed using a “data-driven Riccati equation”, corresponding to the model-free Bellman equation used in Q-learning. The equation depends quadratically on data correlation matrices. This makes it possible to derive simple sufficient conditions for stability and robustness to unmodeled dynamics in adaptive systems. The paper is concluded by short remarks on how the bounds can be used to quantify the interplay between excitation levels and robustness.'
volume: 242
URL: https://proceedings.mlr.press/v242/rantzer24a.html
PDF: https://proceedings.mlr.press/v242/rantzer24a/rantzer24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-rantzer24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Anders
family: Rantzer
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 504-513
id: rantzer24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 504
lastpage: 513
published: 2024-06-11 00:00:00 +0000
- title: 'Nonconvex scenario optimization for data-driven reachability'
abstract: 'Many of the popular reachability analysis methods rely on the existence of system models. When system dynamics are uncertain or unknown, data-driven techniques must be utilized instead. In this paper, we propose an approach to data-driven reachability that provides a probabilistic guarantee of correctness for these systems through nonconvex scenario optimization. We pose the problem of finding reachable sets directly from data as a chance-constrained optimization problem, and present two algorithms for estimating nonconvex reachable sets: (1) through the union of partition cells and (2) through the sum of radial basis functions. Additionally, we investigate numerical examples to demonstrate the capability and applicability of the introduced methods to provide nonconvex reachable set approximations.'
volume: 242
URL: https://proceedings.mlr.press/v242/dietrich24a.html
PDF: https://proceedings.mlr.press/v242/dietrich24a/dietrich24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-dietrich24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Elizabeth
family: Dietrich
- given: Alex
family: Devonport
- given: Murat
family: Arcak
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 514-527
id: dietrich24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 514
lastpage: 527
published: 2024-06-11 00:00:00 +0000
- title: 'Uncertainty quantification and robustification of model-based controllers using conformal prediction'
abstract: 'In modern model-based control frameworks such as model predictive control or model-based reinforcement learning, machine learning has become a ubiquitous class of techniques deployed to improve the accuracy of the dynamics models. By leveraging expressive architectures such as neural networks, these frameworks aim to improve both the model accuracy and the control performance of the system, through the construction of accurate data-driven representations of the system dynamics. Despite achieving significant performance improvements over their non-learning counterparts, there are often little or no guarantees on how these model-based controllers with learned models would perform in the presence of uncertainty. In particular, under the influence of modeling errors, noise and exogenous disturbances, it is challenging to ascertain the accuracy of these learned models. In some cases, constraints may even be violated, rendering the controllers unsafe. In this work, we propose a novel framework that can be applied to a large class of model-based controllers and alleviates the above-mentioned issues by robustifying the model-based controllers in an online and modular manner, with provable guarantees on the model accuracy and constraint satisfaction. The framework first deploys conformal prediction to generate finite-sample, provably valid uncertainty regions for the dynamics model in a distribution-free manner. These uncertainty regions are incorporated into the constraints through a dynamic constraint tightening procedure. Together with the formulation of a predictive reference generator, a set of robustified reference trajectories are generated and incorporated into the model-based controller. Using two practical case studies, we demonstrate that our proposed methodology not only produces well-calibrated uncertainty regions that establish the accuracy of the models, but also enables the closed-loop system to satisfy constraints in a robust yet non-conservative manner.'
volume: 242
URL: https://proceedings.mlr.press/v242/chee24a.html
PDF: https://proceedings.mlr.press/v242/chee24a/chee24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-chee24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Kong Yao
family: Chee
- given: Thales C.
family: Silva
- given: M. Ani
family: Hsieh
- given: George J.
family: Pappas
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 528-540
id: chee24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 528
lastpage: 540
published: 2024-06-11 00:00:00 +0000
- title: 'Learning for CasADi: Data-driven models in numerical optimization'
abstract: 'While real-world problems are often challenging to analyze analytically, deep learning excels in modeling complex processes from data. Existing optimization frameworks like CasADi facilitate seamless usage of solvers but face challenges when integrating learned process models into numerical optimizations. To address this gap, we present the Learning for CasADi (L4CasADi) framework, enabling the seamless integration of PyTorch-learned models with CasADi for efficient and potentially hardware-accelerated numerical optimization. The applicability of L4CasADi is demonstrated with two tutorial examples: First, we optimize a fish’s trajectory in a turbulent river for energy efficiency where the turbulent flow is represented by a PyTorch model. Second, we demonstrate how an implicit Neural Radiance Field environment representation can be easily leveraged for optimal control with L4CasADi. L4CasADi, along with examples and documentation, is available under MIT license at https://github.com/Tim-Salzmann/l4casadi'
volume: 242
URL: https://proceedings.mlr.press/v242/salzmann24a.html
PDF: https://proceedings.mlr.press/v242/salzmann24a/salzmann24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-salzmann24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Tim
family: Salzmann
- given: Jon
family: Arrizabalaga
- given: Joel
family: Andersson
- given: Marco
family: Pavone
- given: Markus
family: Ryll
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 541-553
id: salzmann24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 541
lastpage: 553
published: 2024-06-11 00:00:00 +0000
- title: 'Neural operators for boundary stabilization of stop-and-go traffic'
abstract: 'This paper introduces a novel approach to PDE boundary control design using neural operators to alleviate stop-and-go traffic instabilities. Our framework leverages neural operators to design control strategies for traffic flow systems. The traffic dynamics are described by the Aw-Rascle-Zhang (ARZ) model, which consists of second-order coupled hyperbolic partial differential equations (PDEs). The backstepping method, which involves constructing and solving a backstepping control kernel, is widely used for boundary control of such PDE systems, but it requires an intensive depth of expertise and can be time-consuming. To overcome these challenges, we present two distinct neural operator (NO) learning schemes aimed at stabilizing the traffic PDE system. The first scheme embeds NO-approximated gain kernels within a predefined backstepping controller, while the second one directly learns a boundary control law. A Lyapunov analysis is conducted to evaluate the stability of the NO-approximated gain kernels and control law. It is proved that the NO-based closed-loop system is practically stable under certain approximation accuracy conditions. To validate the efficacy of the proposed approach, simulations are conducted to compare the performance of the two neural operator controllers with a PDE backstepping controller and a Proportional Integral (PI) controller. While the NO-approximated methods exhibit larger errors compared to the backstepping controller, they consistently outperform the PI controller, demonstrating faster computation speeds across all scenarios. This result suggests that neural operators can significantly expedite and simplify the process of obtaining boundary controllers for freeway traffic stabilization systems.'
volume: 242
URL: https://proceedings.mlr.press/v242/zhang24c.html
PDF: https://proceedings.mlr.press/v242/zhang24c/zhang24c.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zhang24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Yihuai
family: Zhang
- given: Ruiguo
family: Zhong
- given: Huan
family: Yu
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 554-565
id: zhang24c
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 554
lastpage: 565
published: 2024-06-11 00:00:00 +0000
- title: 'Submodular information selection for hypothesis testing with misclassification penalties'
abstract: 'We consider the problem of selecting an optimal subset of information sources for a hypothesis testing/classification task where the goal is to identify the true state of the world from a finite set of hypotheses, based on finite observation samples from the sources. In order to characterize the learning performance, we propose a misclassification penalty framework, which enables non-uniform treatment of different misclassification errors. In a centralized Bayesian learning setting, we study two variants of the subset selection problem: (i) selecting a minimum cost information set to ensure that the maximum penalty of misclassifying the true hypothesis is below a desired bound and (ii) selecting an optimal information set under a limited budget to minimize the maximum penalty of misclassifying the true hypothesis. Under certain assumptions, we prove that the objective (or constraints) of these combinatorial optimization problems are weak (or approximate) submodular, and establish high-probability performance guarantees for greedy algorithms. Further, we propose an alternate metric for information set selection which is based on the total penalty of misclassification. We prove that this metric is submodular and establish near-optimal guarantees for the greedy algorithms for both the information set selection problems. Finally, we present numerical simulations to validate our theoretical results over several randomly generated instances.'
volume: 242
URL: https://proceedings.mlr.press/v242/bhargav24a.html
PDF: https://proceedings.mlr.press/v242/bhargav24a/bhargav24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-bhargav24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Jayanth
family: Bhargav
- given: Mahsa
family: Ghasemi
- given: Shreyas
family: Sundaram
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 566-577
id: bhargav24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 566
lastpage: 577
published: 2024-06-11 00:00:00 +0000
- title: 'Learning and deploying robust locomotion policies with minimal dynamics randomization'
abstract: 'Training Deep Reinforcement Learning (DRL) locomotion policies often requires massive amounts of data to converge to the desired behavior. In this regard, simulators provide a cheap and abundant source. For successful sim-to-real transfer, exhaustively engineered approaches such as system identification, dynamics randomization, and domain adaptation are generally employed. As an alternative, we investigate a simple strategy of random force injection (RFI) to perturb system dynamics during training. We show that the application of random forces enables us to emulate dynamics randomization. This allows us to obtain locomotion policies that are robust to variations in system dynamics. We further extend RFI, referred to as extended random force injection (ERFI), by introducing an episodic actuation offset. We demonstrate that ERFI provides additional robustness for variations in system mass, offering on average a 53% performance improvement over RFI. We also show that ERFI is sufficient to perform a successful sim-to-real transfer on two different quadrupedal platforms, ANYmal C and Unitree A1, even for perceptive locomotion over uneven terrain in outdoor environments.'
volume: 242
URL: https://proceedings.mlr.press/v242/campanaro24a.html
PDF: https://proceedings.mlr.press/v242/campanaro24a/campanaro24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-campanaro24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Luigi
family: Campanaro
- given: Siddhant
family: Gangapurwala
- given: Wolfgang
family: Merkt
- given: Ioannis
family: Havoutis
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 578-590
id: campanaro24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 578
lastpage: 590
published: 2024-06-11 00:00:00 +0000
- title: 'Learning flow functions of spiking systems'
abstract: 'We propose a framework for surrogate modelling of spiking systems. These systems are often described by stiff differential equations with high-amplitude oscillations and multi-timescale dynamics, making surrogate models an attractive tool for system design. We parameterise the flow function of a spiking system in state-space using a recurrent neural network architecture, allowing for a direct continuous-time representation of the state trajectories which is particularly advantageous for this class of systems. The spiking nature of the signals makes for a data-heavy and computationally hard training process, and we describe two methods to mitigate these difficulties. We demonstrate our framework on two conductance-based models of biological neurons.'
volume: 242
URL: https://proceedings.mlr.press/v242/aguiar24a.html
PDF: https://proceedings.mlr.press/v242/aguiar24a/aguiar24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-aguiar24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Miguel
family: Aguiar
- given: Amritam
family: Das
- given: Karl H.
family: Johansson
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 591-602
id: aguiar24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 591
lastpage: 602
published: 2024-06-11 00:00:00 +0000
- title: 'Safe learning in nonlinear model predictive control'
abstract: 'A robust Model Predictive Control algorithm is proposed for learning-based control with the model represented by an affine combination of basis functions. The online optimization is formulated as a sequence of convex programming problems derived by linearizing concave components of the dynamic model. A tube-based approach ensures satisfaction of constraints on control variables and model states while avoiding conservative bounds on linearization errors. The linear dependence of the model on unknown parameters is exploited to allow safe online parameter adaptation. The resulting algorithm is recursively feasible and provides closed-loop stability and performance guarantees. Numerical examples are provided to illustrate the approach.'
volume: 242
URL: https://proceedings.mlr.press/v242/buerger24a.html
PDF: https://proceedings.mlr.press/v242/buerger24a/buerger24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-buerger24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Johannes
family: Buerger
- given: Mark
family: Cannon
- given: Martin
family: Doff-Sotta
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 603-614
id: buerger24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 603
lastpage: 614
published: 2024-06-11 00:00:00 +0000
- title: 'Efficient skill acquisition for insertion tasks in obstructed environments'
abstract: 'Data efficiency in robotic skill acquisition is crucial for operating robots in varied small-batch assembly settings. To operate in such environments, robots must have robust obstacle avoidance and versatile goal conditioning acquired from only a few simple demonstrations. Existing approaches, however, fall short of these requirements. Deep reinforcement learning (RL) enables a robot to learn complex manipulation tasks but is often limited to small task spaces in the real world due to sample inefficiency and safety concerns. Motion planning (MP) can generate collision-free paths in obstructed environments, but cannot solve complex manipulation tasks and requires goal states often specified by a user or object-specific pose estimator. In this work, we propose a robust system for efficient skill acquisition designed to address complex insertion tasks in obstructed environments. Our system leverages an object-centric generative model (OCGM) for versatile goal identification to specify a goal for MP combined with RL to solve complex manipulation tasks in obstructed environments. Particularly, OCGM enables one-shot target object identification and re-identification in new scenes, allowing MP to guide the robot to the target object while avoiding obstacles. This is combined with a skill transition network, which bridges the gap between terminal states of MP and feasible start states of a sample-efficient RL policy. The experiments demonstrate that our OCGM-based one-shot goal identification provides competitive accuracy to other baseline approaches and that our modular framework outperforms competitive baselines, including a state-of-the-art RL algorithm, by a significant margin for complex manipulation tasks in obstructed environments.'
volume: 242
URL: https://proceedings.mlr.press/v242/yamada24a.html
PDF: https://proceedings.mlr.press/v242/yamada24a/yamada24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-yamada24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Jun
family: Yamada
- given: Jack
family: Collins
- given: Ingmar
family: Posner
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 615-627
id: yamada24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 615
lastpage: 627
published: 2024-06-11 00:00:00 +0000
- title: 'Balanced reward-inspired reinforcement learning for autonomous vehicle racing'
abstract: 'Autonomous vehicle racing has attracted extensive interest due to its great potential in autonomous driving at the extreme limits. Model-based and learning-based methods are being widely used in autonomous racing. However, model-based methods cannot cope with dynamic environments when only local perception is available. In comparison, learning-based methods can handle complex environments under local perception. Recently, deep reinforcement learning (DRL) has gained popularity in autonomous racing. DRL outperforms conventional learning-based methods by handling complex situations and leveraging local information. DRL algorithms, such as the proximal policy optimization algorithm, can achieve a good balance between execution time and safety in autonomous vehicle competition. However, the training outcomes of conventional DRL methods exhibit inconsistent correctness in decision-making. This instability in decision-making introduces safety concerns in autonomous vehicle racing, such as collisions with track boundaries. The proposed algorithm is capable of avoiding collisions and improving training quality. Simulation results on a physics engine demonstrate that the proposed algorithm outperforms other DRL algorithms by achieving safer control during sharp bends, fewer collisions with track boundaries, and higher training quality across multiple tracks.'
volume: 242
URL: https://proceedings.mlr.press/v242/tian24a.html
PDF: https://proceedings.mlr.press/v242/tian24a/tian24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-tian24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Zhen
family: Tian
- given: Dezong
family: Zhao
- given: Zhihao
family: Lin
- given: David
family: Flynn
- given: Wenjing
family: Zhao
- given: Daxin
family: Tian
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 628-640
id: tian24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 628
lastpage: 640
published: 2024-06-11 00:00:00 +0000
- title: 'An invariant information geometric method for high-dimensional online optimization'
abstract: 'Sample efficiency is crucial in optimization, particularly in black-box scenarios characterized by expensive evaluations and zeroth-order feedback. When computing resources are plentiful, Bayesian optimization is often favored over evolution strategies. In this paper, we introduce a fully invariance-oriented evolution strategies algorithm, derived from its corresponding framework, that effectively rivals the leading Bayesian optimization method in tasks with dimensions at the upper limit of Bayesian capability. Specifically, we first build the framework InvIGO, which fully incorporates historical information while retaining full invariance and computational tractability. We then exemplify InvIGO on the multi-dimensional Gaussian distribution, which yields an invariant and scalable optimizer, SynCMA. The theoretical behavior and advantages of our algorithm over other Gaussian-based evolution strategies are further analyzed. Finally, we benchmark SynCMA against leading algorithms in Bayesian optimization and evolution strategies on various high-dimensional tasks, including Mujoco locomotion tasks, a rover planning task, and synthetic functions. In all scenarios, SynCMA demonstrates great competence, if not dominance, over other algorithms in sample efficiency, showing the underdeveloped potential of property-oriented evolution strategies.'
volume: 242
URL: https://proceedings.mlr.press/v242/zhang24d.html
PDF: https://proceedings.mlr.press/v242/zhang24d/zhang24d.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zhang24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Zhengfei
family: Zhang
- given: Yunyue
family: Wei
- given: Yanan
family: Sui
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 641-653
id: zhang24d
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 641
lastpage: 653
published: 2024-06-11 00:00:00 +0000
- title: 'On the nonsmooth geometry and neural approximation of the optimal value function of infinite-horizon pendulum swing-up'
abstract: 'We revisit the inverted pendulum problem with the goal of understanding and computing the true optimal value function. We start with the observation that the true optimal value function must be nonsmooth (i.e., not globally $C^1$) due to the symmetry of the problem. We then give a result that can certify the optimality of a candidate piecewise $C^1$ value function. Further, for a candidate value function obtained via numerical approximation, we provide a bound on its suboptimality based on its Hamilton-Jacobi-Bellman (HJB) equation residuals. Inspired by Holzhüter (2004), we then design an algorithm that solves the Pontryagin’s minimum principle (PMP) ODE backwards from terminal conditions provided by the locally optimal LQR value function. This numerical procedure leads to a piecewise $C^1$ value function whose nonsmooth region contains periodic spiral lines and whose smooth regions attain HJB residuals of about $10^{-4}$, hence certified to be the optimal value function up to minor numerical inaccuracies. This optimal value function passes several checks of optimality: (i) it sits above a polynomial lower bound; (ii) its induced controller globally swings up and stabilizes the pendulum; and (iii) it attains lower trajectory cost than baseline methods such as energy shaping, model predictive control (MPC), and proximal policy optimization (with MPC attaining almost the same cost). We conclude by distilling the optimal value function into a simple neural network.'
volume: 242
URL: https://proceedings.mlr.press/v242/han24a.html
PDF: https://proceedings.mlr.press/v242/han24a/han24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-han24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Haoyu
family: Han
- given: Heng
family: Yang
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 654-666
id: han24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 654
lastpage: 666
published: 2024-06-11 00:00:00 +0000
- title: 'Data-driven robust covariance control for uncertain linear systems'
abstract: 'The theory of covariance control and covariance steering (CS) deals with controlling the dispersion of trajectories of a dynamical system, under the implicit assumption that accurate prior knowledge of the system being controlled is available. In this work, we consider the problem of steering the distribution of a discrete-time, linear system subject to exogenous disturbances under an unknown dynamics model. Leveraging concepts from behavioral systems theory, the trajectories of this unknown, noisy system may be (approximately) represented using system data collected through experimentation. Using this fact, we formulate a direct data-driven covariance control problem using input-state data. We then propose a maximum likelihood uncertainty quantification method to estimate and bound the noise realizations in the data collection process. Lastly, we utilize robust convex optimization techniques to solve the resulting norm-bounded uncertain convex program. We illustrate the proposed end-to-end data-driven CS algorithm on a double integrator example and showcase the efficacy and accuracy of the proposed method compared to that of model-based methods.'
volume: 242
URL: https://proceedings.mlr.press/v242/pilipovsky24a.html
PDF: https://proceedings.mlr.press/v242/pilipovsky24a/pilipovsky24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-pilipovsky24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Joshua
family: Pilipovsky
- given: Panagiotis
family: Tsiotras
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 667-678
id: pilipovsky24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 667
lastpage: 678
published: 2024-06-11 00:00:00 +0000
- title: 'Combining model-based controller and ML advice via convex reparameterization'
abstract: 'Machine Learning (ML) based control, particularly Reinforcement Learning (RL), has achieved impressive advancements but is often black-box and lacks worst-case guarantees in safety-critical systems. In contrast, classical model-based control offers stability guarantees but usually underperforms the machine-learned black-box controller. This motivates us to combine machine-learned black-box and model-based controllers. Due to the nonconvexity of the space of stable controllers, a simple convex combination of the two controllers can lead to instability. We propose using Disturbance Response Control (DRC) to reparameterize the two controllers, ensuring the convexity of the stable controller space. We then propose $\lambda$CLEAC, which adaptively combines the machine-learned black-box controller and the model-based controller in the DRC parameterization. We prove that our approach achieves the best of both worlds: stability as in model-based control and regret bounds similar to those of the machine-learned controller.'
volume: 242
URL: https://proceedings.mlr.press/v242/shen24a.html
PDF: https://proceedings.mlr.press/v242/shen24a/shen24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-shen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Junxuan
family: Shen
- given: Adam
family: Wierman
- given: Guannan
family: Qu
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 679-693
id: shen24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 679
lastpage: 693
published: 2024-06-11 00:00:00 +0000
- title: 'Pointwise-in-time diagnostics for reinforcement learning during training and runtime'
abstract: 'Explainable AI Planning (XAIP), a subfield of explainable AI (XAI), offers a variety of methods to interpret the behavior of autonomous systems. A recent “pointwise-in-time” explanation method, called Rule Status Assessment (RSA), characterizes an agent’s behavior at individual time steps in a trajectory using linear temporal logic (LTL) rules. In this work, RSA is applied for the first time in a reinforcement learning (RL) context. We first demonstrate RSA diagnostics as a substantial supplement to the basic RL reward curve, tracking whether and when specified subtasks are accomplished. We then introduce a novel “Interactive RSA” which provides the user with detailed diagnostic information automatically at any desired point in a trajectory. We apply RSA to an advanced agent at runtime and show that RSA and its novel interactive variant constitute a promising step towards explainable RL.'
volume: 242
URL: https://proceedings.mlr.press/v242/brindise24a.html
PDF: https://proceedings.mlr.press/v242/brindise24a/brindise24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-brindise24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Noel
family: Brindise
- given: Andres Posada
family: Moreno
- given: Cedric
family: Langbort
- given: Sebastian
family: Trimpe
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 694-706
id: brindise24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 694
lastpage: 706
published: 2024-06-11 00:00:00 +0000
- title: 'Expert with clustering: Hierarchical online preference learning framework'
abstract: 'Emerging mobility systems are increasingly capable of recommending options to mobility users, to guide them towards personalized yet sustainable system outcomes. Even more so than the typical recommendation system, it is crucial to minimize regret, because 1) the mobility options directly affect the lives of the users, and 2) the system sustainability relies on sufficient user participation. In this study, we thus consider accelerating user preference learning by exploiting a low-dimensional latent space that captures the mobility preferences of users within a population. We therefore introduce a hierarchical contextual bandit framework named Expert with Clustering (EWC), which integrates clustering techniques and prediction with expert advice. EWC efficiently utilizes hierarchical user information and incorporates a novel Loss-guided Distance metric. This metric is instrumental in generating more representative cluster centroids, thereby enhancing the performance of recommendation systems. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ options, our algorithm achieves a regret bound of $O(N\sqrt{T\log K} + NT)$. This bound consists of two parts: the first term is the regret from the Hedge algorithm, and the second term depends on the average loss from clustering. The algorithm performs with low regret, especially when a latent hierarchical structure exists among users. This regret bound underscores the theoretical and experimental efficacy of EWC, particularly in scenarios that demand rapid learning and adaptation. Experimental results highlight that EWC can substantially reduce regret by 27.57% compared to the LinUCB baseline. Our work offers a data-efficient approach to capturing both individual and collective behaviors, making it highly applicable to contexts with hierarchical structures. We expect the algorithm to be applicable to other settings with layered nuances of user preferences and information.'
volume: 242
URL: https://proceedings.mlr.press/v242/zhou24a.html
PDF: https://proceedings.mlr.press/v242/zhou24a/zhou24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zhou24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Tianyue
family: Zhou
- given: Jung-Hoon
family: Cho
- given: Babak Rahimi
family: Ardabili
- given: Hamed
family: Tabkhi
- given: Cathy
family: Wu
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 707-718
id: zhou24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 707
lastpage: 718
published: 2024-06-11 00:00:00 +0000
- title: 'Verification of neural reachable tubes via scenario optimization and conformal prediction'
abstract: 'Learning-based approaches for controlling safety-critical autonomous systems are rapidly growing in popularity; thus, it is important to provide rigorous and robust assurances on their performance and safety. Hamilton-Jacobi (HJ) reachability analysis is a popular formal verification tool for providing such guarantees, since it can handle general nonlinear system dynamics, bounded adversarial system disturbances, and state and input constraints. However, it involves solving a Partial Differential Equation (PDE), whose computational and memory complexity scales exponentially with respect to the state dimension, making its direct use on large-scale systems intractable. To overcome this challenge, neural approaches, such as DeepReach, have been used to synthesize reachable tubes and safety controllers for high-dimensional systems. However, verifying these neural reachable tubes remains challenging. In this work, we propose two different verification methods, based on robust scenario optimization and conformal prediction, to provide probabilistic safety guarantees for neural reachable tubes. Our methods allow a direct trade-off between resilience to outlier errors in the neural tube, which are inevitable in a learning-based approach, and the strength of the probabilistic safety guarantee. Furthermore, we show that split conformal prediction, a widely used method in the machine learning community for uncertainty quantification, reduces to a scenario-based approach, making the two methods equivalent not only for verification of neural reachable tubes but also more generally. To our knowledge, our proof is the first in the literature to show a strong relationship between the highly related but disparate fields of conformal prediction and scenario optimization. Finally, we propose an outlier-adjusted verification approach that harnesses information about the error distribution in neural reachable tubes to recover greater safe volumes. We demonstrate the efficacy of the proposed approaches for the high-dimensional problems of multi-vehicle collision avoidance and rocket landing with no-go zones.'
volume: 242
URL: https://proceedings.mlr.press/v242/lin24a.html
PDF: https://proceedings.mlr.press/v242/lin24a/lin24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-lin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Albert
family: Lin
- given: Somil
family: Bansal
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 719-731
id: lin24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 719
lastpage: 731
published: 2024-06-11 00:00:00 +0000
- title: 'Random features approximation for control-affine systems'
abstract: 'Modern data-driven control applications call for flexible nonlinear models that are amenable to principled controller synthesis and real-time feedback. Many nonlinear dynamical systems of interest are control affine. We propose two novel classes of nonlinear feature representations which capture control affine structure while allowing for arbitrary complexity in the state dependence. Our methods make use of random features (RF) approximations, inheriting the expressiveness of kernel methods at a lower computational cost. We formalize the representational capabilities of our methods by showing their relationship to the Affine Dot Product (ADP) kernel proposed by Castaneda et al. (2021) and a novel Affine Dense (AD) kernel that we introduce. We further illustrate their utility by presenting a case study of data-driven optimization-based control using control certificate functions (CCF). Simulation experiments on a double pendulum empirically demonstrate the advantages of our methods.'
volume: 242
URL: https://proceedings.mlr.press/v242/kazemian24a.html
PDF: https://proceedings.mlr.press/v242/kazemian24a/kazemian24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-kazemian24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Kimia
family: Kazemian
- given: Yahya
family: Sattar
- given: Sarah
family: Dean
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 732-744
id: kazemian24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 732
lastpage: 744
published: 2024-06-11 00:00:00 +0000
- title: 'Hacking predictors means hacking cars: Using sensitivity analysis to identify trajectory prediction vulnerabilities for autonomous driving security'
abstract: 'Adversarial attacks on learning-based multi-modal trajectory predictors have already been demonstrated. However, there are still open questions about the effects of perturbations on inputs other than state histories, and how these attacks impact downstream planning and control. In this paper, we conduct a sensitivity analysis on two trajectory prediction models, Trajectron++ and AgentFormer. The analysis reveals that between all inputs, almost all of the perturbation sensitivities for both models lie only within the most recent position and velocity states. We additionally demonstrate that, despite dominant sensitivity on state history perturbations, an undetectable image map perturbation made with the Fast Gradient Sign Method can induce large prediction error increases in both models, revealing that these trajectory predictors are, in fact, susceptible to image-based attacks. Using an optimization-based planner and example perturbations crafted from sensitivity results, we show how these attacks can cause a vehicle to come to a sudden stop from moderate driving speeds.'
volume: 242
URL: https://proceedings.mlr.press/v242/gibson24a.html
PDF: https://proceedings.mlr.press/v242/gibson24a/gibson24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-gibson24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Marsalis
family: Gibson
- given: David
family: Babazadeh
- given: Claire
family: Tomlin
- given: Shankar
family: Sastry
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 745-757
id: gibson24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 745
lastpage: 757
published: 2024-06-11 00:00:00 +0000
- title: 'Rademacher complexity of neural ODEs via Chen-Fliess series'
abstract: 'We show how continuous-depth neural ODE models can be framed as single-layer, infinite-width nets using the Chen-Fliess series expansion for nonlinear ODEs. In this net, the output “weights” are taken from the signature of the control input — a tool used to represent infinite-dimensional paths as a sequence of tensors — which comprises iterated integrals of the control input over a simplex. The “features” are taken to be iterated Lie derivatives of the output function with respect to the vector fields in the controlled ODE model. The main result of this work applies this framework to derive compact expressions for the Rademacher complexity of ODE models that map an initial condition to a scalar output at some terminal time. The result leverages the straightforward analysis afforded by single-layer architectures. We conclude with some examples instantiating the bound for some specific systems and discuss potential follow-up work.'
volume: 242
URL: https://proceedings.mlr.press/v242/hanson24a.html
PDF: https://proceedings.mlr.press/v242/hanson24a/hanson24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hanson24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Joshua
family: Hanson
- given: Maxim
family: Raginsky
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 758-769
id: hanson24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 758
lastpage: 769
published: 2024-06-11 00:00:00 +0000
- title: 'Robust cooperative multi-agent reinforcement learning: A mean-field type game perspective'
abstract: 'In this paper, we study the problem of robust cooperative multi-agent reinforcement learning (RL) where a large number of cooperative agents with distributed information aim to learn policies in the presence of stochastic and non-stochastic uncertainties whose distributions are respectively known and unknown. Focusing on policy optimization that accounts for both types of uncertainties, we formulate the problem as a worst-case (minimax) framework. Since this problem is intractable in general, we focus on the Linear Quadratic setting to enable the derivation of benchmark solutions. First, since no standard theory exists for this problem due to the distributed information structure, we utilize the Mean-Field Type Game (MFTG) paradigm to establish guarantees on the solution quality in the sense of the achieved Nash equilibrium of the MFTG. This in turn allows us to compare the performance against the corresponding original robust multi-agent control problem. Then, we propose a Receding-horizon Gradient Descent Ascent RL algorithm to find the MFTG Nash equilibrium, and we prove a non-asymptotic rate of convergence. Finally, we provide numerical experiments to demonstrate the efficacy of our approach relative to a baseline algorithm.'
volume: 242
URL: https://proceedings.mlr.press/v242/zaman24a.html
PDF: https://proceedings.mlr.press/v242/zaman24a/zaman24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zaman24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Muhammad Aneeq Uz
family: Zaman
- given: Mathieu
family: Laurière
- given: Alec
family: Koppel
- given: Tamer
family: Başar
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 770-783
id: zaman24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 770
lastpage: 783
published: 2024-06-11 00:00:00 +0000
- title: 'Learning $\epsilon$-Nash equilibrium stationary policies in stochastic games with unknown independent chains using online mirror descent'
abstract: 'We study a subclass of $n$-player stochastic games, namely, stochastic games with independent chains and unknown transition matrices. In this class of games, players control their own internal Markov chains whose transitions do not depend on the states/actions of other players. However, players’ decisions are coupled through their payoff functions. We assume players can receive only realizations of their payoffs, and that the players cannot observe the states and actions of other players, nor do they know the transition probability matrices of their own Markov chain. Relying on a compact dual formulation of the game based on occupancy measures and the technique of confidence sets to maintain high-probability estimates of the unknown transition matrices, we propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-Nash equilibrium stationary policy for this class of games. The proposed algorithm has the desired properties of independence and convergence. Specifically, assuming the existence of a variationally stable Nash equilibrium policy, we show that the proposed algorithm, in which players make their decisions independently and in a decentralized fashion, converges asymptotically to the stable $\epsilon$-Nash equilibrium stationary policy with arbitrarily high probability.'
volume: 242
URL: https://proceedings.mlr.press/v242/qin24a.html
PDF: https://proceedings.mlr.press/v242/qin24a/qin24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-qin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Tiancheng
family: Qin
- given: S. Rasoul
family: Etesami
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 784-795
id: qin24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 784
lastpage: 795
published: 2024-06-11 00:00:00 +0000
- title: 'Uncertainty informed optimal resource allocation with Gaussian process based Bayesian inference'
abstract: 'We focus on the problem of uncertainty-informed allocation of medical resources (vaccines) to heterogeneous populations for managing epidemic spread. We tackle two related questions: (1) For a compartmental ordinary differential equation (ODE) model of epidemic spread, how can we estimate and integrate parameter uncertainty into resource allocation decisions? (2) How can we computationally handle both nonlinear ODE constraints and parameter uncertainties for a generic stochastic optimization problem for resource allocation? To the best of our knowledge, current literature does not fully resolve these questions. Here, we develop a data-driven approach to represent parameter uncertainty accurately and tractably in a novel stochastic optimization problem formulation. We first generate a tractable scenario set by estimating the distribution on ODE model parameters using Bayesian inference with Gaussian processes. Next, we develop a parallelized solution algorithm that accounts for scenario-dependent nonlinear ODE constraints. Our scenario-set generation procedure and solution approach are flexible in that they can handle any compartmental epidemiological ODE model. Our computational experiments on two different nonlinear ODE models (SEIR and SEPIHR) indicate that accounting for uncertainty in key epidemiological parameters can improve the efficacy of time-critical allocation decisions by 4-8%. This improvement can be attributed to the data-driven and optimal (strategic) nature of the vaccine allocations.'
volume: 242
URL: https://proceedings.mlr.press/v242/gupta24a.html
PDF: https://proceedings.mlr.press/v242/gupta24a/gupta24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-gupta24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Samarth
family: Gupta
- given: Saurabh
family: Amin
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 796-812
id: gupta24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 796
lastpage: 812
published: 2024-06-11 00:00:00 +0000
- title: 'Improving sample efficiency of high dimensional Bayesian optimization with MCMC'
abstract: 'Sequential optimization methods are often confronted with the curse of dimensionality in high-dimensional spaces. Current approaches under the Gaussian process framework are still burdened by the computational complexity of tracking Gaussian process posteriors and need to partition the optimization problem into small regions to ensure exploration or assume an underlying low-dimensional structure. With the idea of transitioning the candidate points towards more promising positions, we propose a new method based on Markov Chain Monte Carlo to efficiently sample from an approximated posterior. We provide theoretical guarantees of its convergence in the Gaussian process Thompson sampling setting. We also show experimentally that both the Metropolis-Hastings and the Langevin Dynamics versions of our algorithm outperform state-of-the-art methods in high-dimensional sequential optimization and reinforcement learning benchmarks.'
volume: 242
URL: https://proceedings.mlr.press/v242/yi24a.html
PDF: https://proceedings.mlr.press/v242/yi24a/yi24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-yi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Zeji
family: Yi
- given: Yunyue
family: Wei
- given: Chu Xin
family: Cheng
- given: Kaibo
family: He
- given: Yanan
family: Sui
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 813-824
id: yi24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 813
lastpage: 824
published: 2024-06-11 00:00:00 +0000
- title: 'SpOiLer: Offline reinforcement learning using scaled penalties'
abstract: 'Offline Reinforcement Learning (RL) is a variant of off-policy learning where an optimal policy must be learned from a static dataset containing trajectories collected by an unknown behavior policy. In the offline setting, standard off-policy algorithms will overestimate values of out-of-distribution actions and a policy trained naively in this way will perform poorly in the environment due to distribution shift between the implied and real environment; this is especially likely when modelling complex and multi-modal data distributions. We propose Scaled-penalty Offline Learning (SpOiLer), an offline reinforcement learning algorithm that reduces the value of out-of-distribution actions relative to observed actions. The resultant pessimistic value function is a lower bound of the true value function and manipulates the policy towards selecting actions present in the dataset. Our method is a simple augmentation to the standard Bellman backup operator and implementation requires around 15 additional lines of code over soft actor-critic. We provide theoretical insights into how SpOiLer operates under the hood and show empirically that SpOiLer achieves remarkable performance against prior methods on a range of tasks.'
volume: 242
URL: https://proceedings.mlr.press/v242/srinivasan24a.html
PDF: https://proceedings.mlr.press/v242/srinivasan24a/srinivasan24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-srinivasan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Padmanaba
family: Srinivasan
- given: William J.
family: Knottenbelt
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 825-838
id: srinivasan24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 825
lastpage: 838
published: 2024-06-11 00:00:00 +0000
- title: 'Towards safe multi-task Bayesian optimization'
abstract: 'Bayesian optimization has emerged as a highly effective tool for the safe online optimization of systems, due to its high sample efficiency and noise robustness. To further enhance its efficiency, reduced physical models of the system can be incorporated into the optimization process, accelerating it. These models are able to offer an approximation of the actual system, and evaluating them is significantly cheaper. The similarity between the model and reality is represented by additional hyperparameters, which are learned within the optimization process. Safety is a crucial criterion for online optimization methods such as Bayesian optimization, which has been addressed by recent works that provide safety guarantees under the assumption of known hyperparameters. In practice, however, this assumption does not hold. Therefore, we extend the robust Gaussian process uniform error bounds to meet the multi-task setting, which involves the calculation of a confidence region from the hyperparameter posterior distribution utilizing Markov chain Monte Carlo methods. Subsequently, the robust safety bounds are employed to facilitate the safe optimization of the system, while incorporating measurements of the models. Simulation results indicate that the optimization can be significantly accelerated for expensive-to-evaluate functions in comparison to other state-of-the-art safe Bayesian optimization methods, contingent on the fidelity of the models.'
volume: 242
URL: https://proceedings.mlr.press/v242/lubsen24a.html
PDF: https://proceedings.mlr.press/v242/lubsen24a/lubsen24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-lubsen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Jannis
family: Lübsen
- given: Christian
family: Hespe
- given: Annika
family: Eichler
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 839-851
id: lubsen24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 839
lastpage: 851
published: 2024-06-11 00:00:00 +0000
- title: 'Mixing classifiers to alleviate the accuracy-robustness trade-off'
abstract: 'Deep neural classifiers have recently found tremendous success in data-driven control systems. However, existing neural models often suffer from a trade-off between accuracy and adversarial robustness, which is a limitation that must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we develop classifiers that simultaneously inherit high robustness from robust models and high accuracy from standard models. Specifically, we propose a theoretically motivated formulation that mixes the output probabilities of a standard neural network and a robust neural network. Both of these base classifiers are pre-trained, and thus our method does not require additional training. Our numerical experiments verify that the mixed classifier noticeably improves the accuracy-robustness trade-off and identify the confidence property of the robust base classifier as the key leverage of this more benign trade-off. Our theoretical results prove that under mild assumptions, when the robustness of the robust base model is certifiable, no alteration or attack within a closed-form $l_p$ radius on an input can result in misclassification of the mixed classifier.'
volume: 242
URL: https://proceedings.mlr.press/v242/bai24a.html
PDF: https://proceedings.mlr.press/v242/bai24a/bai24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-bai24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Yatong
family: Bai
- given: Brendon G.
family: Anderson
- given: Somayeh
family: Sojoudi
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 852-865
id: bai24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 852
lastpage: 865
published: 2024-06-11 00:00:00 +0000
- title: 'Design of observer-based finite-time control for inductively coupled power transfer system with random gain fluctuations'
abstract: 'This investigation focuses on the issues of finite-time stochastic stabilisation and non-fragile control design for inductively coupled power transfer systems (ICPTSs) in the presence of stochastic disturbances. Primarily, the observer system exploits the information obtained from the output of the ICPTSs to accurately reconstruct the states of the ICPTS. The observer-based non-fragile control is put forward by including the estimated states of the system and gain fluctuations, which assist in achieving the desired finite-time stochastic stabilisation of the addressed system. Furthermore, via the use of Lyapunov stability theory and Itô’s formula, conditions based on linear matrix inequalities are derived, which serve as adequate criteria for affirming the desired results. In conclusion, the simulation results offered provide evidence that the proposed theoretical outcomes and control system are viable propositions.'
volume: 242
URL: https://proceedings.mlr.press/v242/thangavel24a.html
PDF: https://proceedings.mlr.press/v242/thangavel24a/thangavel24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-thangavel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Satheesh
family: Thangavel
- given: Sakthivel
family: Rathinasamy
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 866-875
id: thangavel24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 866
lastpage: 875
published: 2024-06-11 00:00:00 +0000
- title: 'Learning robust policies for uncertain parametric Markov decision processes'
abstract: 'Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to avoid being overly conservative with the view of achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite state models, under the presence of structural uncertainty. In particular, we consider uncertain parametric Markov decision processes (upMDPs), a special class of Markov decision processes, with parameterised transition functions, where such parameters are drawn from a (potentially) unknown distribution. Our framework leverages recent advancements in the so-called scenario approach theory, where we represent the uncertainty by means of scenarios, and provide guarantees on synthesised policies satisfying probabilistic computation tree logic (PCTL) formulae. We consider several common benchmarks/problems and compare our work to recent developments for verifying upMDPs.'
volume: 242
URL: https://proceedings.mlr.press/v242/rickard24a.html
PDF: https://proceedings.mlr.press/v242/rickard24a/rickard24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-rickard24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Luke
family: Rickard
- given: Alessandro
family: Abate
- given: Kostas
family: Margellos
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 876-889
id: rickard24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 876
lastpage: 889
published: 2024-06-11 00:00:00 +0000
- title: 'Conditions for parameter unidentifiability of linear ARX systems for enhancing security'
abstract: 'For an adversarial observer of parametric systems, the identifiability of parameters reflects the possibility of inferring the system dynamics and then affects the performance of attacks against the systems. Hence, achieving unidentifiability of the parameters, which prevents the adversary from obtaining a low-variance identification, is an attractive way to enhance security. In this paper, we propose a quantitative definition to measure the unidentifiability based on the lower bound of identification variance. The lower bound is given via the analysis of the Fisher Information Matrix (FIM). Then, we propose the necessary and sufficient condition for unidentifiability and derive the explicit form of the unidentifiability condition for linear autoregressive systems with exogenous inputs (ARX systems). It is proved that the unidentifiability of linear ARX systems can be achieved through quadratic constraints on inputs and outputs. Finally, considering an optimal control problem with security concerns, we apply the unidentifiability constraint and obtain the optimal controller. Simulations demonstrate the effectiveness of our method.'
volume: 242
URL: https://proceedings.mlr.press/v242/mao24b.html
PDF: https://proceedings.mlr.press/v242/mao24b/mao24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-mao24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Xiangyu
family: Mao
- given: Jianping
family: He
- given: Chengpu
family: Yu
- given: Chongrong
family: Fang
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 890-901
id: mao24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 890
lastpage: 901
published: 2024-06-11 00:00:00 +0000
- title: 'Meta-learning linear quadratic regulators: a policy gradient MAML approach for model-free LQR'
abstract: 'We investigate the problem of learning linear quadratic regulators (LQR) in a multi-task, heterogeneous, and model-free setting. We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-agnostic meta-learning (MAML) (Finn et al., 2017) approach for the LQR problem under different task-heterogeneity settings. We show that our MAML-LQR algorithm produces a stabilizing controller close to each task-specific optimal controller up to a task-heterogeneity bias in both model-based and model-free learning scenarios. Moreover, in the model-based setting, we show that such a controller is achieved with a linear convergence rate, which improves upon sub-linear rates from existing work. Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks.'
volume: 242
URL: https://proceedings.mlr.press/v242/toso24a.html
PDF: https://proceedings.mlr.press/v242/toso24a/toso24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-toso24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Leonardo Felipe
family: Toso
- given: Donglin
family: Zhan
- given: James
family: Anderson
- given: Han
family: Wang
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 902-915
id: toso24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 902
lastpage: 915
published: 2024-06-11 00:00:00 +0000
- title: 'A large deviations perspective on policy gradient algorithms'
abstract: 'Motivated by policy gradient methods in the context of reinforcement learning, we derive the first large deviation rate function for the iterates generated by stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Łojasiewicz condition. Leveraging the contraction principle from large deviations theory, we illustrate the potential of this result by showing how convergence properties of policy gradient with a softmax parametrization and an entropy regularized objective can be naturally extended to a wide spectrum of other policy parametrizations.'
volume: 242
URL: https://proceedings.mlr.press/v242/jongeneel24a.html
PDF: https://proceedings.mlr.press/v242/jongeneel24a/jongeneel24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-jongeneel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Wouter
family: Jongeneel
- given: Daniel
family: Kuhn
- given: Mengmeng
family: Li
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 916-928
id: jongeneel24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 916
lastpage: 928
published: 2024-06-11 00:00:00 +0000
- title: 'Deep model-free KKL observer: A switching approach'
abstract: 'This paper presents a new model-free methodology to learn Kazantzis-Kravaris-Luenberger (KKL) observers for nonlinear systems. We address three major difficulties arising in observer design: the peaking phenomenon, the noise sensitivity and the trade-off between convergence speed and robustness. We formulate the learning objective as an optimization problem, strictly minimizing the error of the observer estimates, without the need to add explicit constraints or regularization terms. We further improve the performance with a switching approach, efficiently transitioning between two observers, respectively designed for the transient phase and the asymptotic convergence. Numerical results on the Van der Pol system, the Rössler attractor and on a bioreactor illustrate the gains of the method relative to the literature, in terms of performance and robustness. Code available online: https://github.com/jolindien-git/DeepKKL'
volume: 242
URL: https://proceedings.mlr.press/v242/peralez24a.html
PDF: https://proceedings.mlr.press/v242/peralez24a/peralez24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-peralez24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Johan
family: Peralez
- given: Madiha
family: Nadri
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 929-940
id: peralez24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 929
lastpage: 940
published: 2024-06-11 00:00:00 +0000
- title: 'In vivo learning-based control of microbial populations density in bioreactors'
abstract: 'A key problem in using microorganisms as bio-factories is achieving and maintaining cellular communities at the desired density and composition to efficiently convert their biomass into useful compounds. Bioreactors are promising technological platforms for the real-time, scalable control of cellular density. In this work, we developed a learning-based strategy to expand the range of available control algorithms capable of regulating the density of a single bacterial population in bioreactors. Specifically, we used a sim-to-real paradigm, where a simple mathematical model, calibrated using a single experiment, was adopted to generate synthetic data for training the controller. The resulting policy was then exhaustively tested in vivo using a low-cost bioreactor known as Chi.Bio, assessing performance and robustness. Additionally, we compared the performance with more traditional controllers (namely, a PI and an MPC), confirming that the learning-based controller exhibits similar performance in vivo. Our work demonstrates the viability of learning-based strategies for controlling cellular density in bioreactors, taking a step toward their use in controlling the composition of microbial consortia.'
volume: 242
URL: https://proceedings.mlr.press/v242/brancato24a.html
PDF: https://proceedings.mlr.press/v242/brancato24a/brancato24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-brancato24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Sara Maria
family: Brancato
- given: Davide
family: Salzano
- given: Francesco De
family: Lellis
- given: Davide
family: Fiore
- given: Giovanni
family: Russo
- given: Mario di
family: Bernardo
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 941-953
id: brancato24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 941
lastpage: 953
published: 2024-06-11 00:00:00 +0000
- title: 'Bounded robustness in reinforcement learning via lexicographic objectives'
abstract: 'Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator interpretation of the disturbances, and establish connections between robustness and properties of the noise kernel and of the underlying MDPs. Then, we construct sufficient conditions for policy robustness, and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation, while preserving convergence and sub-optimality in the policy synthesis.'
volume: 242
URL: https://proceedings.mlr.press/v242/jarne-ornia24a.html
PDF: https://proceedings.mlr.press/v242/jarne-ornia24a/jarne-ornia24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-jarne-ornia24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Daniel
family: Jarne Ornia
- given: Licio
family: Romao
- given: Lewis
family: Hammond
- given: Manuel Mazo
family: Jr
- given: Alessandro
family: Abate
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 954-967
id: jarne-ornia24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 954
lastpage: 967
published: 2024-06-11 00:00:00 +0000
- title: 'System-level safety guard: Safe tracking control through uncertain neural network dynamics models'
abstract: 'The Neural Network (NN), as a black-box function approximator, has been considered in many control and robotics applications. However, difficulties in verifying the overall system safety in the presence of uncertainties hinder the deployment of NN modules in safety-critical systems. In this paper, we leverage the NNs as predictive models for trajectory tracking of unknown dynamical systems. We consider controller design in the presence of both intrinsic uncertainty and uncertainties from other system modules. In this setting, we formulate the constrained trajectory tracking problem and show that it can be solved using Mixed-integer Linear Programming (MILP). The proposed MILP-based approach is empirically demonstrated in robot navigation and obstacle avoidance through simulations. The demonstration videos are available at https://xiaolisean.github.io/publication/2023-11-01-L4DC2024.'
volume: 242
URL: https://proceedings.mlr.press/v242/li24a.html
PDF: https://proceedings.mlr.press/v242/li24a/li24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-li24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Xiao
family: Li
- given: Yutong
family: Li
- given: Anouck
family: Girard
- given: Ilya
family: Kolmanovsky
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 968-979
id: li24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 968
lastpage: 979
published: 2024-06-11 00:00:00 +0000
- title: 'Nonasymptotic regret analysis of adaptive linear quadratic control with model misspecification'
abstract: 'The strategy of pre-training a large model on a diverse dataset, then fine-tuning for a particular application has yielded impressive results in computer vision, natural language processing, and robotic control. This strategy has vast potential in adaptive control, where it is necessary to rapidly adapt to changing conditions with limited data. Toward concretely understanding the benefit of pre-training for adaptive control, we study the adaptive linear quadratic control problem in the setting where the learner has prior knowledge of a collection of basis matrices for the dynamics. This basis is misspecified in the sense that it cannot perfectly represent the dynamics of the underlying data generating process. We propose an algorithm that uses this prior knowledge, and prove upper bounds on the expected regret after $T$ interactions with the system. In the regime where $T$ is small, the upper bounds are dominated by a term that scales with either $\texttt{poly}(\log T)$ or $\sqrt{T}$, depending on the prior knowledge available to the learner. When $T$ is large, the regret is dominated by a term that grows with $\delta T$, where $\delta$ quantifies the level of misspecification. This linear term arises due to the inability to perfectly estimate the underlying dynamics using the misspecified basis, and is therefore unavoidable unless the basis matrices are also adapted online. However, it only dominates for large $T$, after the sublinear terms arising due to the error in estimating the weights for the basis matrices become negligible. We provide simulations that validate our analysis. Our simulations also show that offline data from a collection of related systems can be used as part of a pre-training stage to estimate a misspecified dynamics basis, which is in turn used by our adaptive controller.'
volume: 242
URL: https://proceedings.mlr.press/v242/lee24a.html
PDF: https://proceedings.mlr.press/v242/lee24a/lee24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-lee24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Bruce
family: Lee
- given: Anders
family: Rantzer
- given: Nikolai
family: Matni
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 980-992
id: lee24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 980
lastpage: 992
published: 2024-06-11 00:00:00 +0000
- title: 'Error bounds, PL condition, and quadratic growth for weakly convex functions, and linear convergences of proximal point methods'
abstract: 'Many machine learning problems lack strong convexity properties. Fortunately, recent studies have revealed that first-order algorithms also enjoy linear convergences under various weaker regularity conditions. While the relationship among different conditions for convex and smooth functions is well understood, it is not the case for the nonsmooth setting. In this paper, we go beyond convexity and smoothness, and clarify the connections among common regularity conditions (including strong convexity, restricted secant inequality, subdifferential error bound, Polyak-Łojasiewicz inequality, and quadratic growth) in the class of weakly convex functions. In addition, we present a simple and modular proof for the linear convergence of the proximal point method (PPM) for convex (possibly nonsmooth) optimization using these regularity conditions. The linear convergence also holds when the subproblems of PPM are solved inexactly with a proper control of inexactness.'
volume: 242
URL: https://proceedings.mlr.press/v242/liao24a.html
PDF: https://proceedings.mlr.press/v242/liao24a/liao24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-liao24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Feng-Yi
family: Liao
- given: Lijun
family: Ding
- given: Yang
family: Zheng
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 993-1005
id: liao24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 993
lastpage: 1005
published: 2024-06-11 00:00:00 +0000
- title: 'Parameterized fast and safe tracking (FaSTrack) using DeepReach'
abstract: 'Fast and Safe Tracking (FaSTrack) is a modular framework that provides safety guarantees while planning and executing trajectories in real time via value functions of Hamilton-Jacobi (HJ) reachability. These value functions are computed through dynamic programming, which is notorious for being computationally inefficient. Moreover, the resulting trajectory does not adapt online to the environment, such as sudden disturbances or obstacles. DeepReach is a scalable deep learning method to HJ reachability that allows parameterization of states, which opens up possibilities for online adaptation to various controls and disturbances. In this paper, we propose Parametric FaSTrack, which uses DeepReach to approximate a value function that parameterizes the control bounds of the planning model. The new framework can smoothly trade off between the navigation speed and the tracking error (therefore maneuverability) while guaranteeing obstacle avoidance in a priori unknown environments. We demonstrate our method through two examples and a benchmark comparison with existing methods, showing the safety, efficiency, and faster solution times of the framework.'
volume: 242
URL: https://proceedings.mlr.press/v242/jeong24a.html
PDF: https://proceedings.mlr.press/v242/jeong24a/jeong24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-jeong24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Hyun Joe
family: Jeong
- given: Zheng
family: Gong
- given: Somil
family: Bansal
- given: Sylvia
family: Herbert
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1006-1017
id: jeong24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1006
lastpage: 1017
published: 2024-06-11 00:00:00 +0000
- title: 'Probabilistic ODE solvers for integration error-aware numerical optimal control'
abstract: 'Appropriate time discretization is crucial for real-time applications of numerical optimal control, such as nonlinear model predictive control. However, if the discretization error strongly depends on the applied control input, meeting accuracy and sampling time requirements simultaneously can be challenging using classical discretization methods. In particular, neither fixed-grid nor adaptive-grid discretizations may be suitable, when they suffer from large integration error or exceed the prescribed sampling time, respectively. In this work, we take a first step toward closing this gap by utilizing probabilistic numerical integrators to approximate the solution of the initial value problem, as well as the computational uncertainty associated with it, inside the optimal control problem (OCP). By taking the viewpoint of probabilistic numerics and propagating the numerical uncertainty in the cost, the OCP is reformulated such that the optimal input reduces the computational uncertainty insofar as it is beneficial for the control objective. The proposed approach is illustrated using a numerical example, and potential benefits and limitations are discussed.'
volume: 242
URL: https://proceedings.mlr.press/v242/lahr24a.html
PDF: https://proceedings.mlr.press/v242/lahr24a/lahr24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-lahr24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Amon
family: Lahr
- given: Filip
family: Tronarp
- given: Nathanael
family: Bosch
- given: Jonathan
family: Schmidt
- given: Philipp
family: Hennig
- given: Melanie N.
family: Zeilinger
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1018-1032
id: lahr24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1018
lastpage: 1032
published: 2024-06-11 00:00:00 +0000
- title: 'Event-triggered safe Bayesian optimization on quadcopters'
abstract: 'Bayesian optimization (BO) has proven to be a powerful tool for automatically tuning control parameters without requiring knowledge of the underlying system dynamics. Safe BO methods, in addition, guarantee safety during the optimization process, assuming that the underlying objective function does not change. However, in real-world scenarios, time-variations frequently occur, for example, due to wear in the system or changes in operation. Utilizing standard safe BO strategies that do not address time-variations can result in failure as previous safe decisions may become unsafe over time, which we demonstrate herein. To address this, we introduce a new algorithm, Event-Triggered SafeOpt (ETSO), which adapts to changes online solely relying on the observed costs. At its core, ETSO uses an event trigger to detect significant deviations between observations and the current surrogate of the objective function. When such change is detected, the algorithm reverts to a safe backup controller, and exploration is restarted. In this way, safety is recovered and maintained across changes. We evaluate ETSO on quadcopter controller tuning, both in simulation and hardware experiments. ETSO outperforms state-of-the-art safe BO, achieving superior control performance over time while maintaining safety.'
volume: 242
URL: https://proceedings.mlr.press/v242/holzapfel24a.html
PDF: https://proceedings.mlr.press/v242/holzapfel24a/holzapfel24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-holzapfel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Antonia
family: Holzapfel
- given: Paul
family: Brunzema
- given: Sebastian
family: Trimpe
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1033-1045
id: holzapfel24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1033
lastpage: 1045
published: 2024-06-11 00:00:00 +0000
- title: 'Finite-time complexity of incremental policy gradient methods for solving multi-task reinforcement learning'
abstract: 'We consider a multi-task learning problem, where an agent is presented with a number of $N$ reinforcement learning tasks. To solve this problem, we are interested in studying the gradient approach, which iteratively updates an estimate of the optimal policy using the gradients of the value functions. The classic policy gradient method, however, may be expensive to implement in the multi-task setting as it requires access to the gradients of all the tasks at every iteration. To circumvent this issue, in this paper we propose to study an incremental policy gradient method, where the agent uses the gradient of only one task at each iteration. Our main contribution is to provide theoretical results to characterize the performance of the proposed method. In particular, we show that incremental policy gradient methods converge to the optimal value of the multi-task reinforcement learning objectives at a sublinear rate $O(1/\sqrt{k})$, where $k$ is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments.'
volume: 242
URL: https://proceedings.mlr.press/v242/bai24b.html
PDF: https://proceedings.mlr.press/v242/bai24b/bai24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-bai24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Yitao
family: Bai
- given: Thinh
family: Doan
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1046-1057
id: bai24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1046
lastpage: 1057
published: 2024-06-11 00:00:00 +0000
- title: 'Convergence guarantees for adaptive model predictive control with kinky inference'
abstract: 'We analyze the convergence properties of a robust adaptive model predictive control algorithm used to control an unknown nonlinear system. We show that by employing a standard quadratic stabilizing cost function, and by recursively updating the nominal model through kinky inference, the resulting controller ensures convergence of the true system to the origin, despite the presence of model uncertainty. We illustrate our theoretical findings through a numerical simulation.'
volume: 242
URL: https://proceedings.mlr.press/v242/zuliani24a.html
PDF: https://proceedings.mlr.press/v242/zuliani24a/zuliani24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zuliani24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Riccardo
family: Zuliani
- given: Raffaele
family: Soloperto
- given: John
family: Lygeros
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1058-1070
id: zuliani24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1058
lastpage: 1070
published: 2024-06-11 00:00:00 +0000
- title: 'Convex approximations for a bi-level formulation of data-enabled predictive control'
abstract: 'The Willems’ fundamental lemma, which characterizes linear time invariant (LTI) systems using input and output trajectories, has found many successful applications. Combining this with receding horizon control leads to a popular Data-EnablEd Predictive Control (DeePC) scheme. DeePC was first established for LTI systems and has been extended and applied to practical systems beyond LTI settings. However, the relationship between different DeePC variants, involving regularization and dimension reduction, remains unclear. In this paper, we first discuss a bi-level optimization formulation that combines a data pre-processing step as an inner problem (system identification) and predictive control as an outer problem (online control). We next introduce a series of convex approximations by relaxing some hard constraints in the bi-level optimization as suitable regularization terms, accounting for an implicit identification. These include some existing DeePC variants as well as two new variants, for which we establish their equivalence under appropriate settings. Notably, our analysis reveals a novel variant, called DeePC-SVD-Iter, which achieves the remarkable empirical performance of direct methods on systems beyond deterministic LTI settings.'
volume: 242
URL: https://proceedings.mlr.press/v242/shang24a.html
PDF: https://proceedings.mlr.press/v242/shang24a/shang24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-shang24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Xu
family: Shang
- given: Yang
family: Zheng
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1071-1082
id: shang24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1071
lastpage: 1082
published: 2024-06-11 00:00:00 +0000
- title: 'PDE control gym: A benchmark for data-driven boundary control of partial differential equations'
abstract: 'Over the last decade, data-driven methods have surged in popularity, emerging as valuable tools for control theory. As such, neural network approximations of control feedback laws, system dynamics, and even Lyapunov functions have attracted growing attention. With the ascent of learning-based control, the need for accurate, fast, and easy-to-use benchmarks has increased. In this work, we present the first learning-based environment for boundary control of PDEs. In our benchmark, we introduce three foundational PDE problems — a 1D transport PDE, a 1D reaction-diffusion PDE, and a 2D Navier–Stokes PDE — whose solvers are bundled in a user-friendly reinforcement learning gym. With this gym, we then present the first set of model-free, reinforcement learning algorithms for solving this series of benchmark problems, achieving stability, although at a higher cost compared to model-based PDE backstepping. With the set of benchmark environments and detailed examples, this work significantly lowers the barrier to entry for learning-based PDE control — a topic largely unexplored by the data-driven control community. The entire benchmark is available on Github along with detailed documentation, and the presented reinforcement learning models are open sourced.'
volume: 242
URL: https://proceedings.mlr.press/v242/bhan24a.html
PDF: https://proceedings.mlr.press/v242/bhan24a/bhan24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-bhan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Luke
family: Bhan
- given: Yuexin
family: Bian
- given: Miroslav
family: Krstic
- given: Yuanyuan
family: Shi
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1083-1095
id: bhan24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1083
lastpage: 1095
published: 2024-06-11 00:00:00 +0000
- title: 'Towards bio-inspired control of aerial vehicle: Distributed aerodynamic parameters for state prediction'
abstract: 'In an era where traditional flight control systems are increasingly strained by the demands of modern aerial missions, this research introduces a novel integration of bio-inspired sensing mechanisms into aerial vehicle control systems, aimed at revolutionizing the adaptability and efficiency of UAV operations. Current gust suppression technologies often activate only after disturbances have occurred, highlighting significant limitations in real-time responsiveness and computational efficiency. With a specific emphasis on employing distributed aerodynamic parameters for predicting flight states, the study utilizes a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network to investigate the predictive capabilities of these models under varying conditions, including scenarios with full and limited input data. The models were assessed on their ability to forecast the pitch rate of Unmanned Aerial Vehicles (UAVs), examining both the precision of predictions in response to different historical input sizes and their robustness against simulated sensor noise. Results highlight the potential of using aerodynamic data to enhance the reliability and adaptability of flight control systems, significantly reducing dependency on specific sensor inputs. This approach not only demonstrates the effectiveness of integrating sophisticated machine learning models with aerospace technology but also paves the way for more adaptive, efficient control systems in UAV operations.'
volume: 242
URL: https://proceedings.mlr.press/v242/wang24c.html
PDF: https://proceedings.mlr.press/v242/wang24c/wang24c.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-wang24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Yikang
family: Wang
- given: Adolfo
family: Perrusquia
- given: Dmitry
family: Ignatyev
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1096-1106
id: wang24c
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1096
lastpage: 1106
published: 2024-06-11 00:00:00 +0000
- title: 'Residual learning and context encoding for adaptive offline-to-online reinforcement learning'
abstract: 'Offline reinforcement learning (RL) allows learning sequential behavior from fixed datasets. Since offline datasets do not cover all possible situations, many methods collect additional data during online fine-tuning to improve performance. In general, these methods assume that the transition dynamics remain the same during both the offline and online phases of training. However, in many real-world applications, such as outdoor construction and navigation over rough terrain, it is common for the transition dynamics to vary between the offline and online phases. Moreover, the dynamics may vary during the online training. To address this problem of changing dynamics from offline to online RL, we propose a residual learning approach that infers dynamics changes to correct the outputs of the offline solution. At the online fine-tuning phase, we train a context encoder to learn a representation that is consistent inside the current online learning environment while being able to predict dynamic transitions. Experiments in D4RL MuJoCo environments, modified to support dynamics changes upon environment resets, show that our approach can adapt to these dynamic changes and generalize to unseen perturbations in a sample-efficient way, whilst comparison methods cannot.'
volume: 242
URL: https://proceedings.mlr.press/v242/nakhaeinezhadfard24a.html
PDF: https://proceedings.mlr.press/v242/nakhaeinezhadfard24a/nakhaeinezhadfard24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-nakhaeinezhadfard24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Mohammadreza
family: Nakhaei
- given: Aidan
family: Scannell
- given: Joni
family: Pajarinen
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1107-1121
id: nakhaeinezhadfard24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1107
lastpage: 1121
published: 2024-06-11 00:00:00 +0000
- title: 'CoVO-MPC: Theoretical analysis of sampling-based MPC and optimal covariance design'
abstract: 'Sampling-based Model Predictive Control (MPC) has been a practical and effective approach in many domains, notably model-based reinforcement learning, thanks to its flexibility and parallelizability. Despite its appealing empirical performance, the theoretical understanding, particularly in terms of convergence analysis and hyperparameter tuning, remains absent. In this paper, we characterize the convergence property of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI). We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems. We then extend to more general nonlinear systems. Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVariance-Optimal MPC (CoVO-MPC) that optimally schedules the sampling covariance to optimize the convergence rate. Empirically, CoVO-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks. Videos and Appendices are available at https://tinyurl.com/covo-mpc-cmu.'
volume: 242
URL: https://proceedings.mlr.press/v242/yi24b.html
PDF: https://proceedings.mlr.press/v242/yi24b/yi24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-yi24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Zeji
family: Yi
- given: Chaoyi
family: Pan
- given: Guanqi
family: He
- given: Guannan
family: Qu
- given: Guanya
family: Shi
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1122-1135
id: yi24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1122
lastpage: 1135
published: 2024-06-11 00:00:00 +0000
- title: 'Stable modular control via contraction theory for reinforcement learning'
abstract: 'We propose a novel way to integrate control theoretical results with reinforcement learning (RL) for stability, robustness, and generalization: developing modular control architectures via contraction theory to simplify the complex problems. To guarantee control stability for RL, we leverage modularity to deconstruct the nonlinear stability problems into algebraically solvable ones, yielding linear constraints on the input gradients of control networks that can be as simple as switching the signs of network weights. This control architecture can be implemented in general RL frameworks without modifying the algorithms. This minimally invasive approach allows arguably easy integration into hierarchical RL and improves its performance. We realize the modularity by constructing an auxiliary space through coordinate transformation. Within the auxiliary space, system dynamics can be represented as hierarchical combinations of subsystems. These subsystems converge recursively following their hierarchies, provided stable self-feedback. We implement this modular control architecture in PPO and hierarchical RL, and demonstrate in simulation (i) the necessity of control stability for robustness and generalization and (ii) the effectiveness in improving hierarchical RL for manipulation learning.'
volume: 242
URL: https://proceedings.mlr.press/v242/song24a.html
PDF: https://proceedings.mlr.press/v242/song24a/song24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-song24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Bing
family: Song
- given: Jean-Jacques
family: Slotine
- given: Quang-Cuong
family: Pham
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1136-1148
id: song24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1136
lastpage: 1148
published: 2024-06-11 00:00:00 +0000
- title: 'Data-driven bifurcation analysis via learning of homeomorphism'
abstract: 'This work proposes a data-driven approach for bifurcation analysis in nonlinear systems when the governing differential equations are not available. Specifically, regularized regression with barrier terms is used to learn a homeomorphism that transforms the underlying system to a reference linear dynamics — either an explicit reference model with desired qualitative behavior, or Koopman eigenfunctions that are identified from some system data under a reference parameter value. When such a homeomorphism fails to be constructed with low error, a bifurcation phenomenon is detected. A case study is performed on a planar numerical example where a pitchfork bifurcation exists.'
volume: 242
URL: https://proceedings.mlr.press/v242/tang24b.html
PDF: https://proceedings.mlr.press/v242/tang24b/tang24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-tang24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Wentao
family: Tang
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1149-1160
id: tang24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1149
lastpage: 1160
published: 2024-06-11 00:00:00 +0000
- title: 'A learning-based framework to adapt legged robots on-the-fly to unexpected disturbances'
abstract: 'State-of-the-art control methods for legged robots demonstrate impressive performance and robustness on a variety of terrains. Still, these approaches often lack an ability to learn how to adapt to changing conditions online. Such adaptation is especially critical if the robot encounters an environment with dynamics different than those considered in its model or in prior offline training. This paper proposes a learning-based framework that allows a walking robot to stabilize itself under disturbances neglected by its base controller. We consider an approach that simplifies the learning problem into two tasks: learning a model to estimate the robot’s steady-state response and learning a dynamics model for the system near its steady-state behavior. Through experiments with the MIT Mini Cheetah, we show that we can learn these models offline in simulation and transfer them to the real world, optionally finetuning them as the robot collects data. We demonstrate the effectiveness of our approach by applying it to stabilize the quadruped as it carries a box of water on its back.'
volume: 242
URL: https://proceedings.mlr.press/v242/fey24a.html
PDF: https://proceedings.mlr.press/v242/fey24a/fey24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-fey24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Nolan
family: Fey
- given: He
family: Li
- given: Nicholas
family: Adrian
- given: Patrick
family: Wensing
- given: Michael
family: Lemmon
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1161-1173
id: fey24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1161
lastpage: 1173
published: 2024-06-11 00:00:00 +0000
- title: 'On task-relevant loss functions in meta-reinforcement learning'
abstract: 'Designing a competent meta-reinforcement learning (meta-RL) algorithm in terms of data usage remains a central challenge to be tackled for its successful real-world applications. In this paper, we propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner. As opposed to the standard model-based approaches to meta-RL, our method exploits the value information in order to rapidly capture the decision-critical part of the environment. The key component of our method is the loss function for learning both the task inference module and the system model. This systematically couples the model discrepancy and the value estimate, thereby enabling our proposed algorithm to learn the policy and task inference module with a significantly smaller amount of data compared to the existing meta-RL algorithms. The proposed method is evaluated in high-dimensional robotic control, empirically verifying its effectiveness in extracting information indispensable for solving the tasks from observations in a sample-efficient manner.'
volume: 242
URL: https://proceedings.mlr.press/v242/shin24a.html
PDF: https://proceedings.mlr.press/v242/shin24a/shin24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-shin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Jaeuk
family: Shin
- given: Giho
family: Kim
- given: Howon
family: Lee
- given: Joonho
family: Han
- given: Insoon
family: Yang
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1174-1186
id: shin24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1174
lastpage: 1186
published: 2024-06-11 00:00:00 +0000
- title: 'State-wise safe reinforcement learning with pixel observations'
abstract: 'In the context of safe exploration, Reinforcement Learning (RL) has long grappled with the challenges of balancing the tradeoff between maximizing rewards and minimizing safety violations, particularly in complex environments with contact-rich or non-smooth dynamics, and when dealing with high-dimensional pixel observations. Furthermore, incorporating state-wise safety constraints in the exploration and learning process, where the agent must avoid unsafe regions without prior knowledge, adds another layer of complexity. In this paper, we propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions through a newly introduced latent barrier-like function learning mechanism. As a joint learning framework, our approach begins by constructing a latent dynamics model with low-dimensional latent spaces derived from pixel observations. We then build and learn a latent barrier-like function on top of the latent dynamics and conduct policy optimization simultaneously, thereby improving both safety and the total expected return. Experimental evaluations on the safety-gym benchmark suite demonstrate that our proposed method significantly reduces safety violations throughout the training process, and demonstrates faster safety convergence compared to existing methods while achieving competitive results in reward return.'
volume: 242
URL: https://proceedings.mlr.press/v242/zhan24a.html
PDF: https://proceedings.mlr.press/v242/zhan24a/zhan24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zhan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Sinong
family: Zhan
- given: Yixuan
family: Wang
- given: Qingyuan
family: Wu
- given: Ruochen
family: Jiao
- given: Chao
family: Huang
- given: Qi
family: Zhu
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1187-1201
id: zhan24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1187
lastpage: 1201
published: 2024-06-11 00:00:00 +0000
- title: 'Multi-agent assignment via state augmented reinforcement learning'
abstract: 'We address the conflicting requirements of a multi-agent assignment problem through constrained reinforcement learning, emphasizing the inadequacy of standard regularization techniques for this purpose. Instead, we resort to a state augmentation approach in which the oscillation of dual variables is exploited by agents to alternate between tasks. In addition, we coordinate the actions of the multiple agents acting on their local states through these multipliers, which are gossiped through a communication network, eliminating the need to access other agent states. By these means, we propose a distributed multi-agent assignment protocol with theoretical feasibility guarantees that we corroborate in a monitoring numerical experiment.'
volume: 242
URL: https://proceedings.mlr.press/v242/agorio24a.html
PDF: https://proceedings.mlr.press/v242/agorio24a/agorio24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-agorio24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Leopoldo
family: Agorio
- given: Sean Van
family: Alen
- given: Miguel
family: Calvo-Fullana
- given: Santiago
family: Paternain
- given: Juan Andrés
family: Bazerque
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1202-1213
id: agorio24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1202
lastpage: 1213
published: 2024-06-11 00:00:00 +0000
- title: 'PlanNetX: Learning an efficient neural network planner from MPC for longitudinal control'
abstract: 'Model predictive control (MPC) is a powerful, optimization-based approach for controlling dynamical systems. However, the computational complexity of online optimization can be problematic on embedded devices, especially when fixed control frequencies must be guaranteed. Thus, previous work proposed to reduce the computational burden using imitation learning (IL), approximating the MPC policy by a neural network. In this work, we instead learn the whole planned trajectory of the MPC. We introduce a combination of a novel neural network architecture PlanNetX and a simple loss function based on the state trajectory that leverages the parameterized optimal control structure of the MPC. We validate our approach in the context of autonomous driving by learning a longitudinal planner and benchmarking it extensively in the CommonRoad simulator using synthetic scenarios and scenarios derived from real data. Our experimental results show that we can learn the open-loop MPC trajectory with high accuracy while improving the closed-loop performance of the learned control policy over other baselines like behavior cloning.'
volume: 242
URL: https://proceedings.mlr.press/v242/hoffmann24a.html
PDF: https://proceedings.mlr.press/v242/hoffmann24a/hoffmann24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hoffmann24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Jasper
family: Hoffmann
- given: Diego Fernandez
family: Clausen
- given: Julien
family: Brosseit
- given: Julian
family: Bernhard
- given: Klemens
family: Esterle
- given: Moritz
family: Werling
- given: Michael
family: Karg
- given: Joschka
family: Bödecker
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1214-1227
id: hoffmann24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1214
lastpage: 1227
published: 2024-06-11 00:00:00 +0000
- title: 'Mapping back and forth between model predictive control and neural networks'
abstract: 'Model predictive control (MPC) for linear systems with quadratic costs and linear constraints is shown to admit an exact representation as an implicit neural network. A method to “unravel” the implicit neural network of MPC into an explicit one is also introduced. As well as building links between model-based and data-driven control, these results emphasize the capability of implicit neural networks for representing solutions of optimisation problems, as such problems are themselves implicitly defined functions.'
volume: 242
URL: https://proceedings.mlr.press/v242/drummond24a.html
PDF: https://proceedings.mlr.press/v242/drummond24a/drummond24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-drummond24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Ross
family: Drummond
- given: Pablo
family: Baldivieso
- given: Giorgio
family: Valmorbida
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1228-1240
id: drummond24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1228
lastpage: 1240
published: 2024-06-11 00:00:00 +0000
- title: 'A multi-modal distributed learning algorithm in reproducing kernel Hilbert spaces'
abstract: 'We consider the problem of function estimation by a multi-agent system consisting of two agents and a fusion center. Each agent receives data comprising samples of an independent variable (input) and the corresponding values of the dependent variable (output). The data remains local and is not shared with other members in the system. The objective of the system is to collaboratively estimate the function from the input to the output. To this end, we present an iterative distributed algorithm for this function estimation problem. Each agent solves a local estimation problem in a Reproducing Kernel Hilbert Space (RKHS) and uploads the function to the fusion center. At the fusion center, the functions are fused by first estimating the data points that would have generated the uploaded functions and then subsequently solving a least squares estimation problem using the estimated data from both functions. The fused function is downloaded by the agents and is subsequently used for estimation at the next iteration along with incoming data. This procedure is executed sequentially and stopped when the difference between consecutively estimated functions becomes small enough. With respect to the algorithm, we prove the existence of basis functions for suitable representation of estimated functions and present closed-form solutions to the estimation problems at the agents and the fusion center.'
volume: 242
URL: https://proceedings.mlr.press/v242/raghavan24a.html
PDF: https://proceedings.mlr.press/v242/raghavan24a/raghavan24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-raghavan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Aneesh
family: Raghavan
- given: Karl Henrik
family: Johansson
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1241-1252
id: raghavan24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1241
lastpage: 1252
published: 2024-06-11 00:00:00 +0000
- title: 'Towards model-free LQR control over rate-limited channels'
abstract: 'Given the success of model-free methods for control design in many problem settings, it is natural to ask how things will change if realistic communication channels are utilized for the transmission of gradients or policies. While the resulting problem has analogies with the formulations studied under the rubric of networked control systems, the rich literature in that area has typically assumed that the model of the system is known. As a step towards bridging the fields of model-free control design and networked control systems, we ask: Is it possible to solve basic control problems - such as the linear quadratic regulator (LQR) problem - in a model-free manner over a rate-limited channel? Toward answering this question, we study a setting where a worker agent transmits quantized policy gradients (of the LQR cost) to a server over a noiseless channel with a finite bit-rate. We propose a new algorithm titled Adaptively Quantized Gradient Descent (AQGD), and prove that above a certain finite threshold bit-rate, AQGD guarantees exponentially fast convergence to the globally optimal policy, with no deterioration of the exponent relative to the unquantized setting. More generally, our approach reveals the benefits of adaptive quantization in preserving fast linear convergence rates, and, as such, may be of independent interest to the literature on compressed optimization.'
volume: 242
URL: https://proceedings.mlr.press/v242/mitra24a.html
PDF: https://proceedings.mlr.press/v242/mitra24a/mitra24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-mitra24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Aritra
family: Mitra
- given: Lintao
family: Ye
- given: Vijay
family: Gupta
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1253-1265
id: mitra24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1253
lastpage: 1265
published: 2024-06-11 00:00:00 +0000
- title: 'Learning true objectives: Linear algebraic characterizations of identifiability in inverse reinforcement learning'
abstract: 'Inverse reinforcement learning (IRL) has emerged as a powerful paradigm for extracting expert skills from observed behavior, with applications ranging from autonomous systems to human-robot interaction. However, the identifiability issue within IRL poses a significant challenge, as multiple reward functions can explain the same observed behavior. This paper provides a linear algebraic characterization of several identifiability notions for an entropy-regularized finite horizon Markov decision process (MDP). Moreover, our approach allows for the seamless integration of prior knowledge, in the form of featurized reward functions, to enhance the identifiability of IRL problems. The results are demonstrated with experiments on a grid world environment.'
volume: 242
URL: https://proceedings.mlr.press/v242/shehab24a.html
PDF: https://proceedings.mlr.press/v242/shehab24a/shehab24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-shehab24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Mohamad Louai
family: Shehab
- given: Antoine
family: Aspeel
- given: Nikos
family: Arechiga
- given: Andrew
family: Best
- given: Necmiye
family: Ozay
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1266-1277
id: shehab24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1266
lastpage: 1277
published: 2024-06-11 00:00:00 +0000
- title: 'Safety filters for black-box dynamical systems by learning discriminating hyperplanes'
abstract: 'Learning-based methods are emerging as an effective approach for safety filters for black-box dynamical systems. Existing methods have relied on certificate functions like Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) reachability value functions. The primary motivation for our work is the recognition that ultimately, enforcing the safety constraint as a control input constraint at each state is what matters. By focusing on this constraint, we can eliminate dependence on any specific certificate function-based design. To achieve this, we define a discriminating hyperplane that shapes the half-space constraint on control input at each state, serving as a sufficient condition for safety. This concept not only generalizes traditional safety methods but also simplifies safety filter design by eliminating dependence on specific certificate functions. We present two strategies to learn the discriminating hyperplane: (a) a supervised learning approach, using pre-verified control invariant sets for labeling, and (b) a reinforcement learning (RL) approach, which does not require such labels. The main advantage of our method, unlike conventional safe RL approaches, is the separation of performance and safety. This offers a reusable safety filter for learning new tasks, avoiding the need to retrain from scratch. As such, we believe that the new notion of the discriminating hyperplane offers a more generalizable direction towards designing safety filters, encompassing and extending existing certificate-function-based or safe RL methodologies.'
volume: 242
URL: https://proceedings.mlr.press/v242/lavanakul24a.html
PDF: https://proceedings.mlr.press/v242/lavanakul24a/lavanakul24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-lavanakul24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Will
family: Lavanakul
- given: Jason
family: Choi
- given: Koushil
family: Sreenath
- given: Claire
family: Tomlin
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1278-1291
id: lavanakul24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1278
lastpage: 1291
published: 2024-06-11 00:00:00 +0000
- title: 'Lagrangian inspired polynomial estimator for black-box learning and control of underactuated systems'
abstract: 'The Lagrangian Inspired Polynomial (LIP) estimator (Giacomuzzo et al., 2023) is a black-box estimator based on Gaussian Process Regression, recently presented for the inverse dynamics identification of Lagrangian systems. It relies on a novel multi-output kernel that embeds the structure of the Euler-Lagrange equation. In this work, we extend its analysis to the class of underactuated robots. First, we show that, despite being a black-box model, the LIP allows estimating kinetic and potential energies, as well as the inertial, Coriolis, and gravity components directly from the overall torque measures. Then we exploit these properties to derive a two-stage energy-based controller for the swing-up and stabilization of balancing robots. Experimental results on a simulated Pendubot confirm the feasibility of the proposed approach.'
volume: 242
URL: https://proceedings.mlr.press/v242/giacomuzzo24a.html
PDF: https://proceedings.mlr.press/v242/giacomuzzo24a/giacomuzzo24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-giacomuzzo24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Giulio
family: Giacomuzzo
- given: Riccardo
family: Cescon
- given: Diego
family: Romeres
- given: Ruggero
family: Carli
- given: Alberto Dalla
family: Libera
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1292-1304
id: giacomuzzo24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1292
lastpage: 1304
published: 2024-06-11 00:00:00 +0000
- title: 'From raw data to safety: Reducing conservatism by set expansion'
abstract: 'In response to safety concerns associated with learning-based algorithms, safety filters have been proposed as a modular technique. Generally, these filters heavily rely on the system’s model, which is contradictory if they are intended to enhance a data-driven or end-to-end learning solution. This paper extends our previous work, a purely Data-Driven Safety Filter (DDSF) based on Willems’ lemma, to an extremely short-sighted and non-conservative solution. Specifically, we propose online and offline sample-based methods to expand the safe set of DDSF and reduce its conservatism. Since this method is defined in an input-output framework, it can systematically handle both unknown and time-delay LTI systems using only a single batch of data. To evaluate its performance, we apply the proposed method to a time-delay system under various settings. The simulation results validate the effectiveness of the set expansion algorithm in generating a notably large input-output safe set, resulting in safety filters that are not conservative, even with an extremely short prediction horizon.'
volume: 242
URL: https://proceedings.mlr.press/v242/bajelani24a.html
PDF: https://proceedings.mlr.press/v242/bajelani24a/bajelani24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-bajelani24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Mohammad
family: Bajelani
- given: Klaske Van
family: Heusden
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1305-1317
id: bajelani24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1305
lastpage: 1317
published: 2024-06-11 00:00:00 +0000
- title: 'Dynamics harmonic analysis of robotic systems: Application in data-driven Koopman modelling'
abstract: 'We introduce the use of harmonic analysis to decompose the state space of symmetric robotic systems into orthogonal isotypic subspaces. These are lower-dimensional spaces that capture distinct, symmetric, and synergistic motions. For linear dynamics, we characterize how this decomposition leads to a subdivision of the dynamics into independent linear systems on each subspace, a property we term dynamics harmonic analysis (DHA). To exploit this property, we use Koopman operator theory to propose an equivariant deep-learning architecture that leverages the properties of DHA to learn a global linear model of the system dynamics. Our architecture, validated on synthetic systems and the dynamics of locomotion of a quadrupedal robot, exhibits enhanced generalization, sample efficiency, and interpretability, with fewer trainable parameters and lower computational costs.'
volume: 242
URL: https://proceedings.mlr.press/v242/ordonez-apraez24a.html
PDF: https://proceedings.mlr.press/v242/ordonez-apraez24a/ordonez-apraez24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-ordonez-apraez24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Daniel
family: Ordoñez-Apraez
- given: Vladimir
family: Kostic
- given: Giulio
family: Turrisi
- given: Pietro
family: Novelli
- given: Carlos
family: Mastalli
- given: Claudio
family: Semini
- given: Massimiliano
family: Pontil
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1318-1329
id: ordonez-apraez24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1318
lastpage: 1329
published: 2024-06-11 00:00:00 +0000
- title: 'Recursively feasible shrinking-horizon MPC in dynamic environments with conformal prediction guarantees'
abstract: 'In this paper, we focus on the problem of shrinking-horizon Model Predictive Control (MPC) in uncertain dynamic environments. We consider controlling a deterministic autonomous system that interacts with uncontrollable stochastic agents during its mission. Employing tools from conformal prediction, existing works derive high-confidence prediction regions for the unknown agent trajectories, and integrate these regions in the design of suitable safety constraints for MPC. Despite guaranteeing probabilistic safety of the closed-loop trajectories, these constraints do not ensure feasibility of the respective MPC schemes for the entire duration of the mission. We propose a shrinking-horizon MPC that guarantees recursive feasibility via a gradual relaxation of the safety constraints as new prediction regions become available online. This relaxation enforces the safety constraints to hold over the least restrictive prediction region from the set of all available prediction regions. In a comparative case study with the state of the art, we empirically show that our approach results in tighter prediction regions and verify recursive feasibility of our MPC scheme.'
volume: 242
URL: https://proceedings.mlr.press/v242/stamouli24a.html
PDF: https://proceedings.mlr.press/v242/stamouli24a/stamouli24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-stamouli24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Charis
family: Stamouli
- given: Lars
family: Lindemann
- given: George
family: Pappas
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1330-1342
id: stamouli24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1330
lastpage: 1342
published: 2024-06-11 00:00:00 +0000
- title: 'Multi-modal conformal prediction regions by optimizing convex shape templates'
abstract: 'Conformal prediction is a statistical tool for producing prediction regions for machine learning models that are valid with high probability. A key component of conformal prediction algorithms is a non-conformity score function that quantifies how different a model’s prediction is from the unknown ground truth value. Essentially, these functions determine the shape and the size of the conformal prediction regions. However, little work has gone into finding non-conformity score functions that produce prediction regions that are multi-modal and practical, i.e., that can efficiently be used in engineering applications. We propose a method that optimizes parameterized shape template functions over calibration data, which results in non-conformity score functions that produce prediction regions with minimum volume. Our approach results in prediction regions that are multi-modal, so they can properly capture residuals of distributions that have multiple modes, and practical, so each region is convex and can be easily incorporated into downstream tasks, such as a motion planner using conformal prediction regions. Our method applies to general supervised learning tasks, while we illustrate its use in time-series prediction. We provide a toolbox and present illustrative case studies of F16 fighter jets and autonomous vehicles, showing an up to 68% reduction in prediction region area.'
volume: 242
URL: https://proceedings.mlr.press/v242/tumu24a.html
PDF: https://proceedings.mlr.press/v242/tumu24a/tumu24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-tumu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Renukanandan
family: Tumu
- given: Matthew
family: Cleaveland
- given: Rahul
family: Mangharam
- given: George
family: Pappas
- given: Lars
family: Lindemann
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1343-1356
id: tumu24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1343
lastpage: 1356
published: 2024-06-11 00:00:00 +0000
- title: 'Learning locally interacting discrete dynamical systems: Towards data-efficient and scalable prediction'
abstract: 'Locally interacting dynamical systems, such as epidemic spread, rumor propagation through crowds, and forest fire, exhibit complex global dynamics originated from local, relatively simple, and often stochastic interactions between dynamic elements. Their temporal evolution is often driven by transitions between a finite number of discrete states. Despite significant advancements in predictive modeling through deep learning, such interactions among many elements have rarely been explored as a specific domain for predictive modeling. We present Attentive Recurrent Neural Cellular Automata (AR-NCA) to effectively discover unknown local state transition rules by associating the temporal information between neighboring cells in a permutation-invariant manner. AR-NCA exhibits superior generalizability across various system configurations (i.e., spatial distribution of states), data efficiency and robustness in extremely data-limited scenarios even in the presence of stochastic interactions, and scalability through spatial dimension-independent prediction.'
volume: 242
URL: https://proceedings.mlr.press/v242/kang24a.html
PDF: https://proceedings.mlr.press/v242/kang24a/kang24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-kang24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Beomseok
family: Kang
- given: Harshit
family: Kumar
- given: Minah
family: Lee
- given: Biswadeep
family: Chakraborty
- given: Saibal
family: Mukhopadhyay
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1357-1369
id: kang24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1357
lastpage: 1369
published: 2024-06-11 00:00:00 +0000
- title: 'How safe am I given what I see? Calibrated prediction of safety chances for image-controlled autonomy'
abstract: 'End-to-end learning has emerged as a major paradigm for developing autonomous controllers. Unfortunately, with its performance and convenience comes an even greater challenge of safety assurance. A key factor in this challenge is the absence of low-dimensional and interpretable dynamical states, around which traditional assurance methods revolve. Focusing on the online safety prediction problem, this paper systematically investigates a flexible family of learning pipelines based on generative world models, which do not require low-dimensional states. To implement these pipelines, we overcome the challenges of missing safety labels under prediction-induced distribution shift and learning safety-informed latent representations. Moreover, we provide statistical calibration guarantees for our safety chance predictions based on conformal inference. An extensive evaluation of our predictor family on two image-controlled case studies, a racing car and a cartpole, delivers counterintuitive results and highlights open problems in deep safety prediction.'
volume: 242
URL: https://proceedings.mlr.press/v242/mao24c.html
PDF: https://proceedings.mlr.press/v242/mao24c/mao24c.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-mao24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Zhenjiang
family: Mao
- given: Carson
family: Sobolewski
- given: Ivan
family: Ruchkin
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1370-1387
id: mao24c
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1370
lastpage: 1387
published: 2024-06-11 00:00:00 +0000
- title: 'Convex neural network synthesis for robustness in the 1-norm'
abstract: 'With neural networks being used to control safety-critical systems, they increasingly have to be both accurate (in the sense of matching inputs to outputs) and robust. However, these two properties are often at odds with each other and a trade-off has to be navigated. To address this issue, this paper proposes a method to generate an approximation of a neural network which is certifiably more robust. Crucially, the method is fully convex and posed as a semi-definite programme. An application to robustifying model predictive control is used to demonstrate the results. The aim of this work is to introduce a method to navigate the neural network robustness/accuracy trade-off.'
volume: 242
URL: https://proceedings.mlr.press/v242/drummond24b.html
PDF: https://proceedings.mlr.press/v242/drummond24b/drummond24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-drummond24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Ross
family: Drummond
- given: Chris
family: Guiver
- given: Matthew
family: Turner
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1388-1399
id: drummond24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1388
lastpage: 1399
published: 2024-06-11 00:00:00 +0000
- title: 'Increasing information for model predictive control with semi-Markov decision processes'
abstract: 'Recent works in Learning-Based Model Predictive Control of dynamical systems show impressive sample complexity performances using criteria from Information Theory to accelerate the learning procedure. However, the sequential exploration opportunities are limited by the system’s local state, limiting the amount of information carried by the observations from the current exploration trajectory. This article resolves this limitation by introducing temporal abstraction through the framework of Semi-Markov Decision Processes. The framework increases the total information of the gathered data for a fixed sampling budget, thus reducing the sample complexity.'
volume: 242
URL: https://proceedings.mlr.press/v242/hosseinkhan-boucher24a.html
PDF: https://proceedings.mlr.press/v242/hosseinkhan-boucher24a/hosseinkhan-boucher24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hosseinkhan-boucher24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Rémy
family: Hosseinkhan Boucher
- given: Stella
family: Douka
- given: Onofrio
family: Semeraro
- given: Lionel
family: Mathelin
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1400-1414
id: hosseinkhan-boucher24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1400
lastpage: 1414
published: 2024-06-11 00:00:00 +0000
- title: 'Physically consistent modeling & identification of nonlinear friction with dissipative Gaussian processes'
abstract: 'Friction modeling has always been a challenging problem due to the complexity of real physical systems. Although a few state-of-the-art structured data-driven methods show their efficiency in nonlinear system modeling, deterministic passivity as one of the significant characteristics of friction is rarely considered in these methods. To address this issue, we propose a Gaussian Process-based model that preserves the inherent structural properties such as passivity. A matrix-vector physical structure is considered in our approach to ensure physical consistency, in particular, enabling a guarantee of positive semi-definiteness of the damping matrix. An aircraft benchmark simulation is employed to demonstrate the efficacy of our methodology. Estimation accuracy and data efficiency are increased substantially by considering and enforcing more structured physical knowledge. Also, the fulfillment of the dissipative nature of the aerodynamics is validated numerically.'
volume: 242
URL: https://proceedings.mlr.press/v242/dai24a.html
PDF: https://proceedings.mlr.press/v242/dai24a/dai24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-dai24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Rui
family: Dai
- given: Giulio
family: Evangelisti
- given: Sandra
family: Hirche
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1415-1426
id: dai24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1415
lastpage: 1426
published: 2024-06-11 00:00:00 +0000
- title: 'STEMFold: Stochastic temporal manifold for multi-agent interactions in the presence of hidden agents'
abstract: 'Learning accurate, data-driven predictive models for multiple interacting agents following unknown dynamics is crucial in many real-world physical and social systems. In many scenarios, dynamics prediction must be performed under incomplete observations, i.e., only a subset of agents are known and observable from a larger topological system while the behaviors of the unobserved agents and their interactions with the observed agents are not known. When only incomplete observations of a dynamical system are available, so that some states remain hidden, it is generally not possible to learn a closed-form model in these variables using either analytic or data-driven techniques. In this work, we propose STEMFold, a spatiotemporal attention-based generative model, to learn a stochastic manifold to predict the underlying unmeasured dynamics of the multi-agent system from observations of only visible agents. Our analytical results motivate STEMFold design using a spatiotemporal graph with time anchors to effectively map the observations of visible agents to a stochastic manifold with no prior information about interaction graph topology. We empirically evaluated our method on two simulations and two real-world datasets, where it outperformed existing networks in predicting complex multi-agent interactions, even with many unobserved agents.'
volume: 242
URL: https://proceedings.mlr.press/v242/kumawat24a.html
PDF: https://proceedings.mlr.press/v242/kumawat24a/kumawat24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-kumawat24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Hemant
family: Kumawat
- given: Biswadeep
family: Chakraborty
- given: Saibal
family: Mukhopadhyay
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1427-1439
id: kumawat24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1427
lastpage: 1439
published: 2024-06-11 00:00:00 +0000
- title: 'Distributed on-the-fly control of multi-agent systems with unknown dynamics: Using limited data to obtain near-optimal control'
abstract: 'We propose a method called ODMU for “on-the-fly control of distributed multi-agent systems with unknown nonlinear dynamics” and with (a)synchronous communication between the agents, where data from a single finite-horizon trajectory is used, possibly in conjunction with side information. ODMU can be applied to real-time scenarios when the dynamics of the system are unknown or suddenly change such that an a priori known model cannot be applied. In our proposed algorithm, the agents communicate their states using (a)synchronous communication and exploit the side information, e.g., regularities of the system, states, agents’ communication scheme, algebraic limitations, and coupling in the system states. We provide ODMU to over-approximate the reachable sets and to control the agents under conditions with severely limited data. ODMU creates differential inclusion sets that calculate the over-approximations of the reachable sets containing the unknown vector field. We show that ODMU calculates the near-optimal control and an upper bound (suboptimality bound) for the error between the optimal trajectory and the trajectory calculated by ODMU. We use convex-optimization-based control to obtain the guaranteed near-optimal solution. We demonstrate the effect of side information on obtaining smaller bounds on suboptimality by applying ODMU on a system of unicycles. Additionally, we present a case study where a multi-agent system of unicycles with unknown dynamics is controlled via ODMU. Moreover, we have developed two baselines, SINDYcMulti and CGP-LCBMulti, against which we compare our method.'
volume: 242
URL: https://proceedings.mlr.press/v242/meshkat-alsadat24a.html
PDF: https://proceedings.mlr.press/v242/meshkat-alsadat24a/meshkat-alsadat24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-meshkat-alsadat24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Shayan
family: Meshkat Alsadat
- given: Nasim
family: Baharisangari
- given: Zhe
family: Xu
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1440-1451
id: meshkat-alsadat24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1440
lastpage: 1451
published: 2024-06-11 00:00:00 +0000
- title: 'CACTO-SL: Using Sobolev learning to improve continuous actor-critic with trajectory optimization'
abstract: 'Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools to solve optimal control problems. On the one hand, TO can efficiently compute locally-optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort. Recently, we have proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm. In turn, the policy encoded by the actor is used to warm-start TO, closing the loop between TO and RL. In this work, we present an extension of CACTO exploiting the idea of Sobolev learning. To make the training of the critic network faster and more data-efficient, we enrich it with the gradient of the Value function, computed via a backward pass of the differential dynamic programming algorithm. Our results show that the new algorithm is more efficient than the original CACTO, reducing the number of TO episodes by a factor ranging from 3 to 10, and consequently the computation time. Moreover, we show that CACTO-SL helps TO to find better minima and to produce more consistent results.'
volume: 242
URL: https://proceedings.mlr.press/v242/alboni24a.html
PDF: https://proceedings.mlr.press/v242/alboni24a/alboni24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-alboni24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Elisa
family: Alboni
- given: Gianluigi
family: Grandesso
- given: Gastone Pietro
family: Rosati Papini
- given: Justin
family: Carpentier
- given: Andrea
family: Del Prete
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1452-1463
id: alboni24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1452
lastpage: 1463
published: 2024-06-11 00:00:00 +0000
- title: 'Multi-agent coverage control with transient behavior consideration'
abstract: 'This paper studies the multi-agent coverage control (MAC) problem where agents must dynamically learn an unknown density function while performing coverage tasks. Unlike many current theoretical frameworks that concentrate solely on the regret occurring at specific targeted sensory locations, our approach additionally considers the regret caused by transient behavior – the path from one location to another. We propose the multi-agent coverage control with the doubling trick (MAC-DT) algorithm and demonstrate that it achieves (approximated) regret of $\widetilde O(\sqrt{T})$ even when accounting for the transient behavior. Our result is also supported by numerical experiments, showcasing that the proposed algorithm manages to match or even outperform the baseline algorithms in simulation environments. We also show how our algorithm can be modified to handle safety constraints and further implement the algorithm on a real-robotic testbed.'
volume: 242
URL: https://proceedings.mlr.press/v242/zhang24e.html
PDF: https://proceedings.mlr.press/v242/zhang24e/zhang24e.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zhang24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Runyu
family: Zhang
- given: Haitong
family: Ma
- given: Na
family: Li
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1464-1476
id: zhang24e
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1464
lastpage: 1476
published: 2024-06-11 00:00:00 +0000
- title: 'Data driven verification of positive invariant sets for discrete, nonlinear systems'
abstract: 'Invariant sets are essential for understanding the stability and safety of nonlinear systems. However, certifying the existence of a positive invariant set for a nonlinear model is difficult and often requires knowledge of the system’s dynamic model. This paper presents a data-driven method to certify a positive invariant set for an unknown, discrete, nonlinear system. A triangulation of a subset of the state space is used to query data points. Then, linear programming is used to create a continuous piecewise affine function that fulfills the criteria of the Extended Invariant Set Principle by leveraging an inequality error bound that uses the Lipschitz constant of the unknown system. Numerical results demonstrate the program’s ability to certify positive invariant sets from sampled data.'
volume: 242
URL: https://proceedings.mlr.press/v242/strong24a.html
PDF: https://proceedings.mlr.press/v242/strong24a/strong24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-strong24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Amy K.
family: Strong
- given: Leila J.
family: Bridgeman
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1477-1488
id: strong24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1477
lastpage: 1488
published: 2024-06-11 00:00:00 +0000
- title: 'Adaptive teaching in heterogeneous agents: Balancing surprise in sparse reward scenarios'
abstract: 'Learning from Demonstration (LfD) can be an efficient way to train systems with analogous agents by enabling “Student” agents to learn from the demonstrations of the most experienced “Teacher” agent, instead of training their policy in parallel. However, when there are discrepancies in agent capabilities, such as divergent actuator power or joint angle constraints, naively replicating demonstrations that are out of bounds for the Student’s capability can limit efficient learning. We present a Teacher-Student learning framework specifically tailored to address the challenge of heterogeneity between the Teacher and Student agents. Our framework is based on the concept of “surprise”, inspired by its application in exploration incentivization in sparse-reward environments. Surprise is repurposed to enable the Teacher to detect and adapt to differences between itself and the Student. By focusing on maximizing its surprise in response to the environment while concurrently minimizing the Student’s surprise in response to the demonstrations, the Teacher agent can effectively tailor its demonstrations to the Student’s specific capabilities and constraints. We validate our method by demonstrating improvements in the Student’s learning in control tasks within sparse-reward environments.'
volume: 242
URL: https://proceedings.mlr.press/v242/clark24a.html
PDF: https://proceedings.mlr.press/v242/clark24a/clark24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-clark24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Emma
family: Clark
- given: Kanghyun
family: Ryu
- given: Negar
family: Mehr
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1489-1501
id: clark24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1489
lastpage: 1501
published: 2024-06-11 00:00:00 +0000
- title: 'Can a transformer represent a Kalman filter?'
abstract: 'Transformers are a class of autoregressive deep learning architectures which have recently achieved state-of-the-art performance in various vision, language, and robotics tasks. We revisit the problem of Kalman Filtering in linear dynamical systems and show that Transformers can approximate the Kalman Filter in a strong sense. Specifically, for any observable LTI system we construct an explicit causally-masked Transformer which implements the Kalman Filter, up to a small additive error which is bounded uniformly in time; we call our construction the Transformer Filter. Our construction is based on a two-step reduction. We first show that a softmax self-attention block can exactly represent a Nadaraya–Watson kernel smoothing estimator with a Gaussian kernel. We then show that this estimator closely approximates the Kalman Filter. We also investigate how the Transformer Filter can be used for measurement-feedback control and prove that the resulting nonlinear controllers closely approximate the performance of standard optimal control policies such as the LQG controller.'
volume: 242
URL: https://proceedings.mlr.press/v242/goel24a.html
PDF: https://proceedings.mlr.press/v242/goel24a/goel24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-goel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Gautam
family: Goel
- given: Peter
family: Bartlett
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1502-1512
id: goel24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1502
lastpage: 1512
published: 2024-06-11 00:00:00 +0000
- title: 'Data-driven simulator for mechanical circulatory support with domain adversarial neural process'
abstract: 'We propose a data-driven simulator for Mechanical Circulatory Support (MCS) devices, implemented as a probabilistic deep sequence model. Existing mechanical simulators for MCS rely on oversimplifying assumptions and are insensitive to patient-specific behavior, limiting their applicability to real-world treatment scenarios. To address these shortcomings, our model Domain Adversarial Neural Process (DANP) employs a neural process architecture, allowing it to capture the probabilistic relationship between MCS pump levels and aortic pressure measurements with uncertainty. We use domain adversarial training to combine real-world and simulation data, resulting in a more realistic and diverse representation of potential outcomes. Empirical results with an improvement of 19% in non-stationary trend prediction establish DANP as an effective tool for clinicians to understand and make informed decisions regarding MCS patient treatment.'
volume: 242
URL: https://proceedings.mlr.press/v242/sun24a.html
PDF: https://proceedings.mlr.press/v242/sun24a/sun24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-sun24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Sophia
family: Sun
- given: Wenyuan
family: Chen
- given: Zihao
family: Zhou
- given: Sonia
family: Fereidooni
- given: Elise
family: Jortberg
- given: Rose
family: Yu
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1513-1525
id: sun24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1513
lastpage: 1525
published: 2024-06-11 00:00:00 +0000
- title: 'DC4L: Distribution shift recovery via data-driven control for deep learning models'
abstract: 'Deep neural networks have repeatedly been shown to be non-robust to the uncertainties of the real world, even to naturally occurring ones. A vast majority of current approaches have focused on data-augmentation methods to expand the range of perturbations that the classifier is exposed to while training. A relatively unexplored avenue that is equally promising involves sanitizing an image as a preprocessing step, depending on the nature of perturbation. In this paper, we propose to use control for learned models to recover from distribution shifts online. Specifically, our method applies a sequence of semantic-preserving transformations to bring the shifted data closer in distribution to the training set, as measured by the Wasserstein distance. Our approach is to 1) formulate the problem of distribution shift recovery as a Markov decision process, which we solve using reinforcement learning, 2) identify a minimum condition on the data for our method to be applied, which we check online using a binary classifier, and 3) employ dimensionality reduction through orthonormal projection to aid in our estimates of the Wasserstein distance. We provide theoretical evidence that orthonormal projection preserves characteristics of the data at the distributional level. We apply our distribution shift recovery approach to the ImageNet-C benchmark for distribution shifts, demonstrating an improvement in average accuracy of up to 14.21% across a variety of state-of-the-art ImageNet classifiers. We further show that our method generalizes to composites of shifts from the ImageNet-C benchmark, achieving improvements in average accuracy of up to 9.81%. Finally, we test our method on CIFAR-100-C and report improvements of up to 8.25%.'
volume: 242
URL: https://proceedings.mlr.press/v242/lin24b.html
PDF: https://proceedings.mlr.press/v242/lin24b/lin24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-lin24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Vivian
family: Lin
- given: Kuk Jin
family: Jang
- given: Souradeep
family: Dutta
- given: Michele
family: Caprio
- given: Oleg
family: Sokolsky
- given: Insup
family: Lee
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1526-1538
id: lin24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1526
lastpage: 1538
published: 2024-06-11 00:00:00 +0000
- title: 'QCQP-Net: Reliably learning feasible alternating current optimal power flow solutions under constraints'
abstract: 'At the heart of power system operations, alternating current optimal power flow (ACOPF) studies the generation of electric power in the most economical way under network-wide load requirements, and can be formulated as a highly structured non-convex quadratically constrained quadratic program (QCQP). Optimization-based solutions to ACOPF (such as ADMM or the interior-point method), as the classic approach, require a large amount of computation and cannot meet the need to repeatedly solve the problem as the load requirement frequently changes. On the other hand, learning-based methods that directly predict the ACOPF solution given the load input incur little computational cost but often generate infeasible solutions (i.e., solutions that violate the constraints of ACOPF). In this work, we combine the best of both worlds — we propose an innovative framework for learning ACOPF, where the input load is mapped to the ACOPF solution through a neural network in a computationally efficient and reliable manner. Key to our innovation is a specific-purpose “activation function” defined implicitly by a QCQP and a novel loss, which enforce constraint satisfaction. We show through numerical simulations that our proposed method achieves superior feasibility rate and generation cost in situations where the existing learning-based approaches fail.'
volume: 242
URL: https://proceedings.mlr.press/v242/zeng24a.html
PDF: https://proceedings.mlr.press/v242/zeng24a/zeng24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zeng24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Sihan
family: Zeng
- given: Youngdae
family: Kim
- given: Yuxuan
family: Ren
- given: Kibaek
family: Kim
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1539-1551
id: zeng24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1539
lastpage: 1551
published: 2024-06-11 00:00:00 +0000
- title: 'A deep learning approach for distributed aggregative optimization with users’ feedback'
abstract: 'We propose a novel distributed data-driven scheme for online aggregative optimization, i.e., the framework in which agents in a network aim to cooperatively minimize the sum of local time-varying costs, each depending on a local decision variable and an aggregation of all of them. We consider a “personalized” setup in which each cost exhibits a term capturing the user’s dissatisfaction and, thus, is unknown. We enhance an existing distributed optimization scheme by endowing it with a learning mechanism based on neural networks that estimate the missing part of the gradient via users’ feedback about the cost. Our algorithm combines two loops with different timescales devoted to performing optimization and learning steps. In turn, the proposed scheme also embeds a distributed consensus mechanism aimed at locally reconstructing the global information that is unavailable due to the presence of the aggregative variable. We prove an upper bound for the dynamic regret related to (i) the initial conditions, (ii) the temporal variations of the functions, and (iii) the learning errors about the unknown cost. Finally, we test our method via numerical simulations.'
volume: 242
URL: https://proceedings.mlr.press/v242/brumali24a.html
PDF: https://proceedings.mlr.press/v242/brumali24a/brumali24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-brumali24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Riccardo
family: Brumali
- given: Guido
family: Carnevale
- given: Giuseppe
family: Notarstefano
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1552-1564
id: brumali24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1552
lastpage: 1564
published: 2024-06-11 00:00:00 +0000
- title: 'A framework for evaluating human driver models using neuroimaging'
abstract: 'Driving is a complex task which requires synthesizing multiple senses, safely reasoning about the behavior of others, and adapting to a constantly changing environment. Failures of human driving models can become failures of vehicle safety features or autonomous driving systems that rely on their predictions. Although there has been a variety of work to model human drivers, it can be challenging to determine to what extent they truly resemble the humans they attempt to mimic. The development of improved human driver models can serve as a step towards better vehicle safety. In order to better compare and develop driver models, we propose going beyond driving behavior to examine how well these models reflect the cognitive activity of human drivers. In particular, we compare features extracted from human driver models with brain activity as measured by functional magnetic resonance imaging. We demonstrate this approach on three human driver models with brain activity data from two human subjects. We find that model predictive control is a better fit for driver brain activity than classic non-predictive models, which is in good agreement with previous works that obtain better predictions of human driving behavior using model predictive control.'
volume: 242
URL: https://proceedings.mlr.press/v242/strong24b.html
PDF: https://proceedings.mlr.press/v242/strong24b/strong24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-strong24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Christopher
family: Strong
- given: Kaylene
family: Stocking
- given: Jingqi
family: Li
- given: Tianjiao
family: Zhang
- given: Jack
family: Gallant
- given: Claire
family: Tomlin
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1565-1578
id: strong24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1565
lastpage: 1578
published: 2024-06-11 00:00:00 +0000
- title: 'Deep Hankel matrices with random elements'
abstract: 'Willems’ fundamental lemma enables a trajectory-based characterization of linear systems through data-based Hankel matrices. However, in the presence of measurement noise, we ask: Is this noisy Hankel-based model expressive enough to re-identify itself? In other words, we study the output prediction accuracy from recursively applying the same persistently exciting input sequence to the model. We find an asymptotic connection to this self-consistency question in terms of the amount of data. More importantly, we also connect this question to the depth (number of rows) of the Hankel model, showing the simple act of reconfiguring a finite dataset significantly improves accuracy. We apply these insights to find a parsimonious depth for LQR problems over the trajectory space.'
volume: 242
URL: https://proceedings.mlr.press/v242/lawrence24a.html
PDF: https://proceedings.mlr.press/v242/lawrence24a/lawrence24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-lawrence24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Nathan
family: Lawrence
- given: Philip
family: Loewen
- given: Shuyuan
family: Wang
- given: Michael
family: Forbes
- given: Bhushan
family: Gopaluni
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1579-1591
id: lawrence24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1579
lastpage: 1591
published: 2024-06-11 00:00:00 +0000
- title: 'Robust exploration with adversary via Langevin Monte Carlo'
abstract: 'In the realm of Deep Q-Networks (DQNs), numerous exploration strategies have demonstrated efficacy within controlled environments. However, these methods encounter formidable challenges when confronted with the unpredictability of real-world scenarios marked by disturbances. The optimization of exploration efficiency under such disturbances is not fully investigated. In response to these challenges, this work introduces a versatile reinforcement learning (RL) framework that systematically addresses the intricate interplay between exploration and robustness in dynamic and unpredictable environments. We propose a robust RL methodology, framed within a two-player max-min adversarial paradigm. This formulation is cast as a Probabilistic Action Robust Markov Decision Process (MDP), grounded in a cyber-physical perspective. Our methodology capitalizes on Langevin Monte Carlo (LMC) for Q-function exploration, facilitating iterative updates that empower both the protagonist and adversary to efficaciously explore. Notably, we extend this adversarial training paradigm to encompass robustness against delayed feedback episodes. Empirical evaluation, conducted on benchmark problems such as N-Chain and deep brain stimulation, underlines the consistent superiority of our method over baseline approaches across diverse perturbation scenarios and instances of delayed feedback.'
volume: 242
URL: https://proceedings.mlr.press/v242/hsu24a.html
PDF: https://proceedings.mlr.press/v242/hsu24a/hsu24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-hsu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Hao-Lun
family: Hsu
- given: Miroslav
family: Pajic
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1592-1605
id: hsu24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1592
lastpage: 1605
published: 2024-06-11 00:00:00 +0000
- title: 'Generalized constraint for probabilistic safe reinforcement learning'
abstract: 'In this paper, we consider the problem of learning policies for probabilistic safe reinforcement learning (PSRL). Specifically, a safe policy or controller is one that, with high probability, maintains the trajectory of the agent in a given safe set. While the explicit gradient of the probabilistic constraint for solving PSRL directly exists, the high variance in the estimate of the gradient hinders its performance in problems with long horizons. An alternative that is frequently explored in the literature is to consider a cumulative safe reinforcement learning (CSRL) setting. In this setting, the estimates of the constraint’s gradient have less variance but are biased (yielding worse solutions than the PSRL), and they provide an approximate solution since they solve a relaxation of the PSRL formulation. In this work, we propose a safe reinforcement learning framework with a generalized constraint for solving the PSRL problems, which we term Generalized Safe Reinforcement Learning (GSRL). Our theoretical contributions substantiate that the proposed GSRL can recover both the PSRL and CSRL settings. In addition, it can be naturally combined with any state-of-the-art safe RL algorithms like PPO-Lagrangian, TD3-Lagrangian, CPO, PCPO, etc. We evaluate the GSRL by a series of empirical experiments in the well-known safe RL benchmark Bullet-Safety-Gym, which exhibit a better return-safety trade-off than both the PSRL and CSRL formulations.'
volume: 242
URL: https://proceedings.mlr.press/v242/chen24b.html
PDF: https://proceedings.mlr.press/v242/chen24b/chen24b.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-chen24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Weiqin
family: Chen
- given: Santiago
family: Paternain
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1606-1618
id: chen24b
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1606
lastpage: 1618
published: 2024-06-11 00:00:00 +0000
- title: 'Neural processes with event triggers for fast adaptation to changes'
abstract: 'Traditionally, first-principle models are used to monitor and control dynamical systems. However, modeling complex systems using first principles can be challenging. Learning the dynamics from data using neural networks has emerged as a viable alternative. In practice, some parameters of a system may vary across different system instances, but training separate neural networks for all possible parameter combinations can be infeasible. Therefore, meta-learning using, e.g., conditional neural processes (CNPs), aims to learn a prior model over the system dynamics for various parameters. These models can then adapt on deployment to the parameters of a system instance using a context set composed of past observations. However, changes in parameters can also occur online during operation, and naively adding past observations across parameter variations to the context set can distort the model’s latent representation, leading to inaccurate predictions over time. This paper introduces an adaptation scheme to enable CNPs to cope with such online variations. We combine a sliding window to accommodate gradual variations with the use of event triggers to detect sudden changes. The event triggers are based on concentration inequalities; they reset the context set of the CNP once observations deviate significantly from the CNP’s predictions. We validate our concepts on two nonlinear dynamical systems under parameter variations and demonstrate that our approaches decrease the prediction error over time as well as their efficacy for control.'
volume: 242
URL: https://proceedings.mlr.press/v242/brunzema24a.html
PDF: https://proceedings.mlr.press/v242/brunzema24a/brunzema24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-brunzema24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Paul
family: Brunzema
- given: Paul
family: Kruse
- given: Sebastian
family: Trimpe
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1619-1632
id: brunzema24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1619
lastpage: 1632
published: 2024-06-11 00:00:00 +0000
- title: 'Data-driven strategy synthesis for stochastic systems with unknown nonlinear disturbances'
abstract: 'In this paper, we introduce a data-driven framework for the synthesis of provably-correct controllers for general nonlinear switched systems under complex specifications. The focus is on systems with unknown disturbances whose effects on the dynamics of the system are nonlinear. The specification is assumed to be given as linear temporal logic over finite traces (LTLf) formulas. Starting from observations of either the disturbance or the state of the system, we first learn an ambiguity set that contains the unknown distribution of the disturbances with a user-defined confidence. Next, we obtain a robust Markov decision process (RMDP) as a finite abstraction of the system. By composing the RMDP with the automaton obtained from the LTLf formula and performing optimal robust value iteration on the composed RMDP, we synthesize a strategy that yields a high probability that the uncertain system satisfies the specifications. Our empirical evaluations on systems with a wide variety of disturbances show that the strategies synthesized with our approach lead to high satisfaction probabilities and validate the theoretical guarantees.'
volume: 242
URL: https://proceedings.mlr.press/v242/gracia24a.html
PDF: https://proceedings.mlr.press/v242/gracia24a/gracia24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-gracia24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Ibon
family: Gracia
- given: Dimitris
family: Boskos
- given: Luca
family: Laurenti
- given: Morteza
family: Lahijanian
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1633-1645
id: gracia24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1633
lastpage: 1645
published: 2024-06-11 00:00:00 +0000
- title: 'Growing Q-networks: Solving continuous control tasks with adaptive control resolution'
abstract: 'Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favorable exploration characteristics, while final performance does not visibly suffer in the absence of action penalization in line with optimal control theory. In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency, while regularization via action costs can be detrimental to exploration. Our work aims to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution. We take advantage of recent results in decoupled Q-learning to scale our approach to high-dimensional action spaces up to dim(A) = 38. Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that enable surprisingly strong performance on continuous control tasks.'
volume: 242
URL: https://proceedings.mlr.press/v242/seyde24a.html
PDF: https://proceedings.mlr.press/v242/seyde24a/seyde24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-seyde24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Tim
family: Seyde
- given: Peter
family: Werner
- given: Wilko
family: Schwarting
- given: Markus
family: Wulfmeier
- given: Daniela
family: Rus
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1646-1661
id: seyde24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1646
lastpage: 1661
published: 2024-06-11 00:00:00 +0000
- title: 'Hamiltonian GAN'
abstract: 'A growing body of work leverages the Hamiltonian formalism as an inductive bias for physically plausible neural network based video generation. The structure of the Hamiltonian ensures conservation of a learned quantity (e.g., energy) and imposes a phase-space interpretation on the low-dimensional manifold underlying the input video. While this interpretation has the potential to facilitate the integration of learned representations in downstream tasks, existing methods are limited in their applicability as they require a structural prior for the configuration space at design time. In this work, we present a GAN-based video generation pipeline with a learned configuration space map and Hamiltonian neural network motion model, to learn a representation of the configuration space from data. We train our model with a physics-inspired cyclic-coordinate loss function which encourages a minimal representation of the configuration space and improves interpretability. We demonstrate the efficacy and advantages of our approach on the Hamiltonian Dynamics Suite Toy Physics dataset.'
volume: 242
URL: https://proceedings.mlr.press/v242/allen-blanchette24a.html
PDF: https://proceedings.mlr.press/v242/allen-blanchette24a/allen-blanchette24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-allen-blanchette24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Christine
family: Allen-Blanchette
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1662-1674
id: allen-blanchette24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1662
lastpage: 1674
published: 2024-06-11 00:00:00 +0000
- title: 'Do no harm: A counterfactual approach to safe reinforcement learning'
abstract: 'Reinforcement Learning (RL) for control has become increasingly popular due to its ability to learn feedback policies that can take into account complex representations of the environment and uncertainty. When considering safety constraints, constrained optimization approaches where agents are penalized for constraint violations are commonly used. In such methods, if agents are initialized in or must visit states where constraint violation might be inevitable, it is unclear if or how much they should be penalized. We address this challenge by formulating a constraint on the counterfactual harm of the learned policy compared to an alternate, safe policy. In a philosophical sense this method only penalizes the learner for constraint violations that it caused; in a practical sense it maintains feasibility of the optimal control problem when constraint violation is inevitable. We present simulation studies on a rover with uncertain road friction and a tractor-trailer parking environment that demonstrate that our constraint formulation enables agents to learn safer policies than traditional constrained RL methods.'
volume: 242
URL: https://proceedings.mlr.press/v242/vaskov24a.html
PDF: https://proceedings.mlr.press/v242/vaskov24a/vaskov24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-vaskov24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Sean
family: Vaskov
- given: Wilko
family: Schwarting
- given: Chris
family: Baker
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1675-1687
id: vaskov24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1675
lastpage: 1687
published: 2024-06-11 00:00:00 +0000
- title: 'Wasserstein distributionally robust regret-optimal control over infinite-horizon'
abstract: 'We investigate the Distributionally Robust Regret-Optimal (DR-RO) control of discrete-time linear dynamical systems with quadratic cost over an infinite horizon. Regret is the difference in cost obtained by a causal controller and a clairvoyant controller with access to future disturbances. We focus on the infinite-horizon framework, which results in stability guarantees. In this DR setting, the probability distribution of the disturbances resides within a Wasserstein-2 ambiguity set centered at a specified nominal distribution. Our objective is to identify a control policy that minimizes the worst-case expected regret over an infinite horizon, considering all potential disturbance distributions within the ambiguity set. In contrast to prior works, which assume time-independent disturbances, we relax this constraint to allow for time-correlated disturbances, thus achieving actual distributional robustness. While we show that the resulting optimal controller is non-rational and lacks a finite-dimensional state-space realization, we demonstrate that it can still be uniquely characterized by a finite-dimensional parameter. Exploiting this fact, we introduce an efficient numerical method to compute the controller in the frequency domain using fixed-point iterations. This method circumvents the computational bottleneck associated with the finite-horizon problem, where the semi-definite programming (SDP) solution dimension scales with the time horizon. Numerical experiments demonstrate the effectiveness and performance of our framework.'
volume: 242
URL: https://proceedings.mlr.press/v242/kargin24a.html
PDF: https://proceedings.mlr.press/v242/kargin24a/kargin24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-kargin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Taylan
family: Kargin
- given: Joudi
family: Hajar
- given: Vikrant
family: Malik
- given: Babak
family: Hassibi
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1688-1701
id: kargin24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1688
lastpage: 1701
published: 2024-06-11 00:00:00 +0000
- title: 'Probably approximately correct stability of allocations in uncertain coalitional games with private sampling'
abstract: 'We study coalitional games with exogenous uncertainty in the coalition value, in which each agent is allowed to have private samples of the uncertainty. As a consequence, the agents may have a different perception of stability of the grand coalition. In this context, we propose a novel methodology to study the out-of-sample coalitional rationality of allocations in the set of stable allocations (i.e., the core). Our analysis builds on the framework of probably approximately correct learning. Initially, we state a priori and a posteriori guarantees for the entire core. Furthermore, we provide a distributed algorithm to compute a compression set that determines the generalization properties of the a posteriori statements. We then refine our probabilistic robustness bounds by specialising the analysis to a single payoff allocation, taking, also in this case, both a priori and a posteriori approaches. Finally, we consider a relaxed zeta-core to include nearby allocations and also address the case of an empty core. For this case, probabilistic statements are given on the eventual stability of allocations in the zeta-core.'
volume: 242
URL: https://proceedings.mlr.press/v242/pantazis24a.html
PDF: https://proceedings.mlr.press/v242/pantazis24a/pantazis24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-pantazis24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: George
family: Pantazis
- given: Filiberto
family: Fele
- given: Filippo
family: Fabiani
- given: Sergio
family: Grammatico
- given: Kostas
family: Margellos
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1702-1714
id: pantazis24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1702
lastpage: 1714
published: 2024-06-11 00:00:00 +0000
- title: 'Reinforcement learning-driven parametric curve fitting for snake robot gait design'
abstract: 'Snake-inspired robots demonstrate exceptional versatility through challenging terrains such as sand, rubble, and ice. However, their high-dimensional continuous action spaces make analytical gait design challenging. Early works by Hirose (1994) showed that gait parameterization over low-dimensional spatially and temporally varying sine waves can serve as basis functions for the shape-space or central pattern generators (CPGs). Recent approaches to designing CPGs have combined annealed chain-fitting, which solves for joint angles that fit a snake robot to a desired backbone curve, and keyframe extraction, which then fits analytic shape functions to the resulting optimized joint angles. However, the non-convex optimization associated with these methods is fraught with local optima exacerbated by constraints such as actuator limits. Reinforcement learning has emerged as a promising alternative for searching over such spaces. However, end-to-end RL approaches trained purely in simulation are vulnerable to reality distribution shifts, lack safety guarantees, and do not yield an intuitive representation of the learned gait. We propose a method that translates a gait found via policy search into a parametric representation of its component sinusoidal equations, thus leveraging the strengths of both learning-based and classical approaches. Simulation and hardware experiments show that the proposed pipeline can generate parametric gaits where classical curve fitting-based approaches fail.'
volume: 242
URL: https://proceedings.mlr.press/v242/naish24a.html
PDF: https://proceedings.mlr.press/v242/naish24a/naish24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-naish24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Jack
family: Naish
- given: Jacob
family: Rodriguez
- given: Jenny
family: Zhang
- given: Bryson
family: Jones
- given: Guglielmo
family: Daddi
- given: Andrew
family: Orekhov
- given: Rob
family: Royce
- given: Michael
family: Paton
- given: Howie
family: Choset
- given: Masahiro
family: Ono
- given: Rohan
family: Thakker
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1715-1727
id: naish24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1715
lastpage: 1727
published: 2024-06-11 00:00:00 +0000
- title: 'Pontryagin neural operator for solving general-sum differential games with parametric state constraints'
abstract: 'The values of two-player general-sum differential games are viscosity solutions to Hamilton-Jacobi-Isaacs (HJI) equations. Value and policy approximations for such games suffer from the curse of dimensionality (CoD). Alleviating CoD through physics-informed neural networks (PINN) encounters convergence issues when value discontinuity is present due to state constraints. On top of these challenges, it is often necessary to learn generalizable values and policies across a parametric space of games, e.g., for game parameter inference when information is incomplete. To address these challenges, we propose in this paper a Pontryagin-mode neural operator that outperforms existing state-of-the-art (SOTA) on safety performance across games with parametric state constraints. Our key contribution is the introduction of a costate loss defined on the discrepancy between forward and backward costate rollouts, which are computationally cheap. We show that the discontinuity of costate dynamics (in the presence of state constraints) effectively enables the learning of discontinuous values, without requiring manually supervised data as suggested by the current SOTA. More importantly, we show that the close relationship between costates and policies makes the former critical in learning feedback control policies with generalizable safety performance.'
volume: 242
URL: https://proceedings.mlr.press/v242/zhang24f.html
PDF: https://proceedings.mlr.press/v242/zhang24f/zhang24f.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-zhang24f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Lei
family: Zhang
- given: Mukesh
family: Ghimire
- given: Zhe
family: Xu
- given: Wenlong
family: Zhang
- given: Yi
family: Ren
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1728-1740
id: zhang24f
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1728
lastpage: 1740
published: 2024-06-11 00:00:00 +0000
- title: 'Adaptive neural network based control approach for building energy control under changing environmental conditions'
abstract: 'Deep neural networks are adept at modeling complex relationships between input and output variables. When trained on diverse datasets, they can understand not just the specifics of individual objects but also the broader principles governing an entire object class. This research applies this principle to building heating control, a domain marked by significant heterogeneity and constant environmental changes, including renovations and changes in user behavior. Our approach involves training the network on a wide range of data instances, enhancing its adaptability to newly distributed data representing unseen scenarios. We find that Transformer-based LSTM architectures are particularly adept for this task as they are able to remember previous tasks’ learning. We propose a simple yet effective control algorithm that separates system identification and forecasting from the optimization-based control step. This separation simplifies the control process while ensuring robust performance. In a wide range of simulation experiments, we demonstrate that our “universally trained” neural network control can adjust to changing conditions, thus reducing the need for more complex continual learning techniques. Our results suggest that training neural networks on varied datasets empowers the network with the ability to generalize and adapt beyond specific training instances, which demonstrates their effectiveness in dynamic and heterogeneous environments.'
volume: 242
URL: https://proceedings.mlr.press/v242/frison24a.html
PDF: https://proceedings.mlr.press/v242/frison24a/frison24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-frison24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Lilli
family: Frison
- given: Simon
family: Gölzhäuser
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1741-1752
id: frison24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1741
lastpage: 1752
published: 2024-06-11 00:00:00 +0000
- title: 'Physics-constrained learning of PDE systems with uncertainty quantified port-Hamiltonian models'
abstract: 'Modeling the dynamics of flexible objects has become an emerging topic in the community as these objects become more present in many applications, e.g., soft robotics. Due to the properties of flexible materials, the movements of soft objects are often highly nonlinear and, thus, complex to predict. Data-driven approaches seem promising for modeling those complex dynamics but often neglect basic physical principles, which consequently makes them untrustworthy and limits generalization. To address this problem, we propose a physics-constrained learning method that combines powerful learning tools and reliable physical models. Our method leverages the data collected from observations by sending them into a Gaussian process that is physically constrained by a distributed Port-Hamiltonian model. Based on the Bayesian nature of the Gaussian process, we not only learn the dynamics of the system, but also enable uncertainty quantification. Furthermore, the proposed approach preserves the compositional nature of Port-Hamiltonian systems.'
volume: 242
URL: https://proceedings.mlr.press/v242/tan24a.html
PDF: https://proceedings.mlr.press/v242/tan24a/tan24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-tan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Kaiyuan
family: Tan
- given: Peilun
family: Li
- given: Thomas
family: Beckers
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1753-1764
id: tan24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1753
lastpage: 1764
published: 2024-06-11 00:00:00 +0000
- title: 'Proto-MPC: An encoder-prototype-decoder approach for quadrotor control in challenging winds'
abstract: 'Quadrotors are increasingly used in the evolving field of aerial robotics for their agility and mechanical simplicity. However, inherent uncertainties, such as aerodynamic effects coupled with quadrotors’ operation in dynamically changing environments, pose significant challenges for traditional, nominal model-based control designs. To address these challenges, we propose a multi-task meta-learning method called Encoder-Prototype-Decoder (EPD), which has the advantage of effectively balancing shared and distinctive representations across diverse training tasks. Subsequently, we integrate the EPD model into a model predictive control problem (Proto-MPC) to enhance the quadrotor’s ability to adapt and operate across a spectrum of dynamically changing tasks with an efficient online implementation. We validate the proposed method in simulations, which demonstrate Proto-MPC’s robust performance in trajectory tracking of a quadrotor subject to static and spatially varying side winds.'
volume: 242
URL: https://proceedings.mlr.press/v242/gu24a.html
PDF: https://proceedings.mlr.press/v242/gu24a/gu24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-gu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Yuliang
family: Gu
- given: Sheng
family: Cheng
- given: Naira
family: Hovakimyan
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1765-1776
id: gu24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1765
lastpage: 1776
published: 2024-06-11 00:00:00 +0000
- title: 'Efficient imitation learning with conservative world models'
abstract: 'We tackle the problem of policy learning from expert demonstrations without a reward function. A central challenge in this space is that these policies fail upon deployment due to issues of distributional shift, environment stochasticity, or compounding errors. Adversarial imitation learning alleviates this issue but requires additional on-policy training samples for stability, which presents a challenge in realistic domains due to inefficient learning and high sample complexity. One approach to this issue is to learn a world model of the environment, and use synthetic data for policy training. While successful in prior works, we argue that this is sub-optimal due to additional distribution shifts between the learned model and the real environment. Instead, we re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one. Drawing theoretical connections to offline RL and fine-tuning algorithms, we argue that standard online world model algorithms are not well suited to the imitation learning problem. We derive a principled conservative optimization bound and demonstrate empirically that it leads to improved performance on two very challenging manipulation environments from high-dimensional raw pixel observations. We set a new state-of-the-art performance on the Franka Kitchen environment from images, requiring only 10 demos and no reward labels, as well as solving a complex dexterity manipulation task.'
volume: 242
URL: https://proceedings.mlr.press/v242/kolev24a.html
PDF: https://proceedings.mlr.press/v242/kolev24a/kolev24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-kolev24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Victor
family: Kolev
- given: Rafael
family: Rafailov
- given: Kyle
family: Hatch
- given: Jiajun
family: Wu
- given: Chelsea
family: Finn
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1777-1790
id: kolev24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1777
lastpage: 1790
published: 2024-06-11 00:00:00 +0000
- title: 'Restless bandits with rewards generated by a linear Gaussian dynamical system'
abstract: 'Decision-making under uncertainty is a fundamental problem encountered frequently and can be formulated as a stochastic multi-armed bandit problem. In the problem, the learner interacts with an environment by choosing an action at each round, where a round is an instance of an interaction. In response, the environment reveals a reward, which is sampled from a stochastic process, to the learner. The goal of the learner is to maximize cumulative reward. In this work, we assume that the rewards are the inner product of an action vector and a state vector generated by a linear Gaussian dynamical system. To predict the reward for each action, we propose a method that takes a linear combination of previously observed rewards for predicting each action’s next reward. We show that, regardless of the sequence of previous actions chosen, the reward sampled for any previously chosen action can be used for predicting another action’s future reward, i.e. the reward sampled for action 1 at round $t-1$ can be used for predicting the reward for action $2$ at round $t$. This is accomplished by designing a modified Kalman filter with a matrix representation that can be learned for reward prediction. Numerical evaluations are carried out on a set of linear Gaussian dynamical systems and are compared with 2 other well-known stochastic multi-armed bandit algorithms.'
volume: 242
URL: https://proceedings.mlr.press/v242/gornet24a.html
PDF: https://proceedings.mlr.press/v242/gornet24a/gornet24a.pdf
edit: https://github.com/mlresearch//v242/edit/gh-pages/_posts/2024-06-11-gornet24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 6th Annual Learning for Dynamics & Control Conference'
publisher: 'PMLR'
author:
- given: Jonathan
family: Gornet
- given: Bruno
family: Sinopoli
editor:
- given: Alessandro
family: Abate
- given: Mark
family: Cannon
- given: Kostas
family: Margellos
- given: Antonis
family: Papachristodoulou
page: 1791-1802
id: gornet24a
issued:
date-parts:
- 2024
- 6
- 11
firstpage: 1791
lastpage: 1802
published: 2024-06-11 00:00:00 +0000