Optimal control of the future via prospective learning with control

Yuxin Bai, Aranyak Acharyya, Ashwin De Silva, Zeyu Shen, James Hassett, Joshua T Vogelstein
Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:1157-1180, 2026.

Abstract

Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in reinforcement learning (RL). RL is mathematically distinct from supervised learning, which has been the main workhorse for the recent achievements in AI. Moreover, RL typically operates in a stationary environment with episodic resets, limiting its utility. Here, we extend supervised learning to address learning to \textit{control} in non-stationary, reset-free environments. Using this framework, called ”Prospective Learning with Control” (PLuC), we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective learning with control: foraging, a canonical task relevant to both natural and artificial agents. We illustrate that modern RL algorithms, which assume stationarity, struggle in these non-stationary reset-free environments. Even with time-aware modifications, they converge orders of magnitude slower than our prospective foraging agents on a simple 1-D foraging benchmark.

Cite this Paper


BibTeX
@InProceedings{pmlr-v331-bai26a, title = {Optimal control of the future via prospective learning with control}, author = {Bai, Yuxin and Acharyya, Aranyak and Silva, Ashwin De and Shen, Zeyu and Hassett, James and Vogelstein, Joshua T}, booktitle = {Proceedings of The 8th Annual Learning for Dynamics and Control Conference}, pages = {1157--1180}, year = {2026}, editor = {Sukhatme, Gaurav and Lindemann, Lars and Tu, Stephen and Wierman, Adam and Atanasov, Nikolay}, volume = {331}, series = {Proceedings of Machine Learning Research}, month = {17--19 Jun}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v331/main/assets/bai26a/bai26a.pdf}, url = {https://proceedings.mlr.press/v331/bai26a.html}, abstract = {Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in reinforcement learning (RL). RL is mathematically distinct from supervised learning, which has been the main workhorse for the recent achievements in AI. Moreover, RL typically operates in a stationary environment with episodic resets, limiting its utility. Here, we extend supervised learning to address learning to \textit{control} in non-stationary, reset-free environments. Using this framework, called ”Prospective Learning with Control” (PLuC), we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective learning with control: foraging, a canonical task relevant to both natural and artificial agents. We illustrate that modern RL algorithms, which assume stationarity, struggle in these non-stationary reset-free environments. Even with time-aware modifications, they converge orders of magnitude slower than our prospective foraging agents on a simple 1-D foraging benchmark.} }
Endnote
%0 Conference Paper %T Optimal control of the future via prospective learning with control %A Yuxin Bai %A Aranyak Acharyya %A Ashwin De Silva %A Zeyu Shen %A James Hassett %A Joshua T Vogelstein %B Proceedings of The 8th Annual Learning for Dynamics and Control Conference %C Proceedings of Machine Learning Research %D 2026 %E Gaurav Sukhatme %E Lars Lindemann %E Stephen Tu %E Adam Wierman %E Nikolay Atanasov %F pmlr-v331-bai26a %I PMLR %P 1157--1180 %U https://proceedings.mlr.press/v331/bai26a.html %V 331 %X Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in reinforcement learning (RL). RL is mathematically distinct from supervised learning, which has been the main workhorse for the recent achievements in AI. Moreover, RL typically operates in a stationary environment with episodic resets, limiting its utility. Here, we extend supervised learning to address learning to \textit{control} in non-stationary, reset-free environments. Using this framework, called ”Prospective Learning with Control” (PLuC), we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective learning with control: foraging, a canonical task relevant to both natural and artificial agents. We illustrate that modern RL algorithms, which assume stationarity, struggle in these non-stationary reset-free environments. Even with time-aware modifications, they converge orders of magnitude slower than our prospective foraging agents on a simple 1-D foraging benchmark.
APA
Bai, Y., Acharyya, A., Silva, A.D., Shen, Z., Hassett, J. & Vogelstein, J.T.. (2026). Optimal control of the future via prospective learning with control. Proceedings of The 8th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 331:1157-1180 Available from https://proceedings.mlr.press/v331/bai26a.html.

Related Material