Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation

Runze Zhao, Yue Yu, Adams Yiyue Zhu, Chen Yang, Dongruo Zhou
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:5030-5057, 2025.

Abstract

Continuous-time reinforcement learning (CTRL) provides a principled framework for sequential decision-making in environments where interactions evolve continuously over time. Despite its empirical success, the theoretical understanding of CTRL remains limited, especially in settings with general function approximation. In this work, we propose a model-based CTRL algorithm that achieves both sample and computational efficiency. Our approach leverages optimism-based confidence sets to establish the first sample complexity guarantee for CTRL with general function approximation, showing that a near-optimal policy can be learned with a suboptimality gap of $\tilde{O}(\sqrt{d_{\mathcal{R}} + d_{\mathcal{F}}}N^{-1/2})$ using $N$ measurements, where $d_{\mathcal{R}}$ and $d_{\mathcal{F}}$ denote the distributional Eluder dimensions of the reward and dynamic functions, respectively, capturing the complexity of general function approximation in reinforcement learning. Moreover, we introduce structured policy updates and an alternative measurement strategy that significantly reduce the number of policy updates and rollouts while maintaining competitive sample efficiency. Our proposed algorithms are validated through experiments on continuous control tasks and diffusion model fine-tuning, demonstrating comparable performance with significantly fewer policy updates and rollouts.
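To give a concrete picture of the "optimism-based confidence sets" idea the abstract refers to, the sketch below shows the generic recipe used in model-based RL: keep every candidate model whose empirical loss is within a confidence radius of the best fit, then plan with the most optimistic member of that set. This is only an illustrative sketch; the names (CandidateModel, squared_prediction_loss, optimistic_model, plan_value, beta) are hypothetical placeholders and this is not the paper's CTRL algorithm, which works in continuous time with general function approximation and measurement-based data collection.

# Illustrative sketch of optimism via confidence sets (NOT the paper's algorithm).
# All names here are hypothetical placeholders introduced for illustration.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np


@dataclass
class CandidateModel:
    """A candidate dynamics model: predict(state, action) -> predicted target."""
    name: str
    predict: Callable[[np.ndarray, np.ndarray], np.ndarray]


def squared_prediction_loss(
    model: CandidateModel,
    data: Sequence[Tuple[np.ndarray, np.ndarray, np.ndarray]],
) -> float:
    """Cumulative squared error of the model on observed (state, action, target) triples."""
    return float(sum(np.sum((model.predict(s, a) - y) ** 2) for s, a, y in data))


def optimistic_model(
    models: List[CandidateModel],
    data: Sequence[Tuple[np.ndarray, np.ndarray, np.ndarray]],
    plan_value: Callable[[CandidateModel], float],
    beta: float,
) -> CandidateModel:
    """Confidence set: models whose loss is within beta of the best fit.
    Optimism: return the member whose planned value is largest."""
    losses = {m.name: squared_prediction_loss(m, data) for m in models}
    best_fit = min(losses.values())
    confidence_set = [m for m in models if losses[m.name] <= best_fit + beta]
    return max(confidence_set, key=plan_value)


if __name__ == "__main__":
    # Tiny synthetic demo with two toy candidate models.
    rng = np.random.default_rng(0)
    data = [(rng.normal(size=2), rng.normal(size=1), rng.normal(size=2)) for _ in range(32)]
    models = [
        CandidateModel("zero", lambda s, a: np.zeros_like(s)),
        CandidateModel("identity", lambda s, a: s),
    ]
    chosen = optimistic_model(models, data, plan_value=lambda m: len(m.name), beta=5.0)
    print("optimistic choice:", chosen.name)

In this style of analysis, the confidence radius (beta above) is what typically scales with the complexity of the function class; in the paper's bound that complexity enters through the distributional Eluder dimensions $d_{\mathcal{R}}$ and $d_{\mathcal{F}}$.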

Cite this Paper


BibTeX
@InProceedings{pmlr-v286-zhao25a,
  title     = {Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation},
  author    = {Zhao, Runze and Yu, Yue and Zhu, Adams Yiyue and Yang, Chen and Zhou, Dongruo},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  pages     = {5030--5057},
  year      = {2025},
  editor    = {Chiappa, Silvia and Magliacane, Sara},
  volume    = {286},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--25 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/zhao25a/zhao25a.pdf},
  url       = {https://proceedings.mlr.press/v286/zhao25a.html},
  abstract  = {Continuous-time reinforcement learning (CTRL) provides a principled framework for sequential decision-making in environments where interactions evolve continuously over time. Despite its empirical success, the theoretical understanding of CTRL remains limited, especially in settings with general function approximation. In this work, we propose a model-based CTRL algorithm that achieves both sample and computational efficiency. Our approach leverages optimism-based confidence sets to establish the first sample complexity guarantee for CTRL with general function approximation, showing that a near-optimal policy can be learned with a suboptimality gap of $\tilde{O}(\sqrt{d_{\mathcal{R}} + d_{\mathcal{F}}}N^{-1/2})$ using $N$ measurements, where $d_{\mathcal{R}}$ and $d_{\mathcal{F}}$ denote the distributional Eluder dimensions of the reward and dynamic functions, respectively, capturing the complexity of general function approximation in reinforcement learning. Moreover, we introduce structured policy updates and an alternative measurement strategy that significantly reduce the number of policy updates and rollouts while maintaining competitive sample efficiency. Our proposed algorithms are validated through experiments on continuous control tasks and diffusion model fine-tuning, demonstrating comparable performance with significantly fewer policy updates and rollouts.}
}
Endnote
%0 Conference Paper
%T Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation
%A Runze Zhao
%A Yue Yu
%A Adams Yiyue Zhu
%A Chen Yang
%A Dongruo Zhou
%B Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2025
%E Silvia Chiappa
%E Sara Magliacane
%F pmlr-v286-zhao25a
%I PMLR
%P 5030--5057
%U https://proceedings.mlr.press/v286/zhao25a.html
%V 286
%X Continuous-time reinforcement learning (CTRL) provides a principled framework for sequential decision-making in environments where interactions evolve continuously over time. Despite its empirical success, the theoretical understanding of CTRL remains limited, especially in settings with general function approximation. In this work, we propose a model-based CTRL algorithm that achieves both sample and computational efficiency. Our approach leverages optimism-based confidence sets to establish the first sample complexity guarantee for CTRL with general function approximation, showing that a near-optimal policy can be learned with a suboptimality gap of $\tilde{O}(\sqrt{d_{\mathcal{R}} + d_{\mathcal{F}}}N^{-1/2})$ using $N$ measurements, where $d_{\mathcal{R}}$ and $d_{\mathcal{F}}$ denote the distributional Eluder dimensions of the reward and dynamic functions, respectively, capturing the complexity of general function approximation in reinforcement learning. Moreover, we introduce structured policy updates and an alternative measurement strategy that significantly reduce the number of policy updates and rollouts while maintaining competitive sample efficiency. Our proposed algorithms are validated through experiments on continuous control tasks and diffusion model fine-tuning, demonstrating comparable performance with significantly fewer policy updates and rollouts.
APA
Zhao, R., Yu, Y., Zhu, A.Y., Yang, C. & Zhou, D. (2025). Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:5030-5057. Available from https://proceedings.mlr.press/v286/zhao25a.html.
