Necessary and Sufficient Oracles: Toward a Computational Taxonomy for Reinforcement Learning

Dhruv Rohatgi, Dylan J. Foster
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:4838-4936, 2025.

Abstract

Algorithms for reinforcement learning (RL) in large state spaces crucially rely on supervised learning subroutines to estimate objects such as value functions or transition probabilities. Since only the simplest supervised learning problems can be solved provably and efficiently, practical performance of an RL algorithm depends on which of these supervised learning “oracles” it assumes access to (and how they are implemented). But which oracles are better or worse? Is there a \emph{minimal} oracle? In this work, we clarify the impact of the choice of supervised learning oracle on the computational complexity of RL, as quantified by the oracle strength. First, for the task of reward-free exploration in Block MDPs in the standard episodic access model—a ubiquitous setting for RL with function approximation—we identify \emph{two-context regression} as a minimal oracle, i.e. an oracle that is both necessary and sufficient (under a mild regularity assumption). Second, we identify \emph{one-context regression} as a near-minimal oracle in the stronger \emph{reset} access model, establishing a provable computational benefit of resets in the process. Third, we broaden our focus to \emph{Low-Rank MDPs}, where we give cryptographic evidence that the analogous oracle from the Block MDP setting is insufficient.
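At a glance, the distinction between the two oracles highlighted in the abstract lies in the shape of the regression examples they consume. The sketch below is an illustrative interface only, not the paper's formal definition: the names, types, and signatures are assumptions made for exposition, and the precise oracle definitions, function classes, and error guarantees are given in the paper itself.

```python
# Illustrative sketch (assumed, not from the paper): one way to phrase the two
# supervised-learning oracles as interfaces. A one-context regression oracle
# fits a predictor from (context, label) pairs; a two-context regression oracle
# fits a predictor from ((context, context), label) pairs.
from typing import Callable, Protocol, Sequence, Tuple

Context = Sequence[float]   # an observation/context vector
Label = float               # a real-valued regression target


class OneContextRegressionOracle(Protocol):
    """Regression over single contexts (near-minimal in the reset access model)."""

    def fit(self, data: Sequence[Tuple[Context, Label]]) -> Callable[[Context], Label]:
        ...


class TwoContextRegressionOracle(Protocol):
    """Regression where each example carries a pair of contexts.

    The paper identifies an oracle of this flavor as both necessary and
    sufficient for reward-free exploration in Block MDPs under episodic access.
    """

    def fit(
        self, data: Sequence[Tuple[Tuple[Context, Context], Label]]
    ) -> Callable[[Context, Context], Label]:
        ...
```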

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-rohatgi25b,
  title     = {Necessary and Sufficient Oracles: Toward a Computational Taxonomy for Reinforcement Learning},
  author    = {Rohatgi, Dhruv and Foster, Dylan J.},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {4838--4936},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/rohatgi25b/rohatgi25b.pdf},
  url       = {https://proceedings.mlr.press/v291/rohatgi25b.html},
  abstract  = {Algorithms for reinforcement learning (RL) in large state spaces crucially rely on supervised learning subroutines to estimate objects such as value functions or transition probabilities. Since only the simplest supervised learning problems can be solved provably and efficiently, practical performance of an RL algorithm depends on which of these supervised learning “oracles” it assumes access to (and how they are implemented). But which oracles are better or worse? Is there a \emph{minimal} oracle? In this work, we clarify the impact of the choice of supervised learning oracle on the computational complexity of RL, as quantified by the oracle strength. First, for the task of reward-free exploration in Block MDPs in the standard episodic access model—a ubiquitous setting for RL with function approximation—we identify \emph{two-context regression} as a minimal oracle, i.e. an oracle that is both necessary and sufficient (under a mild regularity assumption). Second, we identify \emph{one-context regression} as a near-minimal oracle in the stronger \emph{reset} access model, establishing a provable computational benefit of resets in the process. Third, we broaden our focus to \emph{Low-Rank MDPs}, where we give cryptographic evidence that the analogous oracle from the Block MDP setting is insufficient.}
}
Endnote
%0 Conference Paper
%T Necessary and Sufficient Oracles: Toward a Computational Taxonomy for Reinforcement Learning
%A Dhruv Rohatgi
%A Dylan J. Foster
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-rohatgi25b
%I PMLR
%P 4838--4936
%U https://proceedings.mlr.press/v291/rohatgi25b.html
%V 291
%X Algorithms for reinforcement learning (RL) in large state spaces crucially rely on supervised learning subroutines to estimate objects such as value functions or transition probabilities. Since only the simplest supervised learning problems can be solved provably and efficiently, practical performance of an RL algorithm depends on which of these supervised learning “oracles” it assumes access to (and how they are implemented). But which oracles are better or worse? Is there a \emph{minimal} oracle? In this work, we clarify the impact of the choice of supervised learning oracle on the computational complexity of RL, as quantified by the oracle strength. First, for the task of reward-free exploration in Block MDPs in the standard episodic access model—a ubiquitous setting for RL with function approximation—we identify \emph{two-context regression} as a minimal oracle, i.e. an oracle that is both necessary and sufficient (under a mild regularity assumption). Second, we identify \emph{one-context regression} as a near-minimal oracle in the stronger \emph{reset} access model, establishing a provable computational benefit of resets in the process. Third, we broaden our focus to \emph{Low-Rank MDPs}, where we give cryptographic evidence that the analogous oracle from the Block MDP setting is insufficient.
APA
Rohatgi, D. & Foster, D. J. (2025). Necessary and Sufficient Oracles: Toward a Computational Taxonomy for Reinforcement Learning. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:4838-4936. Available from https://proceedings.mlr.press/v291/rohatgi25b.html.
