Strategic Planning: A Top-Down Approach to Option Generation
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:52258-52302, 2025.
Abstract
Real-world human decision-making often relies on strategic planning, where high-level goals guide the formulation of sub-goals and subsequent actions, as evidenced by domains such as healthcare, business, and urban policy. Despite notable successes in controlled settings, conventional reinforcement learning (RL) follows a bottom-up paradigm that can struggle to adapt to real-world complexities such as sparse rewards and limited exploration budgets. While methods like hierarchical RL and environment shaping provide partial solutions, they frequently rely either on ad-hoc designs (e.g., hand-picking the set of high-level actions) or on purely data-driven discovery of high-level actions, which still requires significant exploration. In this paper, we introduce a top-down framework for RL that explicitly leverages human-like strategies to reduce sample complexity, guide exploration, and enable high-level decision-making. We first formalize the Strategy Problem, which frames policy generation as finding distributions over policies that balance specificity and value. Building on this definition, we propose the Strategist agent, an iterative framework that leverages large language models (LLMs) to synthesize domain knowledge into a structured representation of actionable strategies and sub-goals. We further develop a reward-shaping methodology that translates these natural-language strategies into quantitative feedback for RL methods. Empirically, we demonstrate significantly faster convergence than conventional PPO. Taken together, our findings highlight that top-down strategic exploration opens new avenues for enhancing RL on real-world decision problems.
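The abstract's notion of a strategy as a distribution over policies suggests one possible formalization. The objective below is a minimal sketch, assuming a strategy is a distribution \(q\) over a policy class \(\Pi\) with specificity measured by the entropy \(\mathcal{H}(q)\); the paper's exact definitions may differ:

\[
q^{*} \;=\; \arg\max_{q \,\in\, \Delta(\Pi)} \; \mathbb{E}_{\pi \sim q}\big[V(\pi)\big] \;+\; \lambda\,\mathcal{H}(q),
\]

where \(V(\pi)\) is the expected return of policy \(\pi\), \(\Delta(\Pi)\) is the set of distributions over \(\Pi\), and \(\lambda \ge 0\) is an assumed trade-off weight: larger \(\lambda\) favors broader, less committed strategies, while \(\lambda \to 0\) collapses the problem to selecting a single best policy.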
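The reward-shaping step, translating natural-language sub-goals into quantitative feedback, could plausibly be realized with potential-based shaping (Ng et al., 1999), which provably preserves optimal policies. The sketch below is illustrative rather than the paper's implementation; `subgoal_progress`, which scores a state's progress toward the LLM-derived sub-goals, is a hypothetical helper:

```python
from typing import Callable

# Generic state/action placeholders; any hashable or array-like types work.
State = object
Action = object

def make_shaped_reward(
    base_reward: Callable[[State, Action, State], float],
    subgoal_progress: Callable[[State], float],  # hypothetical: state -> progress in [0, 1]
    gamma: float = 0.99,
    scale: float = 1.0,
) -> Callable[[State, Action, State], float]:
    """Wrap an environment reward with potential-based shaping,
    F(s, s') = gamma * phi(s') - phi(s), which leaves optimal
    policies unchanged while densifying sparse rewards."""
    def shaped(state: State, action: Action, next_state: State) -> float:
        phi_s = scale * subgoal_progress(state)
        phi_next = scale * subgoal_progress(next_state)
        return base_reward(state, action, next_state) + gamma * phi_next - phi_s
    return shaped
```

The shaped reward can then be handed to any standard RL algorithm (e.g., PPO) in place of the sparse environment reward.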