VOLTS: Validated Output through Logit Tree Search for Reliable PDDL Planning with Small Language Models
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:872-879, 2026.
Abstract
Autonomous agents that must run on edge hardware cannot afford the compute footprint of frontier LLMs, yet they still need dependable task planning. We address this gap by showing how a single pass with a 4-bit Llama 3.1 8B Small Language Model (SLM) can generate syntactically correct plans in the symbolic-planning formalism Planning Domain Definition Language (PDDL) while respecting tight memory and latency budgets. VOLTS rests on three ideas. (1) Action-token fine-tuning: the SLM is fine-tuned on a custom vocabulary where every token encodes a complete grounded action, giving the model strong task heuristics without expanding its size. (2) Real-time validator: a lightweight symbolic module checks each candidate token against the current state during decoding, guaranteeing that any plan emitted contains no hallucinated or infeasible actions. (3) Parallel branching search: when several validated actions appear promising, VOLTS explores them in parallel branches within the same forward pass, preserving single-pass efficiency while widening search. Evaluated on 2000 problems (500 each in the IPC Blocksworld, Logistics, DriverLog, and Rover domains), VOLTS returns valid plans for 76% of tasks, far outperforming GPT-4o (7% validity) and a fine-tuned baseline without in-loop validation (0.13%). The valid plans average 1.08$\times$ the length of solutions from the classical Fast Downward planner. Unlike Tree-Planner or LLM Modulo frameworks, VOLTS validates per token inside a single inference pass, eliminating costly iterative cycles. By coupling resource-aware neural guidance with deterministic symbolic checks, VOLTS opens the door to reliable, on-device planning for robots, drones, and embedded IoT agents where every millisecond and megabyte counts.
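The in-loop validation idea (2) can be illustrated with a minimal sketch. The abstract does not specify how the validator is implemented, so everything below is an assumption made for illustration: the names `Action`, `applicable`, `masked_scores`, `decode_plan`, and `stub_scores` are hypothetical, the language model is replaced by a fixed score table, and plain greedy decoding stands in for the parallel branching search. The sketch only shows how masking action tokens whose preconditions fail in the current state keeps infeasible actions out of the emitted plan.

```python
import math
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    """A grounded STRIPS-style action; one vocabulary token per action (assumption)."""
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def applicable(state: FrozenSet[str], action: Action) -> bool:
    # An action is feasible only if all its preconditions hold in the state.
    return action.pre <= state


def apply_action(state: FrozenSet[str], action: Action) -> FrozenSet[str]:
    return (state - action.delete) | action.add


def masked_scores(state, scores: Dict[str, float], vocab: Dict[str, Action]):
    # Infeasible action tokens get -inf so they can never be selected.
    return {t: (s if applicable(state, vocab[t]) else -math.inf)
            for t, s in scores.items()}


def decode_plan(init, goal, vocab, score_fn: Callable, max_steps: int = 20
                ) -> Optional[List[str]]:
    """Greedy decoding over action tokens with a per-step validity mask."""
    state, plan = init, []
    for _ in range(max_steps):
        if goal <= state:
            return plan
        scores = masked_scores(state, score_fn(state, plan), vocab)
        best = max(scores, key=scores.get)
        if scores[best] == -math.inf:
            return None  # dead end: no feasible action remains
        plan.append(best)
        state = apply_action(state, vocab[best])
    return plan if goal <= state else None


f = frozenset
# Toy two-block Blocksworld: A starts on B, the goal is B on A.
vocab = {
    "unstack_A_B": Action(f({"on(A,B)", "clear(A)", "handempty"}),
                          f({"holding(A)", "clear(B)"}),
                          f({"on(A,B)", "clear(A)", "handempty"})),
    "putdown_A": Action(f({"holding(A)"}),
                        f({"ontable(A)", "clear(A)", "handempty"}),
                        f({"holding(A)"})),
    "pickup_B": Action(f({"ontable(B)", "clear(B)", "handempty"}),
                       f({"holding(B)"}),
                       f({"ontable(B)", "clear(B)", "handempty"})),
    "stack_B_A": Action(f({"holding(B)", "clear(A)"}),
                        f({"on(B,A)", "clear(B)", "handempty"}),
                        f({"holding(B)", "clear(A)"})),
}
init = f({"on(A,B)", "ontable(B)", "clear(A)", "handempty"})
goal = f({"on(B,A)"})


def stub_scores(state, plan):
    # Stand-in for model logits: it always prefers the goal-achieving action,
    # even when that action is infeasible in the current state.
    return {"stack_B_A": 3.0, "pickup_B": 2.0, "putdown_A": 1.0, "unstack_A_B": 0.5}


plan = decode_plan(init, goal, vocab, stub_scores)
print(plan)
```

Without the mask, the stub scorer would emit `stack_B_A` as its first action, which is infeasible from the initial state. With the mask, each step can only select an action whose preconditions hold, so any plan that is returned is executable by construction. A masked decoder can still hit a dead end or run out of steps, which is why the function may return `None`.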