VOLTS: Validated Output through Logit Tree Search for Reliable PDDL Planning with Small Language Models

Nicholas Massad, Amine Trabelsi, Francois Ferland, Froduald Kabanza
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:872-879, 2026.

Abstract

Autonomous agents that must run on edge hardware cannot afford the compute footprint of frontier LLMs, yet they still need dependable task planning. We address this gap by showing how a single pass with a 4-bit Llama 3.1 8B Small Language Model (SLM) can generate syntactically correct plans in the Planning Domain Definition Language (PDDL), a symbolic-planning formalism, while respecting tight memory and latency budgets. VOLTS rests on three ideas. (1) Action-token fine-tuning: the SLM is fine-tuned on a custom vocabulary where every token encodes a complete grounded action, giving the model strong task heuristics without expanding its size. (2) Real-time validator: a lightweight symbolic module checks each candidate token against the current state during decoding, guaranteeing that any plan emitted contains no hallucinated or infeasible actions. (3) Parallel branching search: when several validated actions appear promising, VOLTS explores them in parallel branches within the same forward pass, preserving single-pass efficiency while widening search. Evaluated on 2000 problems (500 each in the IPC Blocksworld, Logistics, DriverLog, and Rover domains), VOLTS returns valid plans for 76% of tasks, far outperforming GPT-4o (7% validity) and a fine-tuned baseline without in-loop validation (0.13%). Its plans average 1.08$\times$ the length of solutions from the classical Fast Downward planner. Unlike Tree-Planner or LLM Modulo frameworks, VOLTS validates per token inside a single inference pass, eliminating costly iterative cycles. By coupling resource-aware neural guidance with deterministic symbolic checks, VOLTS opens the door to reliable, on-device planning for robots, drones, and embedded IoT agents where every millisecond and megabyte counts.
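The interplay of ideas (2) and (3) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Action` class, the `validated_search` function, the two-block toy domain, and the `score` function (a stand-in for the SLM's logits over action tokens) are all hypothetical names introduced here. The sketch also expands branches sequentially, whereas the abstract describes exploring them within a single forward pass.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]


@dataclass(frozen=True)
class Action:
    """A grounded action; in the paper's setting one token encodes one of these."""
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


def applicable(state: State, action: Action) -> bool:
    # Symbolic validator: the action is feasible only if its preconditions hold.
    return action.pre <= state


def apply_action(state: State, action: Action) -> State:
    return (state - action.delete) | action.add


def validated_search(
    init: State,
    goal: FrozenSet[str],
    actions: List[Action],
    score: Callable[[Action], float],  # assumed stand-in for model logits
    width: int = 2,
    max_steps: int = 10,
) -> Optional[List[str]]:
    # Each branch is (cumulative score, current state, plan so far).
    beam: List[Tuple[float, State, List[str]]] = [(0.0, init, [])]
    for _ in range(max_steps):
        candidates = []
        for total, state, plan in beam:
            if goal <= state:
                return plan
            for a in actions:
                # Infeasible actions are masked out before they can be emitted.
                if applicable(state, a):
                    candidates.append(
                        (total + score(a), apply_action(state, a), plan + [a.name])
                    )
        if not candidates:
            return None
        # Keep the `width` most promising validated branches.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:width]
    for _, state, plan in beam:
        if goal <= state:
            return plan
    return None


def fs(*facts: str) -> FrozenSet[str]:
    return frozenset(facts)


# Toy two-block domain (hypothetical, for illustration only).
ACTIONS = [
    Action("pickup(a)", fs("clear a", "ontable a", "handempty"),
           fs("holding a"), fs("clear a", "ontable a", "handempty")),
    Action("pickup(b)", fs("clear b", "ontable b", "handempty"),
           fs("holding b"), fs("clear b", "ontable b", "handempty")),
    Action("stack(a,b)", fs("holding a", "clear b"),
           fs("on a b", "clear a", "handempty"), fs("holding a", "clear b")),
    Action("stack(b,a)", fs("holding b", "clear a"),
           fs("on b a", "clear b", "handempty"), fs("holding b", "clear a")),
]

# The toy scorer prefers stack(a,b) even when it is not yet feasible,
# mimicking a model that would hallucinate an infeasible first action.
PREFERENCE = {"stack(a,b)": 2.0, "pickup(a)": 1.0, "pickup(b)": 0.5, "stack(b,a)": 0.1}

init_state = fs("clear a", "clear b", "ontable a", "ontable b", "handempty")
plan = validated_search(init_state, fs("on a b"), ACTIONS, lambda a: PREFERENCE[a.name])
print(plan)
```

In this toy run the highest-scoring action, `stack(a,b)`, is rejected at the first step because the hand is empty, so the search is forced through `pickup(a)` first. This is the sense in which an in-loop validator rules out infeasible actions by construction rather than by post-hoc repair.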

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-massad26a,
  title = {VOLTS: Validated Output through Logit Tree Search for Reliable PDDL Planning with Small Language Models},
  author = {Massad, Nicholas and Trabelsi, Amine and Ferland, Francois and Kabanza, Froduald},
  booktitle = {Proceedings of the The 39th Canadian Conference on Artificial Intelligence},
  pages = {872--879},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/massad26a/massad26a.pdf},
  url = {https://proceedings.mlr.press/v318/massad26a.html},
  abstract = {Autonomous agents that must run on edge hardware cannot afford the compute footprint of frontier LLMs, yet they still need dependable task-planning. We address this gap by showing how a single pass with Llama 3.1 8B, 4-bit Small Language Model (SLM) can generate syntactically correct plans in the symbolic-planning formalism Planning Domain Definition Language (PDDL) while respecting tight memory and latency budgets. VOLTS rests on three ideas. (1) Action-token fine tuning: the SLM is fine-tuned on a custom vocabulary where every token encodes a complete grounded action, giving the model strong task heuristics without expanding its size. (2) Real-time validator: a lightweight symbolic module checks each candidate token against the current state during decoding, guaranteeing that any plan emitted contains no hallucinated or infeasible actions. (3) Parallel branching search: when several validated actions appear promising, VOLTS explores them in parallel branches within the same forward pass, preserving single-pass efficiency while widening search. Evaluated on 2000 problems (500 each in the IPC Blocksworld, Logistics, DriverLog, and Rover domains), VOLTS returns valid plans for 76% of tasks. Those plans average 1.08$\times$ the length of solutions from the classical Fast Downward planner, far outperforming GPT-4o (7% validity) and a finetuned baseline without in-loop validation (0.13%). Unlike Tree-Planner or LLM Modulo frameworks, VOLTS validates per token inside a single inference pass, eliminating costly iterative cycles. By coupling resource-aware neural guidance with deterministic symbolic checks, VOLTS opens the door to reliable, on-device planning for robots, drones, and embedded IoT agents where every millisecond and megabyte counts.}
}
Endnote
%0 Conference Paper
%T VOLTS: Validated Output through Logit Tree Search for Reliable PDDL Planning with Small Language Models
%A Nicholas Massad
%A Amine Trabelsi
%A Francois Ferland
%A Froduald Kabanza
%B Proceedings of the The 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-massad26a
%I PMLR
%P 872--879
%U https://proceedings.mlr.press/v318/massad26a.html
%V 318
%X Autonomous agents that must run on edge hardware cannot afford the compute footprint of frontier LLMs, yet they still need dependable task-planning. We address this gap by showing how a single pass with Llama 3.1 8B, 4-bit Small Language Model (SLM) can generate syntactically correct plans in the symbolic-planning formalism Planning Domain Definition Language (PDDL) while respecting tight memory and latency budgets. VOLTS rests on three ideas. (1) Action-token fine tuning: the SLM is fine-tuned on a custom vocabulary where every token encodes a complete grounded action, giving the model strong task heuristics without expanding its size. (2) Real-time validator: a lightweight symbolic module checks each candidate token against the current state during decoding, guaranteeing that any plan emitted contains no hallucinated or infeasible actions. (3) Parallel branching search: when several validated actions appear promising, VOLTS explores them in parallel branches within the same forward pass, preserving single-pass efficiency while widening search. Evaluated on 2000 problems (500 each in the IPC Blocksworld, Logistics, DriverLog, and Rover domains), VOLTS returns valid plans for 76% of tasks. Those plans average 1.08$\times$ the length of solutions from the classical Fast Downward planner, far outperforming GPT-4o (7% validity) and a finetuned baseline without in-loop validation (0.13%). Unlike Tree-Planner or LLM Modulo frameworks, VOLTS validates per token inside a single inference pass, eliminating costly iterative cycles. By coupling resource-aware neural guidance with deterministic symbolic checks, VOLTS opens the door to reliable, on-device planning for robots, drones, and embedded IoT agents where every millisecond and megabyte counts.
APA
Massad, N., Trabelsi, A., Ferland, F., & Kabanza, F. (2026). VOLTS: Validated Output through Logit Tree Search for Reliable PDDL Planning with Small Language Models. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:872-879. Available from https://proceedings.mlr.press/v318/massad26a.html.
