Sub-goal Distillation: A Method to Improve Small Language Agents

Maryam Hashemzadeh, Elias Stengel-Eskin, Sarath Chandar, Marc-Alexandre Côté
Proceedings of The 3rd Conference on Lifelong Learning Agents, PMLR 274:1053-1075, 2025.

Abstract

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls limit their practical utility, especially in long-horizon interactive tasks such as decision-making, or in scenarios involving continuous, ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. Specifically, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal, and we then use this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, reducing the overall cost of LLM interactions to a fixed, one-time annotation cost. In ScienceWorld, a challenging multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.
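To make the architecture concrete, the sketch below shows how such a hierarchical agent could run at inference time, assuming two fine-tuned T5-style seq2seq models and a HuggingFace transformers setup. The checkpoint paths, prompt formats, the planner's "done" convention, and the `env`/`subgoal_completed` interface are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the inference loop described above: a small fine-tuned
# planner proposes the next sub-goal, and a small fine-tuned executor maps
# each sub-goal to elementary actions. No LLM is queried at inference time.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical checkpoint paths for the two fine-tuned 770M-scale modules.
PLANNER_PATH = "checkpoints/subgoal-planner"
EXECUTOR_PATH = "checkpoints/subgoal-executor"

tokenizer = AutoTokenizer.from_pretrained(PLANNER_PATH)  # assumes both modules share a tokenizer
planner = AutoModelForSeq2SeqLM.from_pretrained(PLANNER_PATH)
executor = AutoModelForSeq2SeqLM.from_pretrained(EXECUTOR_PATH)

def generate(model, prompt, max_new_tokens=64):
    """Greedy seq2seq decoding; the paper's decoding settings may differ."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def subgoal_completed(info):
    # Placeholder completion signal; a real agent would check the
    # environment feedback for the current sub-goal.
    return info.get("subgoal_done", False)

def run_episode(env, task_description, max_subgoals=10, max_actions=20):
    """Roll out one episode with an assumed text-environment API
    (reset() -> observation; step(action) -> obs, reward, done, info)."""
    obs = env.reset()
    completed = []
    for _ in range(max_subgoals):
        # Planning module: task + previously completed sub-goals -> next sub-goal.
        subgoal = generate(planner, f"task: {task_description} done: {'; '.join(completed)}")
        if subgoal.strip().lower() == "done":  # assumed planner stop convention
            break
        for _ in range(max_actions):
            # Execution module: sub-goal + current observation -> one elementary action.
            action = generate(executor, f"subgoal: {subgoal} obs: {obs}")
            obs, reward, done, info = env.step(action)
            if done or subgoal_completed(info):
                break
        completed.append(subgoal)
```

Because the LLM is used only once, offline, to annotate oracle trajectories with sub-goals, a loop like this incurs no per-step LLM calls; the two small fine-tuned modules are the only models queried during an episode.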

Cite this Paper


BibTeX
@InProceedings{pmlr-v274-hashemzadeh25a,
  title     = {Sub-goal Distillation: A Method to Improve Small Language Agents},
  author    = {Hashemzadeh, Maryam and Stengel-Eskin, Elias and Chandar, Sarath and C{\^{o}}t{\'{e}}, Marc-Alexandre},
  booktitle = {Proceedings of The 3rd Conference on Lifelong Learning Agents},
  pages     = {1053--1075},
  year      = {2025},
  editor    = {Lomonaco, Vincenzo and Melacci, Stefano and Tuytelaars, Tinne and Chandar, Sarath and Pascanu, Razvan},
  volume    = {274},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Jul--01 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v274/main/assets/hashemzadeh25a/hashemzadeh25a.pdf},
  url       = {https://proceedings.mlr.press/v274/hashemzadeh25a.html}
}
Endnote
%0 Conference Paper
%T Sub-goal Distillation: A Method to Improve Small Language Agents
%A Maryam Hashemzadeh
%A Elias Stengel-Eskin
%A Sarath Chandar
%A Marc-Alexandre Côté
%B Proceedings of The 3rd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2025
%E Vincenzo Lomonaco
%E Stefano Melacci
%E Tinne Tuytelaars
%E Sarath Chandar
%E Razvan Pascanu
%F pmlr-v274-hashemzadeh25a
%I PMLR
%P 1053--1075
%U https://proceedings.mlr.press/v274/hashemzadeh25a.html
%V 274
APA
Hashemzadeh, M., Stengel-Eskin, E., Chandar, S. & Côté, M.-A. (2025). Sub-goal Distillation: A Method to Improve Small Language Agents. Proceedings of The 3rd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 274:1053-1075. Available from https://proceedings.mlr.press/v274/hashemzadeh25a.html.
