CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning

Jun Wang; Yevgeniy Vorobeychik; Yiannis Kantaros

CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning

Jun Wang, Yevgeniy Vorobeychik, Yiannis Kantaros

Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:1558-1574, 2026.

Abstract

Large Language Models (LLMs) have recently emerged as planners for language-instructed agents, generating sequences of actions to accomplish natural language tasks. However, their reliability remains a challenge, especially in long-horizon tasks, since they often produce overconfident yet wrong outputs. Conformal Prediction (CP) has been leveraged to address this issue by wrapping LLM outputs into prediction sets that contain the correct action with a user-defined confidence. When the prediction set is a singleton, the planner executes that action; otherwise, it requests help from a user. This has led to LLM-based planners that can ensure plan correctness with a user-defined probability. However, as LLMs are trained in an uncertainty-agnostic manner, without awareness of prediction sets, they tend to produce unnecessarily large sets, particularly at higher confidence levels, resulting in frequent human interventions limiting autonomous deployment. To address this, we introduce CoFineLLM (Conformal Finetuning for LLMs), the first CP-aware finetuning framework for LLM-based planners that explicitly reduces prediction-set size and, in turn, the need for user interventions. We evaluate our approach on multiple language-instructed robot planning problems and show consistent improvements over uncertainty-aware and uncertainty-agnostic finetuning baselines in terms of prediction-set size, and help rates. Finally, we demonstrate robustness of our method to out-of-distribution scenarios in hardware experiments.

Cite this Paper

BibTeX

@InProceedings{pmlr-v331-wang26c,
  title = 	 {CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning},
  author =       {Wang, Jun and Vorobeychik, Yevgeniy and Kantaros, Yiannis},
  booktitle = 	 {Proceedings of The 8th Annual Learning for Dynamics and Control Conference},
  pages = 	 {1558--1574},
  year = 	 {2026},
  editor = 	 {Sukhatme, Gaurav and Lindemann, Lars and Tu, Stephen and Wierman, Adam and Atanasov, Nikolay},
  volume = 	 {331},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v331/main/assets/wang26c/wang26c.pdf},
  url = 	 {https://proceedings.mlr.press/v331/wang26c.html},
  abstract = 	 {Large Language Models (LLMs) have recently emerged as planners for language-instructed agents, generating sequences of actions to accomplish natural language tasks. However, their reliability remains a challenge, especially in long-horizon tasks, since they often produce overconfident yet wrong outputs. Conformal Prediction (CP) has been leveraged to address this issue by wrapping LLM outputs into prediction sets that contain the correct action with a user-defined confidence. When the prediction set is a singleton, the planner executes that action; otherwise, it requests help from a user. This has led to LLM-based planners that can ensure plan correctness with a user-defined probability. However, as LLMs are trained in an uncertainty-agnostic manner, without awareness of prediction sets, they tend to produce unnecessarily large sets, particularly at higher confidence levels, resulting in frequent human interventions limiting autonomous deployment. To address this, we introduce CoFineLLM (Conformal Finetuning for LLMs), the first CP-aware finetuning framework for LLM-based planners that explicitly reduces prediction-set size and, in turn, the need for user interventions. We evaluate our approach on multiple language-instructed robot planning problems and show consistent improvements over uncertainty-aware and uncertainty-agnostic finetuning baselines in terms of prediction-set size, and help rates. Finally, we demonstrate robustness of our method to out-of-distribution scenarios in hardware experiments.}
}

Endnote

%0 Conference Paper
%T CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning
%A Jun Wang
%A Yevgeniy Vorobeychik
%A Yiannis Kantaros
%B Proceedings of The 8th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2026
%E Gaurav Sukhatme
%E Lars Lindemann
%E Stephen Tu
%E Adam Wierman
%E Nikolay Atanasov	
%F pmlr-v331-wang26c
%I PMLR
%P 1558--1574
%U https://proceedings.mlr.press/v331/wang26c.html
%V 331
%X Large Language Models (LLMs) have recently emerged as planners for language-instructed agents, generating sequences of actions to accomplish natural language tasks. However, their reliability remains a challenge, especially in long-horizon tasks, since they often produce overconfident yet wrong outputs. Conformal Prediction (CP) has been leveraged to address this issue by wrapping LLM outputs into prediction sets that contain the correct action with a user-defined confidence. When the prediction set is a singleton, the planner executes that action; otherwise, it requests help from a user. This has led to LLM-based planners that can ensure plan correctness with a user-defined probability. However, as LLMs are trained in an uncertainty-agnostic manner, without awareness of prediction sets, they tend to produce unnecessarily large sets, particularly at higher confidence levels, resulting in frequent human interventions limiting autonomous deployment. To address this, we introduce CoFineLLM (Conformal Finetuning for LLMs), the first CP-aware finetuning framework for LLM-based planners that explicitly reduces prediction-set size and, in turn, the need for user interventions. We evaluate our approach on multiple language-instructed robot planning problems and show consistent improvements over uncertainty-aware and uncertainty-agnostic finetuning baselines in terms of prediction-set size, and help rates. Finally, we demonstrate robustness of our method to out-of-distribution scenarios in hardware experiments.

APA

Wang, J., Vorobeychik, Y. & Kantaros, Y.. (2026). CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning. Proceedings of The 8th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 331:1558-1574 Available from https://proceedings.mlr.press/v331/wang26c.html.

CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning

Abstract

Cite this Paper

Related Material