Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:63088-63112, 2025.
Abstract
Fine-tuning significantly improves the performance of Large Language Models (LLMs), yet its underlying mechanisms remain poorly understood. This paper aims to provide an in-depth interpretation of the fine-tuning process through circuit analysis, a popular tool in Mechanistic Interpretability (MI). Unlike previous studies (Prakash et al., 2024; Chhabra et al., 2024) that focus on tasks where pre-trained models already perform well, we develop a set of mathematical tasks where fine-tuning yields substantial performance gains, bringing the setup closer to real-world scenarios. In our experiments, we identify circuits at various checkpoints during fine-tuning and examine the interplay between circuit analysis, fine-tuning methods, and task complexities. First, we find that while circuits maintain high node similarity before and after fine-tuning, their edges undergo significant changes, contrasting with previous work (Prakash et al., 2024; Chhabra et al., 2024) that reported only small circuit additions after fine-tuning. Based on these observations, we develop a circuit-aware Low-Rank Adaptation (LoRA) method that assigns ranks to layers according to edge changes in the circuits. Experimental results demonstrate that our circuit-based LoRA achieves an average improvement of 2.46% over standard LoRA with comparable parameter sizes. Furthermore, we explore how combining circuits from subtasks can enhance fine-tuning in compositional tasks, offering new insights into task design and deepening our understanding of circuit dynamics and fine-tuning mechanisms.
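The abstract's central idea of assigning LoRA ranks per layer in proportion to circuit edge changes can be illustrated with a minimal sketch. The function name, the per-layer edge-change counts, and the rank budget below are all hypothetical placeholders, not the paper's actual implementation; the paper's precise allocation rule is not specified in the abstract.

```python
def allocate_lora_ranks(edge_changes, total_rank_budget, min_rank=1):
    """Sketch of circuit-aware rank allocation: distribute a total LoRA
    rank budget across layers in proportion to how many circuit edges
    changed in each layer during fine-tuning (hypothetical scheme)."""
    total = sum(edge_changes)
    return [
        max(min_rank, round(total_rank_budget * c / total))
        for c in edge_changes
    ]

# Hypothetical example: 6 layers, same total budget as uniform rank-8 LoRA.
edge_changes = [5, 30, 12, 40, 8, 25]   # edge changes observed per layer
ranks = allocate_lora_ranks(edge_changes, total_rank_budget=6 * 8)
# Layers whose circuit edges changed the most receive the largest ranks,
# keeping the overall parameter count comparable to standard LoRA.
```

In a practical setup these per-layer ranks would then be passed to the fine-tuning library's LoRA configuration (e.g., a per-module rank override), keeping total trainable parameters roughly matched to the uniform-rank baseline.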