NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric

Jingtan Wang, Xiaoqiang Lin, Rui Qiao, Pang Wei Koh, Chuan-Sheng Foo, Bryan Kian Hsiang Low
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:63662-63689, 2025.

Abstract

Curating data for instruction tuning is crucial for enhancing the performance of large language models (LLMs). This work aims to select training data for instruction tuning to improve LLM performance on specific tasks. Existing methods often rely on next-token prediction (NTP) loss as a proxy for target-task performance due to the non-differentiable nature of performance evaluation metrics: they select training data points that are most helpful in reducing validation loss. However, there is a discrepancy between minimizing NTP loss and maximizing performance (e.g., code pass rate in code generation). To remedy this, we introduce a novel Non-differentiable evaluation metric-based InfluenCe Estimation (NICE), which leverages the policy gradient to select training data that improves task performance. Moreover, NICE can perform data selection in the absence of labels (ground-truth responses) when the evaluation metrics do not require labels (e.g., a reward model can output reward scores without supervision from labels). Experimental results show that our approach outperforms existing data selection baselines that use NTP loss in diverse and realistic scenarios. Notably, subsets selected by NICE often produce models that outperform those trained on the full dataset. Our code is available at https://github.com/JTWang2000/NICE.
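To make the idea concrete, here is a minimal, self-contained toy sketch (not the paper's implementation) of influence scoring against a non-differentiable metric. A tiny categorical policy stands in for the LLM, and the metric function, sample counts, and scoring rule below are all illustrative assumptions: a candidate training point is scored by the alignment between the negative of its NTP-loss gradient and a REINFORCE-style policy-gradient estimate of the expected evaluation metric, grad E_{y~pi}[r(y)] ≈ mean_i r(y_i) * grad log pi(y_i).

import torch

torch.manual_seed(0)

VOCAB = 8
# Toy "policy": a categorical distribution over VOCAB tokens stands in for an LLM.
policy_logits = torch.zeros(VOCAB, requires_grad=True)

def metric(token: int) -> float:
    # Stand-in for a non-differentiable, label-free evaluation metric
    # (e.g., a code pass rate or a reward-model score): even tokens "pass".
    return 1.0 if token % 2 == 0 else 0.0

def flat_grad(scalar: torch.Tensor) -> torch.Tensor:
    # Gradient of a scalar w.r.t. the policy parameters, as a flat vector.
    (g,) = torch.autograd.grad(scalar, [policy_logits])
    return g.detach()

def policy_grad_of_metric(num_samples: int = 512) -> torch.Tensor:
    # REINFORCE estimator of grad E_{y ~ pi}[ r(y) ]:
    #   ≈ mean_i r(y_i) * grad log pi(y_i), with y_i sampled from the policy,
    # so no gradient of the metric itself is ever needed.
    grads = []
    for _ in range(num_samples):
        dist = torch.distributions.Categorical(logits=policy_logits)
        y = dist.sample()
        grads.append(metric(int(y)) * flat_grad(dist.log_prob(y)))
    return torch.stack(grads).mean(dim=0)

def training_grad(token: int) -> torch.Tensor:
    # Gradient of the next-token-prediction (NTP) loss for one training example.
    dist = torch.distributions.Categorical(logits=policy_logits)
    return flat_grad(-dist.log_prob(torch.tensor(token)))

# One SGD step on example z moves the parameters along -grad L(z); to first
# order, the change in the expected metric is -eta * <grad M, grad L(z)>.
# So each candidate is scored by <grad M, -grad L(z)> and the top scorers kept.
pg = policy_grad_of_metric()
scores = {tok: float(torch.dot(pg, -training_grad(tok))) for tok in range(VOCAB)}
for tok, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"token {tok}: influence ≈ {s:+.4f}")  # even ("passing") tokens rank highest

In this toy, the even tokens receive positive influence scores because training on them raises the expected metric, while the odd tokens score negative; the released code at https://github.com/JTWang2000/NICE implements the actual method at LLM scale.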

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wang25bm,
  title     = {{NICE} Data Selection for Instruction Tuning in {LLM}s with Non-differentiable Evaluation Metric},
  author    = {Wang, Jingtan and Lin, Xiaoqiang and Qiao, Rui and Koh, Pang Wei and Foo, Chuan-Sheng and Low, Bryan Kian Hsiang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {63662--63689},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wang25bm/wang25bm.pdf},
  url       = {https://proceedings.mlr.press/v267/wang25bm.html},
  abstract  = {Curating data for instruction tuning is crucial for enhancing the performance of large language models (LLMs). This work aims to select training data for instruction tuning to improve the LLM performance on specific tasks. Existing methods often rely on next-token prediction (NTP) loss as a proxy for target task performance due to the non-differentiable nature of performance evaluation metrics. They select training data points that are most helpful in reducing validation loss. However, there is a discrepancy between minimizing NTP loss and maximizing performance (e.g., code pass rate in code generation). To remedy this, we introduce a novel Non-differentiable evaluation metric-based InfluenCe Estimation (NICE), which leverages the policy gradient to select the training data that improves the performance. Moreover, NICE can perform data selection in the absence of labels (ground-truth responses) when the evaluation metrics do not require labels (e.g., a reward model can output reward scores without supervision from labels). Experimental results show that our approach outperforms existing data selection baselines that use NTP loss in diverse and realistic scenarios. Notably, subsets selected by NICE often produce models that outperform those trained on the full dataset. Our code is available at https://github.com/JTWang2000/NICE.}
}
Endnote
%0 Conference Paper
%T NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric
%A Jingtan Wang
%A Xiaoqiang Lin
%A Rui Qiao
%A Pang Wei Koh
%A Chuan-Sheng Foo
%A Bryan Kian Hsiang Low
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wang25bm
%I PMLR
%P 63662--63689
%U https://proceedings.mlr.press/v267/wang25bm.html
%V 267
%X Curating data for instruction tuning is crucial for enhancing the performance of large language models (LLMs). This work aims to select training data for instruction tuning to improve the LLM performance on specific tasks. Existing methods often rely on next-token prediction (NTP) loss as a proxy for target task performance due to the non-differentiable nature of performance evaluation metrics. They select training data points that are most helpful in reducing validation loss. However, there is a discrepancy between minimizing NTP loss and maximizing performance (e.g., code pass rate in code generation). To remedy this, we introduce a novel Non-differentiable evaluation metric-based InfluenCe Estimation (NICE), which leverages the policy gradient to select the training data that improves the performance. Moreover, NICE can perform data selection in the absence of labels (ground-truth responses) when the evaluation metrics do not require labels (e.g., a reward model can output reward scores without supervision from labels). Experimental results show that our approach outperforms existing data selection baselines that use NTP loss in diverse and realistic scenarios. Notably, subsets selected by NICE often produce models that outperform those trained on the full dataset. Our code is available at https://github.com/JTWang2000/NICE.
APA
Wang, J., Lin, X., Qiao, R., Koh, P.W., Foo, C.-S. & Low, B.K.H. (2025). NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:63662-63689. Available from https://proceedings.mlr.press/v267/wang25bm.html.
