Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets

Wei Liu, Zhongyu Niu, Lang Gao, Zhiying Deng, Jun Wang, Haozhao Wang, Ruixuan Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:39126-39146, 2025.

Abstract

This study investigates the self-rationalization framework constructed with a cooperative game, where a generator initially extracts the most informative segment from raw input, and a subsequent predictor utilizes the selected subset for its input. The generator and predictor are trained collaboratively to maximize prediction accuracy. In this paper, we first uncover a potential caveat: such a cooperative game could unintentionally introduce a sampling bias during rationale extraction. Specifically, the generator might inadvertently create an incorrect correlation between the selected rationale candidate and the label, even when they are semantically unrelated in the original dataset. Subsequently, we elucidate the origins of this bias using both detailed theoretical analysis and empirical evidence. Our findings suggest a direction for inspecting these correlations through attacks, based on which we further introduce an instruction to prevent the predictor from learning the correlations. Through experiments on six text classification datasets and two graph classification datasets using three network architectures (GRUs, BERT, and GCN), we show that our method significantly outperforms recent rationalization methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liu25av,
  title     = {Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets},
  author    = {Liu, Wei and Niu, Zhongyu and Gao, Lang and Deng, Zhiying and Wang, Jun and Wang, Haozhao and Li, Ruixuan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {39126--39146},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25av/liu25av.pdf},
  url       = {https://proceedings.mlr.press/v267/liu25av.html},
  abstract  = {This study investigates the self-rationalization framework constructed with a cooperative game, where a generator initially extracts the most informative segment from raw input, and a subsequent predictor utilizes the selected subset for its input. The generator and predictor are trained collaboratively to maximize prediction accuracy. In this paper, we first uncover a potential caveat: such a cooperative game could unintentionally introduce a sampling bias during rationale extraction. Specifically, the generator might inadvertently create an incorrect correlation between the selected rationale candidate and the label, even when they are semantically unrelated in the original dataset. Subsequently, we elucidate the origins of this bias using both detailed theoretical analysis and empirical evidence. Our findings suggest a direction for inspecting these correlations through attacks, based on which we further introduce an instruction to prevent the predictor from learning the correlations. Through experiments on six text classification datasets and two graph classification datasets using three network architectures (GRUs, BERT, and GCN), we show that our method significantly outperforms recent rationalization methods.}
}
Endnote
%0 Conference Paper
%T Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets
%A Wei Liu
%A Zhongyu Niu
%A Lang Gao
%A Zhiying Deng
%A Jun Wang
%A Haozhao Wang
%A Ruixuan Li
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liu25av
%I PMLR
%P 39126--39146
%U https://proceedings.mlr.press/v267/liu25av.html
%V 267
%X This study investigates the self-rationalization framework constructed with a cooperative game, where a generator initially extracts the most informative segment from raw input, and a subsequent predictor utilizes the selected subset for its input. The generator and predictor are trained collaboratively to maximize prediction accuracy. In this paper, we first uncover a potential caveat: such a cooperative game could unintentionally introduce a sampling bias during rationale extraction. Specifically, the generator might inadvertently create an incorrect correlation between the selected rationale candidate and the label, even when they are semantically unrelated in the original dataset. Subsequently, we elucidate the origins of this bias using both detailed theoretical analysis and empirical evidence. Our findings suggest a direction for inspecting these correlations through attacks, based on which we further introduce an instruction to prevent the predictor from learning the correlations. Through experiments on six text classification datasets and two graph classification datasets using three network architectures (GRUs, BERT, and GCN), we show that our method significantly outperforms recent rationalization methods.
APA
Liu, W., Niu, Z., Gao, L., Deng, Z., Wang, J., Wang, H. & Li, R. (2025). Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:39126-39146. Available from https://proceedings.mlr.press/v267/liu25av.html.