Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting

Da Wang; Lin Li; Wei Wei; Qixian Yu; Jianye Hao; Jiye Liang

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:50822-50839, 2024.

Abstract

Offline Reinforcement Learning (RL) commonly suffers from out-of-distribution (OOD) overestimation caused by distribution shift. To improve generalization, prior work has gradually shifted its focus from suppressing OOD overestimation to avoiding overly conservative learning from suboptimal behavior policies. However, most approaches explicitly delimit boundaries for OOD actions based on the support of the dataset, which can prevent data near these boundaries from acquiring realistic estimates. This paper investigates how to loosen this rigid demarcation of OOD boundaries, adaptively extracting knowledge from empirical data to implicitly improve the model’s generalization to nearby unseen data. We introduce an adversarial data splitting (ADS) framework that forces the model to generalize across distribution shifts simulated by splitting the dataset into train/validation subsets. Specifically, ADS is formulated as a min-max optimization problem inspired by meta-learning and solved by iterating over two steps. First, we train the model on the train subset to minimize its loss on the validation subset. Then, we adversarially generate the "hardest" train/validation subsets, i.e., those with the maximum distribution shift, so that the model cannot generalize under that split. We derive a generalization error bound to theoretically understand ADS and verify its effectiveness with extensive experiments. Code is available at https://github.com/DkING-lv6/ADS.
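The two alternating steps of the min-max procedure can be illustrated with a toy sketch. This is not the authors' implementation (the linked repository is the reference for that). It assumes a 1-D linear regression model in place of an offline RL value function, a MAML-style one-step inner adaptation on the train subset, and it approximates the adversarial "hardest split" by random search over a few candidate splits rather than the paper's procedure. All function names (`ads_train`, `hardest_split`, `meta_step`, etc.) and hyperparameters are hypothetical.

```python
import random
from typing import List, Tuple

Params = Tuple[float, float]        # (w, b) of a toy 1-D model y = w*x + b (assumption)
Data = List[Tuple[float, float]]    # list of (x, y) pairs


def loss_and_grad(p: Params, data: Data) -> Tuple[float, Params]:
    """Mean squared error and its gradient with respect to (w, b)."""
    w, b = p
    n = len(data)
    loss = gw = gb = 0.0
    for x, y in data:
        err = w * x + b - y
        loss += err * err
        gw += 2.0 * err * x
        gb += 2.0 * err
    return loss / n, (gw / n, gb / n)


def adapt(p: Params, train: Data, alpha: float) -> Params:
    """Inner step: one gradient step on the train subset (MAML-style assumption)."""
    _, (gw, gb) = loss_and_grad(p, train)
    return (p[0] - alpha * gw, p[1] - alpha * gb)


def split_loss(p: Params, train: Data, val: Data, alpha: float) -> float:
    """Validation loss after adapting on the train subset: the min-max objective."""
    return loss_and_grad(adapt(p, train, alpha), val)[0]


def meta_step(p: Params, train: Data, val: Data, alpha: float, eta: float) -> Params:
    """Min step: exact gradient of split_loss w.r.t. p.

    For MSE on a linear model the train Hessian H is constant, so the chain
    rule gives meta_grad = (I - alpha * H) @ grad L_val(adapted params).
    """
    _, (gw, gb) = loss_and_grad(adapt(p, train, alpha), val)
    n = len(train)
    h_ww = 2.0 * sum(x * x for x, _ in train) / n
    h_wb = 2.0 * sum(x for x, _ in train) / n
    h_bb = 2.0
    mw = gw - alpha * (h_ww * gw + h_wb * gb)
    mb = gb - alpha * (h_wb * gw + h_bb * gb)
    return (p[0] - eta * mw, p[1] - eta * mb)


def random_split(data: Data, n_val: int, rng: random.Random) -> Tuple[Data, Data]:
    """Partition data into (train, val) with n_val validation points."""
    idx = set(rng.sample(range(len(data)), n_val))
    train = [d for i, d in enumerate(data) if i not in idx]
    val = [d for i, d in enumerate(data) if i in idx]
    return train, val


def hardest_split(p: Params, data: Data, n_val: int, alpha: float,
                  n_candidates: int, rng: random.Random) -> Tuple[Data, Data]:
    """Max step, approximated by random search (assumption): keep the sampled
    split with the largest post-adaptation validation loss."""
    candidates = [random_split(data, n_val, rng) for _ in range(n_candidates)]
    return max(candidates, key=lambda s: split_loss(p, s[0], s[1], alpha))


def ads_train(data: Data, steps: int = 500, val_frac: float = 0.3,
              alpha: float = 0.05, eta: float = 0.2,
              n_candidates: int = 8, seed: int = 0) -> Params:
    rng = random.Random(seed)
    n_val = max(2, int(len(data) * val_frac))
    p: Params = (0.0, 0.0)
    train, val = random_split(data, n_val, rng)
    for _ in range(steps):
        p = meta_step(p, train, val, alpha, eta)                               # min: fit on current split
        train, val = hardest_split(p, data, n_val, alpha, n_candidates, rng)   # max: re-split adversarially
    return p


if __name__ == "__main__":
    # Noise-free toy data from y = 2x + 1 on x in [-1, 1].
    data = [(i / 10 - 1, 2 * (i / 10 - 1) + 1) for i in range(21)]
    w, b = ads_train(data)
    print(round(w, 3), round(b, 3))  # approaches the generating parameters (2, 1)
```

Because the toy data are noise-free, every split shares the same optimum, so the adversary here only changes which points drive each update; in the offline RL setting the abstract describes, the point of the max step is that different splits expose genuinely different distribution shifts.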

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-wang24aj,
  title     = {Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting},
  author    = {Wang, Da and Li, Lin and Wei, Wei and Yu, Qixian and Hao, Jianye and Liang, Jiye},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {50822--50839},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24aj/wang24aj.pdf},
  url       = {https://proceedings.mlr.press/v235/wang24aj.html},
  abstract  = {Offline Reinforcement Learning (RL) commonly suffers from the out-of-distribution (OOD) overestimation issue due to the distribution shift. Prior work gradually shifts their focus from suppressing OOD overestimation to avoiding overly conservative learning from suboptimal behavior policies to improve generalization. However, most approaches explicitly delimit boundaries for OOD actions based on the support in the dataset, which can potentially impede the data near these boundaries from acquiring realistic estimates. This paper investigates how to loosen the rigid demarcation of OOD boundaries, adaptively extracting knowledge from empirical data to implicitly improve the model’s generalization to nearby unseen data. We introduce an adversarial data splitting (ADS) framework that enforces the model to generalize the distribution shifts simulated from the train/validation subsets splitting of the dataset. Specifically, ADS is modeled as a min-max optimization problem inspired by meta-learning and solved by iterating over the following two steps. First, we train the model on the train-subset to minimize its loss on the validation-subset. Then, we adversarially generate the "hardest" train/validation subsets with the maximum distribution shift, making the model incapable of generalization at that splitting. We derive a generalization error bound for theoretically understanding ADS and verify the effectiveness with extensive experiments. Code is available at https://github.com/DkING-lv6/ADS.}
}

Endnote

%0 Conference Paper
%T Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting
%A Da Wang
%A Lin Li
%A Wei Wei
%A Qixian Yu
%A Jianye Hao
%A Jiye Liang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wang24aj
%I PMLR
%P 50822--50839
%U https://proceedings.mlr.press/v235/wang24aj.html
%V 235
%X Offline Reinforcement Learning (RL) commonly suffers from the out-of-distribution (OOD) overestimation issue due to the distribution shift. Prior work gradually shifts their focus from suppressing OOD overestimation to avoiding overly conservative learning from suboptimal behavior policies to improve generalization. However, most approaches explicitly delimit boundaries for OOD actions based on the support in the dataset, which can potentially impede the data near these boundaries from acquiring realistic estimates. This paper investigates how to loosen the rigid demarcation of OOD boundaries, adaptively extracting knowledge from empirical data to implicitly improve the model’s generalization to nearby unseen data. We introduce an adversarial data splitting (ADS) framework that enforces the model to generalize the distribution shifts simulated from the train/validation subsets splitting of the dataset. Specifically, ADS is modeled as a min-max optimization problem inspired by meta-learning and solved by iterating over the following two steps. First, we train the model on the train-subset to minimize its loss on the validation-subset. Then, we adversarially generate the "hardest" train/validation subsets with the maximum distribution shift, making the model incapable of generalization at that splitting. We derive a generalization error bound for theoretically understanding ADS and verify the effectiveness with extensive experiments. Code is available at https://github.com/DkING-lv6/ADS.

APA


Wang, D., Li, L., Wei, W., Yu, Q., Hao, J. & Liang, J. (2024). Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:50822-50839. Available from https://proceedings.mlr.press/v235/wang24aj.html.
