Adaptive Action Advising with Different Rewards

Yue Guo, Xijia Zhang, Simon Stepputtis, Joseph Campbell, Katia P. Sycara
Proceedings of The 3rd Conference on Lifelong Learning Agents, PMLR 274:252-267, 2025.

Abstract

Action advising is a critical aspect of reinforcement learning, involving a teacher-student paradigm wherein the teacher, possessing a pre-trained policy, advises the student with actions computed from that policy based on the student’s observations, thereby improving the student’s task performance. An important requirement is for the teacher to be able to robustly adapt and give effective advice in new environments where the reward differs from the one the teacher was trained on. This issue has not been considered in the current teacher-student literature: most existing work requires the teacher to be pre-trained with the same reward that the student interacts with, and cannot generalize advice beyond that policy; moreover, the reward the student gains through interaction with the environment is given directly to the teacher, regardless of the exploration process. To fill this gap, our proposed method enhances action advising by allowing the teacher to learn by observing and collecting data from the student and adapting its reward function. We empirically evaluate our method on three environments (a Gridworld, ALE Skiing, and Pacman) and find that it demonstrates improved policy returns and sample efficiency.
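
To make the paradigm concrete, the following is a minimal, hypothetical Python sketch of a teacher-student advising loop in which the student’s environment uses a different reward than the one the teacher was trained on, and the teacher adapts by learning from the transitions the student collects. The toy gridworld, the fixed advising probability, and all names here are illustrative assumptions, not the authors’ implementation.

# Hypothetical sketch (not the paper's code): a tabular teacher adapts to a
# new reward using only the student's collected transitions.
import random
from collections import defaultdict

GRID = 5                                  # 5x5 gridworld, start at (0, 0)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action, goal):
    """One transition; reward +1 only at `goal` (the student's new reward)."""
    x, y = state
    dx, dy = action
    nxt = (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))
    return nxt, (1.0 if nxt == goal else 0.0), nxt == goal

class TabularQ:
    def __init__(self, alpha=0.5, gamma=0.95):
        self.q = defaultdict(float)       # (state, action) -> value
        self.alpha, self.gamma = alpha, gamma

    def greedy(self, s):
        return max(ACTIONS, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.q[(s2, b)] for b in ACTIONS)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

teacher = TabularQ()                      # stands in for a pre-trained policy
student = TabularQ()
new_goal = (0, 4)                         # a reward the teacher never trained on

for episode in range(200):
    s = (0, 0)
    for _ in range(50):
        if random.random() < 0.3:         # student occasionally requests advice
            a = teacher.greedy(s)
        elif random.random() < 0.1:       # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = student.greedy(s)
        s2, r, done = step(s, a, new_goal)
        student.update(s, a, r, s2)
        # Key idea from the abstract: the teacher also learns from the
        # student's experience, so its advice adapts to the new reward.
        teacher.update(s, a, r, s2)
        s = s2
        if done:
            break

print("advised action at start state:", teacher.greedy((0, 0)))

The sketch only illustrates the data flow suggested by the abstract: advice flows from teacher to student, while experience under the new reward flows back from student to teacher.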

Cite this Paper


BibTeX
@InProceedings{pmlr-v274-guo25a,
  title     = {Adaptive Action Advising with Different Rewards},
  author    = {Guo, Yue and Zhang, Xijia and Stepputtis, Simon and Campbell, Joseph and Sycara, Katia P.},
  booktitle = {Proceedings of The 3rd Conference on Lifelong Learning Agents},
  pages     = {252--267},
  year      = {2025},
  editor    = {Lomonaco, Vincenzo and Melacci, Stefano and Tuytelaars, Tinne and Chandar, Sarath and Pascanu, Razvan},
  volume    = {274},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Jul--01 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v274/main/assets/guo25a/guo25a.pdf},
  url       = {https://proceedings.mlr.press/v274/guo25a.html},
  abstract  = {Action advising is a critical aspect of reinforcement learning, involving a teacher-student paradigm wherein the teacher, possessing a pre-trained policy, advises the student with actions computed from that policy based on the student’s observations, thereby improving the student’s task performance. An important requirement is for the teacher to be able to robustly adapt and give effective advice in new environments where the reward differs from the one the teacher was trained on. This issue has not been considered in the current teacher-student literature: most existing work requires the teacher to be pre-trained with the same reward that the student interacts with, and cannot generalize advice beyond that policy; moreover, the reward the student gains through interaction with the environment is given directly to the teacher, regardless of the exploration process. To fill this gap, our proposed method enhances action advising by allowing the teacher to learn by observing and collecting data from the student and adapting its reward function. We empirically evaluate our method on three environments (a Gridworld, ALE Skiing, and Pacman) and find that it demonstrates improved policy returns and sample efficiency.}
}
Endnote
%0 Conference Paper
%T Adaptive Action Advising with Different Rewards
%A Yue Guo
%A Xijia Zhang
%A Simon Stepputtis
%A Joseph Campbell
%A Katia P. Sycara
%B Proceedings of The 3rd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2025
%E Vincenzo Lomonaco
%E Stefano Melacci
%E Tinne Tuytelaars
%E Sarath Chandar
%E Razvan Pascanu
%F pmlr-v274-guo25a
%I PMLR
%P 252--267
%U https://proceedings.mlr.press/v274/guo25a.html
%V 274
%X Action advising is a critical aspect of reinforcement learning, involving a teacher-student paradigm wherein the teacher, possessing a pre-trained policy, advises the student with actions computed from that policy based on the student’s observations, thereby improving the student’s task performance. An important requirement is for the teacher to be able to robustly adapt and give effective advice in new environments where the reward differs from the one the teacher was trained on. This issue has not been considered in the current teacher-student literature: most existing work requires the teacher to be pre-trained with the same reward that the student interacts with, and cannot generalize advice beyond that policy; moreover, the reward the student gains through interaction with the environment is given directly to the teacher, regardless of the exploration process. To fill this gap, our proposed method enhances action advising by allowing the teacher to learn by observing and collecting data from the student and adapting its reward function. We empirically evaluate our method on three environments (a Gridworld, ALE Skiing, and Pacman) and find that it demonstrates improved policy returns and sample efficiency.
APA
Guo, Y., Zhang, X., Stepputtis, S., Campbell, J. & Sycara, K.P. (2025). Adaptive Action Advising with Different Rewards. Proceedings of The 3rd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 274:252-267. Available from https://proceedings.mlr.press/v274/guo25a.html.
