Distributed Offline Policy Optimization Over Batch Data

Han Shen, Songtao Lu, Xiaodong Cui, Tianyi Chen
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:4443-4472, 2023.

Abstract

Federated learning (FL) has received increasing interest in recent years. However, most existing works focus on supervised learning, and federated learning for sequential decision making has not been fully explored. Part of the reason is that learning a policy for sequential decision making typically requires repeated interaction with the environment, which is costly in many FL applications. To overcome this issue, this work proposes a federated offline policy optimization method, abbreviated FedOPO, that allows clients to jointly learn the optimal policy without interacting with environments during training. Despite the nonconcave-convex-strongly-concave nature of the resulting max-min-max problem, we establish both the local and global convergence of our FedOPO algorithm. Experiments on the OpenAI Gym demonstrate that our algorithm is able to find a near-optimal policy while enjoying various merits brought by FL, including training speedup and improved asymptotic performance.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-shen23b,
  title     = {Distributed Offline Policy Optimization Over Batch Data},
  author    = {Shen, Han and Lu, Songtao and Cui, Xiaodong and Chen, Tianyi},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {4443--4472},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/shen23b/shen23b.pdf},
  url       = {https://proceedings.mlr.press/v206/shen23b.html},
  abstract  = {Federated learning (FL) has received increasing interests during the past years, However, most of the existing works focus on supervised learning, and federated learning for sequential decision making has not been fully explored. Part of the reason is that learning a policy for sequential decision making typically requires repeated interaction with the environments, which is costly in many FL applications.To overcome this issue, this work proposes a federated offline policy optimization method abbreviated as FedOPO that allows clients to jointly learn the optimal policy without interacting with environments during training. Albeit the nonconcave-convex-strongly concave nature of the resultant max-min-max problem, we establish both the local and global convergence of our FedOPO algorithm. Experiments on the OpenAI gym demonstrate that our algorithm is able to find a near-optimal policy while enjoying various merits brought by FL, including training speedup and improved asymptotic performance.}
}
Endnote
%0 Conference Paper
%T Distributed Offline Policy Optimization Over Batch Data
%A Han Shen
%A Songtao Lu
%A Xiaodong Cui
%A Tianyi Chen
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-shen23b
%I PMLR
%P 4443--4472
%U https://proceedings.mlr.press/v206/shen23b.html
%V 206
%X Federated learning (FL) has received increasing interests during the past years, However, most of the existing works focus on supervised learning, and federated learning for sequential decision making has not been fully explored. Part of the reason is that learning a policy for sequential decision making typically requires repeated interaction with the environments, which is costly in many FL applications.To overcome this issue, this work proposes a federated offline policy optimization method abbreviated as FedOPO that allows clients to jointly learn the optimal policy without interacting with environments during training. Albeit the nonconcave-convex-strongly concave nature of the resultant max-min-max problem, we establish both the local and global convergence of our FedOPO algorithm. Experiments on the OpenAI gym demonstrate that our algorithm is able to find a near-optimal policy while enjoying various merits brought by FL, including training speedup and improved asymptotic performance.
APA
Shen, H., Lu, S., Cui, X., & Chen, T. (2023). Distributed Offline Policy Optimization Over Batch Data. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:4443-4472. Available from https://proceedings.mlr.press/v206/shen23b.html.