Learning Sampling Policy to Achieve Fewer Queries for Zeroth-Order Optimization

Zhou Zhai, Wanli Shi, Heng Huang, Yi Chang, Bin Gu
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1162-1170, 2024.

Abstract

Zeroth-order (ZO) methods, which use the finite difference of two function evaluations (also called ZO gradient) to approximate first-order gradient, have attracted much attention recently in machine learning because of their broad applications. The accuracy of the ZO gradient highly depends on how many finite differences are averaged, which are intrinsically determined by the number of perturbations randomly drawn from a distribution. Existing ZO methods try to learn a data-driven distribution for sampling the perturbations to improve the efficiency of ZO optimization (ZOO) algorithms. In this paper, we explore a new and parallel direction, \textit{i.e.}, learn an optimal sampling policy instead of using a totally random strategy to generate perturbations based on the techniques of reinforcement learning (RL), which makes it possible to approximate the gradient with only two function evaluations. Specifically, we first formulate the problem of learning a sampling policy as a Markov decision process. Then, we propose our ZO-RL algorithm, \textit{i.e.}, using deep deterministic policy gradient, an actor-critic RL algorithm to learn a sampling policy that can guide the generation of perturbed vectors in getting ZO gradients as accurately as possible. Importantly, the existing ZOO algorithms for learning a distribution can be plugged in to improve the exploration of ZO-RL. Experimental results with different ZO estimators show that our ZO-RL algorithm can effectively reduce the query complexity of ZOO algorithms and converge faster than existing ZOO algorithms, especially in the later stage of the optimization process.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-zhai24a, title = {Learning Sampling Policy to Achieve Fewer Queries for Zeroth-Order Optimization}, author = {Zhai, Zhou and Shi, Wanli and Huang, Heng and Chang, Yi and Gu, Bin}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics}, pages = {1162--1170}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, month = {02--04 May}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v238/zhai24a/zhai24a.pdf}, url = {https://proceedings.mlr.press/v238/zhai24a.html}, abstract = {Zeroth-order (ZO) methods, which use the finite difference of two function evaluations (also called ZO gradient) to approximate first-order gradient, have attracted much attention recently in machine learning because of their broad applications. The accuracy of the ZO gradient highly depends on how many finite differences are averaged, which are intrinsically determined by the number of perturbations randomly drawn from a distribution. Existing ZO methods try to learn a data-driven distribution for sampling the perturbations to improve the efficiency of ZO optimization (ZOO) algorithms. In this paper, we explore a new and parallel direction, \textit{i.e.}, learn an optimal sampling policy instead of using a totally random strategy to generate perturbations based on the techniques of reinforcement learning (RL), which makes it possible to approximate the gradient with only two function evaluations. Specifically, we first formulate the problem of learning a sampling policy as a Markov decision process. Then, we propose our ZO-RL algorithm, \textit{i.e.}, using deep deterministic policy gradient, an actor-critic RL algorithm to learn a sampling policy that can guide the generation of perturbed vectors in getting ZO gradients as accurately as possible. Importantly, the existing ZOO algorithms for learning a distribution can be plugged in to improve the exploration of ZO-RL. Experimental results with different ZO estimators show that our ZO-RL algorithm can effectively reduce the query complexity of ZOO algorithms and converge faster than existing ZOO algorithms, especially in the later stage of the optimization process.} }
Endnote
%0 Conference Paper %T Learning Sampling Policy to Achieve Fewer Queries for Zeroth-Order Optimization %A Zhou Zhai %A Wanli Shi %A Heng Huang %A Yi Chang %A Bin Gu %B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2024 %E Sanjoy Dasgupta %E Stephan Mandt %E Yingzhen Li %F pmlr-v238-zhai24a %I PMLR %P 1162--1170 %U https://proceedings.mlr.press/v238/zhai24a.html %V 238 %X Zeroth-order (ZO) methods, which use the finite difference of two function evaluations (also called ZO gradient) to approximate first-order gradient, have attracted much attention recently in machine learning because of their broad applications. The accuracy of the ZO gradient highly depends on how many finite differences are averaged, which are intrinsically determined by the number of perturbations randomly drawn from a distribution. Existing ZO methods try to learn a data-driven distribution for sampling the perturbations to improve the efficiency of ZO optimization (ZOO) algorithms. In this paper, we explore a new and parallel direction, \textit{i.e.}, learn an optimal sampling policy instead of using a totally random strategy to generate perturbations based on the techniques of reinforcement learning (RL), which makes it possible to approximate the gradient with only two function evaluations. Specifically, we first formulate the problem of learning a sampling policy as a Markov decision process. Then, we propose our ZO-RL algorithm, \textit{i.e.}, using deep deterministic policy gradient, an actor-critic RL algorithm to learn a sampling policy that can guide the generation of perturbed vectors in getting ZO gradients as accurately as possible. Importantly, the existing ZOO algorithms for learning a distribution can be plugged in to improve the exploration of ZO-RL. Experimental results with different ZO estimators show that our ZO-RL algorithm can effectively reduce the query complexity of ZOO algorithms and converge faster than existing ZOO algorithms, especially in the later stage of the optimization process.
APA
Zhai, Z., Shi, W., Huang, H., Chang, Y. & Gu, B.. (2024). Learning Sampling Policy to Achieve Fewer Queries for Zeroth-Order Optimization. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:1162-1170 Available from https://proceedings.mlr.press/v238/zhai24a.html.

Related Material