Safe online convex optimization with multi-point feedback

Spencer Hutchinson; Mahnoosh Alizadeh

Safe online convex optimization with multi-point feedback

Spencer Hutchinson, Mahnoosh Alizadeh

Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:168-180, 2024.

Abstract

Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $O(d \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the impacts of the unknown constraint and zero-order feedback on empirical performance.

Cite this Paper

BibTeX


@InProceedings{pmlr-v242-hutchinson24a,
  title = 	 {Safe online convex optimization with multi-point feedback},
  author =       {Hutchinson, Spencer and Alizadeh, Mahnoosh},
  booktitle = 	 {Proceedings of the 6th Annual Learning for Dynamics & Control Conference},
  pages = 	 {168--180},
  year = 	 {2024},
  editor = 	 {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume = 	 {242},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--17 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v242/hutchinson24a/hutchinson24a.pdf},
  url = 	 {https://proceedings.mlr.press/v242/hutchinson24a.html},
  abstract = 	 {Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $O(d \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the impacts of the unknown constraint and zero-order feedback on empirical performance.}
}

Endnote

%0 Conference Paper
%T Safe online convex optimization with multi-point feedback
%A Spencer Hutchinson
%A Mahnoosh Alizadeh
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou	
%F pmlr-v242-hutchinson24a
%I PMLR
%P 168--180
%U https://proceedings.mlr.press/v242/hutchinson24a.html
%V 242
%X Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $O(d \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the impacts of the unknown constraint and zero-order feedback on empirical performance.

APA


Hutchinson, S. & Alizadeh, M.. (2024). Safe online convex optimization with multi-point feedback. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:168-180 Available from https://proceedings.mlr.press/v242/hutchinson24a.html.

Related Material

Download PDF