A Bayesian Approach for Stochastic Continuum-armed Bandit with Long-term Constraints
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:8370-8391, 2022.
Despite many valuable advances in online convex optimization over the last decade, many machine learning and networking problems of interest do not fit that framework, owing to nonconvex objectives and the presence of constraints. This motivates us to go beyond convexity and study the stochastic continuum-armed bandit problem with long-term constraints. For noiseless observations of the constraint functions, we propose a generic method using a Bayesian approach based on a class of penalty functions, and prove that it achieves sublinear regret with respect to the global optimum and sublinear constraint violation (CV), matching the best results of previous methods. We further propose a method for the case where the constraint functions are observed with noise, which achieves sublinear regret and sublinear CV under additional assumptions. Finally, two experiments comparing our methods with two benchmark methods from online optimization and Bayesian optimization demonstrate the advantages of our algorithms.
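To make the penalty-based Bayesian idea concrete, here is a minimal sketch (not the paper's actual algorithm) of a GP-UCB-style rule applied to a penalized objective on a toy one-dimensional bandit with a single constraint. The objective `f`, constraint `g`, penalty weight `rho`, confidence parameter `beta`, and kernel length-scale are all illustrative assumptions; the learner maximizes an optimistic estimate of the reward minus a penalty on the estimated constraint violation.

```python
import numpy as np

# Hypothetical problem on [0, 1]: maximize f(x) subject to g(x) <= 0.
# The constrained optimum is x = 0.5 (f peaks at 0.7, but g forces x <= 0.5).
def f(x): return -(x - 0.7) ** 2      # reward (unknown to the learner)
def g(x): return x - 0.5              # constraint, feasible iff g(x) <= 0

def rbf(a, b, ls=0.1):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Posterior mean and std of a zero-mean GP at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.clip(np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks), 1e-12, None)
    return mu, np.sqrt(var)

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)     # discretized arm space
X = rng.uniform(0.0, 1.0, 3)          # a few random initial arms
yf, yg = f(X), g(X)
rho, beta = 5.0, 2.0                  # penalty weight and confidence width (assumed)

for t in range(30):
    mu_f, sd_f = gp_posterior(X, yf, grid)
    mu_g, sd_g = gp_posterior(X, yg, grid)
    # Optimistic penalized score: UCB on the reward minus a penalty on the
    # optimistic (lower-confidence) estimate of the constraint violation.
    score = (mu_f + beta * sd_f) - rho * np.maximum(mu_g - beta * sd_g, 0.0)
    x = grid[np.argmax(score)]
    X = np.append(X, x)
    yf = np.append(yf, f(x))
    yg = np.append(yg, g(x))

# Report the sampled arm with the best penalized value.
best = X[np.argmax(yf - rho * np.maximum(yg, 0.0))]
print(best)
```

In this sketch the penalty couples the two GP surrogates into a single acquisition score, so exploration is steered toward arms that are both high-reward and likely feasible; after a few rounds the best sampled arm settles near the constrained optimum at 0.5.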