Pessimistic Off-Policy Multi-Objective Optimization

Shima Alizadeh, Aniruddha Bhargava, Karthick Gopalswamy, Lalit Jain, Branislav Kveton, Ge Liu
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2980-2988, 2024.

Abstract

Multi-objective optimization is a class of optimization problems with multiple conflicting objectives. We study offline optimization of multi-objective policies from data collected by a previously deployed policy. We propose a pessimistic estimator for policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-alizadeh24a,
  title = {Pessimistic Off-Policy Multi-Objective Optimization},
  author = {Alizadeh, Shima and Bhargava, Aniruddha and Gopalswamy, Karthick and Jain, Lalit and Kveton, Branislav and Liu, Ge},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = {2980--2988},
  year = {2024},
  editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = {238},
  series = {Proceedings of Machine Learning Research},
  month = {02--04 May},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v238/alizadeh24a/alizadeh24a.pdf},
  url = {https://proceedings.mlr.press/v238/alizadeh24a.html},
  abstract = {Multi-objective optimization is a class of optimization problems with multiple conflicting objectives. We study offline optimization of multi-objective policies from data collected by a previously deployed policy. We propose a pessimistic estimator for policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them.}
}
Endnote
%0 Conference Paper
%T Pessimistic Off-Policy Multi-Objective Optimization
%A Shima Alizadeh
%A Aniruddha Bhargava
%A Karthick Gopalswamy
%A Lalit Jain
%A Branislav Kveton
%A Ge Liu
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-alizadeh24a
%I PMLR
%P 2980--2988
%U https://proceedings.mlr.press/v238/alizadeh24a.html
%V 238
%X Multi-objective optimization is a class of optimization problems with multiple conflicting objectives. We study offline optimization of multi-objective policies from data collected by a previously deployed policy. We propose a pessimistic estimator for policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them.
APA
Alizadeh, S., Bhargava, A., Gopalswamy, K., Jain, L., Kveton, B. & Liu, G. (2024). Pessimistic Off-Policy Multi-Objective Optimization. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2980-2988. Available from https://proceedings.mlr.press/v238/alizadeh24a.html.
