Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings

Rong-Xi Tan, Ming Chen, Ke Xue, Yao Wang, Yaoyuan Wang, Fu Sheng, Chao Qian
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:58499-58544, 2025.

Abstract

The pursuit of universal black-box optimization (BBO) algorithms is a longstanding goal. However, unlike domains such as language or vision, where scaling structured data has driven generalization, progress in offline BBO remains hindered by the lack of unified representations for heterogeneous numerical spaces. Thus, existing offline BBO approaches are constrained to single-task and fixed-dimensional settings, failing to achieve cross-domain universal optimization. Recent advances in language models (LMs) offer a promising path forward: their embeddings capture latent relationships in a unifying way, making universal optimization across different data types possible. In this paper, we discuss multiple potential approaches, including an end-to-end learning framework in the form of next-token prediction, as well as prioritizing the learning of latent spaces with strong representational capabilities. To validate the effectiveness of these methods, we collect offline BBO tasks and data from open-source academic works for training. Experiments demonstrate the universality and effectiveness of our proposed methods. Our findings suggest that unifying language model priors and learning a string embedding space can overcome traditional barriers in universal BBO, paving the way for general-purpose BBO algorithms. The code is provided at https://github.com/lamda-bbo/universal-offline-bbo.
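The second direction the abstract names, learning a string embedding space, can be pictured with a minimal sketch: serialize designs of any dimensionality as strings, embed them with a pretrained LM, and regress offline scores in the shared embedding space. Everything below (the MiniLM encoder, the toy two-task dataset, the MLP surrogate) is an illustrative assumption, not the authors' implementation; see the linked repository for the actual method.

# Minimal sketch of embedding-space surrogate learning for offline BBO.
# Assumptions: "all-MiniLM-L6-v2" as a frozen embedder, a toy dataset
# mixing designs from two unrelated tasks, an MLP surrogate.
import torch
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # frozen LM embedder

def serialize(design: dict) -> str:
    # Flatten a design of any dimensionality into a single string,
    # the unified representation across heterogeneous tasks.
    return ", ".join(f"{k}: {v}" for k, v in design.items())

# Offline dataset: (design, score) pairs from different tasks/dimensions.
designs = [{"x0": 0.1, "x1": -2.3}, {"lr": 1e-3, "layers": 4}]
scores = torch.tensor([[0.7], [0.4]])

emb = torch.tensor(encoder.encode([serialize(d) for d in designs]))
surrogate = torch.nn.Sequential(
    torch.nn.Linear(emb.shape[1], 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(200):  # regress offline scores in the shared embedding space
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(surrogate(emb), scores)
    loss.backward()
    opt.step()

At optimization time, one would then search over candidate string serializations (e.g., by gradient-free search) for embeddings that maximize the surrogate; because the inputs are strings, the same trained model can score designs of any dimensionality or type.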

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-tan25b,
  title = {Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings},
  author = {Tan, Rong-Xi and Chen, Ming and Xue, Ke and Wang, Yao and Wang, Yaoyuan and Sheng, Fu and Qian, Chao},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {58499--58544},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/tan25b/tan25b.pdf},
  url = {https://proceedings.mlr.press/v267/tan25b.html},
  abstract = {The pursuit of universal black-box optimization (BBO) algorithms is a longstanding goal. However, unlike domains such as language or vision, where scaling structured data has driven generalization, progress in offline BBO remains hindered by the lack of unified representations for heterogeneous numerical spaces. Thus, existing offline BBO approaches are constrained to single-task and fixed-dimensional settings, failing to achieve cross-domain universal optimization. Recent advances in language models (LMs) offer a promising path forward: their embeddings capture latent relationships in a unifying way, making universal optimization across different data types possible. In this paper, we discuss multiple potential approaches, including an end-to-end learning framework in the form of next-token prediction, as well as prioritizing the learning of latent spaces with strong representational capabilities. To validate the effectiveness of these methods, we collect offline BBO tasks and data from open-source academic works for training. Experiments demonstrate the universality and effectiveness of our proposed methods. Our findings suggest that unifying language model priors and learning a string embedding space can overcome traditional barriers in universal BBO, paving the way for general-purpose BBO algorithms. The code is provided at https://github.com/lamda-bbo/universal-offline-bbo.}
}
Endnote
%0 Conference Paper
%T Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings
%A Rong-Xi Tan
%A Ming Chen
%A Ke Xue
%A Yao Wang
%A Yaoyuan Wang
%A Fu Sheng
%A Chao Qian
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-tan25b
%I PMLR
%P 58499--58544
%U https://proceedings.mlr.press/v267/tan25b.html
%V 267
%X The pursuit of universal black-box optimization (BBO) algorithms is a longstanding goal. However, unlike domains such as language or vision, where scaling structured data has driven generalization, progress in offline BBO remains hindered by the lack of unified representations for heterogeneous numerical spaces. Thus, existing offline BBO approaches are constrained to single-task and fixed-dimensional settings, failing to achieve cross-domain universal optimization. Recent advances in language models (LMs) offer a promising path forward: their embeddings capture latent relationships in a unifying way, making universal optimization across different data types possible. In this paper, we discuss multiple potential approaches, including an end-to-end learning framework in the form of next-token prediction, as well as prioritizing the learning of latent spaces with strong representational capabilities. To validate the effectiveness of these methods, we collect offline BBO tasks and data from open-source academic works for training. Experiments demonstrate the universality and effectiveness of our proposed methods. Our findings suggest that unifying language model priors and learning a string embedding space can overcome traditional barriers in universal BBO, paving the way for general-purpose BBO algorithms. The code is provided at https://github.com/lamda-bbo/universal-offline-bbo.
APA
Tan, R., Chen, M., Xue, K., Wang, Y., Wang, Y., Sheng, F. & Qian, C. (2025). Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:58499-58544. Available from https://proceedings.mlr.press/v267/tan25b.html.