High-Dimensional Bayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling

Yuxuan Yin, Yu Wang, Peng Li
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:57085-57100, 2024.

Abstract

We introduce a novel semi-supervised learning approach, named Teacher-Student Bayesian Optimization (TSBO), integrating the teacher-student paradigm into BO to minimize expensive labeled data queries for the first time. TSBO incorporates a teacher model, an unlabeled data sampler, and a student model. The student is trained on unlabeled data locations generated by the sampler, with pseudo labels predicted by the teacher. The interplay between these three components implements a unique selective regularization to the teacher in the form of student feedback. This scheme enables the teacher to predict high-quality pseudo labels, enhancing the generalization of the GP surrogate model in the search space. To fully exploit TSBO, we propose two optimized unlabeled data samplers to construct effective student feedback that well aligns with the objective of Bayesian optimization. Furthermore, we quantify and leverage the uncertainty of the teacher-student model for the provision of reliable feedback to the teacher in the presence of risky pseudo-label predictions. TSBO demonstrates significantly improved sample-efficiency in several global optimization tasks under tight labeled data budgets. The implementation is available at https://github.com/reminiscenty/TSBO-Official.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-yin24d, title = {High-Dimensional {B}ayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling}, author = {Yin, Yuxuan and Wang, Yu and Li, Peng}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {57085--57100}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/yin24d/yin24d.pdf}, url = {https://proceedings.mlr.press/v235/yin24d.html}, abstract = {We introduce a novel semi-supervised learning approach, named Teacher-Student Bayesian Optimization ($\texttt{TSBO}$), integrating the teacher-student paradigm into BO to minimize expensive labeled data queries for the first time. $\texttt{TSBO}$ incorporates a teacher model, an unlabeled data sampler, and a student model. The student is trained on unlabeled data locations generated by the sampler, with pseudo labels predicted by the teacher. The interplay between these three components implements a unique selective regularization to the teacher in the form of student feedback. This scheme enables the teacher to predict high-quality pseudo labels, enhancing the generalization of the GP surrogate model in the search space. To fully exploit $\texttt{TSBO}$, we propose two optimized unlabeled data samplers to construct effective student feedback that well aligns with the objective of Bayesian optimization. Furthermore, we quantify and leverage the uncertainty of the teacher-student model for the provision of reliable feedback to the teacher in the presence of risky pseudo-label predictions. $\texttt{TSBO}$ demonstrates significantly improved sample-efficiency in several global optimization tasks under tight labeled data budgets. The implementation is available at https://github.com/reminiscenty/TSBO-Official.} }
Endnote
%0 Conference Paper %T High-Dimensional Bayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling %A Yuxuan Yin %A Yu Wang %A Peng Li %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-yin24d %I PMLR %P 57085--57100 %U https://proceedings.mlr.press/v235/yin24d.html %V 235 %X We introduce a novel semi-supervised learning approach, named Teacher-Student Bayesian Optimization ($\texttt{TSBO}$), integrating the teacher-student paradigm into BO to minimize expensive labeled data queries for the first time. $\texttt{TSBO}$ incorporates a teacher model, an unlabeled data sampler, and a student model. The student is trained on unlabeled data locations generated by the sampler, with pseudo labels predicted by the teacher. The interplay between these three components implements a unique selective regularization to the teacher in the form of student feedback. This scheme enables the teacher to predict high-quality pseudo labels, enhancing the generalization of the GP surrogate model in the search space. To fully exploit $\texttt{TSBO}$, we propose two optimized unlabeled data samplers to construct effective student feedback that well aligns with the objective of Bayesian optimization. Furthermore, we quantify and leverage the uncertainty of the teacher-student model for the provision of reliable feedback to the teacher in the presence of risky pseudo-label predictions. $\texttt{TSBO}$ demonstrates significantly improved sample-efficiency in several global optimization tasks under tight labeled data budgets. The implementation is available at https://github.com/reminiscenty/TSBO-Official.
APA
Yin, Y., Wang, Y. & Li, P.. (2024). High-Dimensional Bayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:57085-57100 Available from https://proceedings.mlr.press/v235/yin24d.html.

Related Material