Weighted Euclidean Distance Matrices over Mixed Continuous and Categorical Inputs for Gaussian Process Models

Mingyu Pu, Wang Songhao, Haowei Wang, Szu Hui Ng
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:4951-4959, 2025.

Abstract

Gaussian Process (GP) models are widely utilized as surrogate models in scientific and engineering fields. However, standard GP models are limited to continuous variables due to the difficulties in establishing correlation structures for categorical variables. To overcome this limitation, we introduce WEighted Euclidean distance matrices Gaussian Process (WEGP). WEGP constructs the kernel function for each categorical input by estimating the Euclidean distance matrix (EDM) among all categorical choices of this input. The EDM is represented as a linear combination of several predefined base EDMs, each scaled by a positive weight. The weights, along with other kernel hyperparameters, are inferred using a fully Bayesian framework. We analyze the predictive performance of WEGP theoretically. Numerical experiments validate the accuracy of our GP model, and by integrating WEGP into Bayesian Optimization (BO), we achieve superior performance on both synthetic and real-world optimization problems. The code is available at: https://github.com/pmy0124nus/WEGP
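
The EDM construction described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the choice of base EDMs (one matrix per level, separating that level from all others), the function names, and the kernel form exp(-D). The fully Bayesian inference of the weights is not shown; the weights are simply fixed by hand.

```python
import numpy as np


def base_edms(n_levels):
    """Hypothetical base EDMs (an assumption, not the paper's choice).

    The i-th matrix places level i at distance 1 from every other level and
    all other levels at distance 0 from each other. Each is a valid EDM:
    embed level i at 1 and the remaining levels at 0 on the real line.
    """
    bases = []
    for i in range(n_levels):
        d = np.zeros((n_levels, n_levels))
        d[i, :] = 1.0
        d[:, i] = 1.0
        d[i, i] = 0.0
        bases.append(d)
    return np.stack(bases)


def weighted_edm(weights, bases):
    """Linear combination of base EDMs, each scaled by a positive weight."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights <= 0):
        raise ValueError("all weights must be positive")
    # Contract the weight vector against the first axis of the stacked bases.
    return np.tensordot(weights, bases, axes=1)


def categorical_kernel(edm):
    """Assumed kernel form: Gaussian kernel on squared distances, exp(-D)."""
    return np.exp(-edm)


bases = base_edms(3)
edm = weighted_edm([0.5, 1.0, 2.0], bases)  # hand-picked weights
K = categorical_kernel(edm)
```

Because a positively weighted sum of EDMs is again an EDM, `exp(-edm)` is a Gaussian kernel evaluated on some Euclidean embedding of the categories, so `K` is a valid correlation matrix over the three levels.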

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-pu25a,
  title = {Weighted Euclidean Distance Matrices over Mixed Continuous and Categorical Inputs for Gaussian Process Models},
  author = {Pu, Mingyu and Songhao, Wang and Wang, Haowei and Ng, Szu Hui},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {4951--4959},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/pu25a/pu25a.pdf},
  url = {https://proceedings.mlr.press/v258/pu25a.html},
  abstract = {Gaussian Process (GP) models are widely utilized as surrogate models in scientific and engineering fields. However, standard GP models are limited to continuous variables due to the difficulties in establishing correlation structures for categorical variables. To overcome this limitation, we introduce \textbf{WE}ighted Euclidean distance matrices \textbf{G}aussian \textbf{P}rocess (WEGP). WEGP constructs the kernel function for each categorical input by estimating the Euclidean distance matrix (EDM) among all categorical choices of this input. The EDM is represented as a linear combination of several predefined base EDMs, each scaled by a positive weight. The weights, along with other kernel hyperparameters, are inferred using a fully Bayesian framework. We analyze the predictive performance of WEGP theoretically. Numerical experiments validate the accuracy of our GP model, and by integrating WEGP into Bayesian Optimization (BO), we achieve superior performance on both synthetic and real-world optimization problems. The code is available at: \url{https://github.com/pmy0124nus/WEGP}.}
}
Endnote
%0 Conference Paper
%T Weighted Euclidean Distance Matrices over Mixed Continuous and Categorical Inputs for Gaussian Process Models
%A Mingyu Pu
%A Wang Songhao
%A Haowei Wang
%A Szu Hui Ng
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-pu25a
%I PMLR
%P 4951--4959
%U https://proceedings.mlr.press/v258/pu25a.html
%V 258
%X Gaussian Process (GP) models are widely utilized as surrogate models in scientific and engineering fields. However, standard GP models are limited to continuous variables due to the difficulties in establishing correlation structures for categorical variables. To overcome this limitation, we introduce WEighted Euclidean distance matrices Gaussian Process (WEGP). WEGP constructs the kernel function for each categorical input by estimating the Euclidean distance matrix (EDM) among all categorical choices of this input. The EDM is represented as a linear combination of several predefined base EDMs, each scaled by a positive weight. The weights, along with other kernel hyperparameters, are inferred using a fully Bayesian framework. We analyze the predictive performance of WEGP theoretically. Numerical experiments validate the accuracy of our GP model, and by integrating WEGP into Bayesian Optimization (BO), we achieve superior performance on both synthetic and real-world optimization problems. The code is available at: https://github.com/pmy0124nus/WEGP
APA
Pu, M., Songhao, W., Wang, H. & Ng, S.H. (2025). Weighted Euclidean Distance Matrices over Mixed Continuous and Categorical Inputs for Gaussian Process Models. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:4951-4959. Available from https://proceedings.mlr.press/v258/pu25a.html.
