An Interpretable and Sample Efficient Deep Kernel for Gaussian Process
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:759-768, 2020.
We propose a novel Gaussian process kernel that takes advantage of a deep neural network (DNN) structure but retains good interpretability. The resulting kernel is capable of addressing four major issues of the previous works of similar art, i.e., the optimality, explainability, model complexity, and sample efficiency. Our kernel design procedure comprises three steps: (1) Derivation of an optimal kernel with a non-stationary dot product structure that minimizes the prediction/test mean-squared-error (MSE); (2) Decomposition of this optimal kernel as a linear combination of shallow DNN subnetworks with the aid of multi-way feature interaction detection; (3) Updating the hyper-parameters of the subnetworks via an alternating rationale until convergence. The designed kernel does not sacrifice interpretability for optimality. On the contrary, each subnetwork explicitly demonstrates the interaction of a set of features in a transformation function, leading to a solid path toward explainable kernel learning. We test the proposed kernel with both synthesized and real-world data sets, and the proposed kernel is superior to its competitors in terms of prediction performance in most cases. Moreover, it tends to maintain the prediction performance and be robust to data over-fitting issue, when reducing the number of samples.