Kernel Learning by Unconstrained Optimization
; Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, PMLR 5:328-335, 2009.
We study the problem of learning a kernel matrix from an apriori kernel and training data. In this context, we propose a new unconstrained convex optimization formulation, with an arbitrary convex second-order differentiable loss function on kernel entries and a LogDet divergence term for regularization. Since the number of variables is of order $O(n^2)$, the computational cost of standard Newton and quasi-Newton methods for learning a kernel matrix is prohibitive. Here an operator form Hessian is used to develop an $O(n^3)$ trust-region inexact Newton method, where the Newton direction is computed using several conjugate gradient steps on the Hessian operator equation. On the uspst dataset, our algorithm can handle 2 million optimization variables within one hour. Experiments are shown for both linear (Mahalanobis) metric learning and for kernel learning. The convergence rate, speed and performance of several loss functions and algorithms are discussed.