Kernel Learning by Unconstrained Optimization

Fuxin Li, Yunshan Fu, Yu-Hong Dai, Cristian Sminchisescu, Jue Wang
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:328-335, 2009.

Abstract

We study the problem of learning a kernel matrix from an a priori kernel and training data. In this context, we propose a new unconstrained convex optimization formulation, with an arbitrary convex second-order differentiable loss function on kernel entries and a LogDet divergence term for regularization. Since the number of variables is of order $O(n^2)$, the computational cost of standard Newton and quasi-Newton methods for learning a kernel matrix is prohibitive. Here an operator-form Hessian is used to develop an $O(n^3)$ trust-region inexact Newton method, where the Newton direction is computed using several conjugate gradient steps on the Hessian operator equation. On the uspst dataset, our algorithm can handle 2 million optimization variables within one hour. Experiments are shown for both linear (Mahalanobis) metric learning and for kernel learning. The convergence rate, speed, and performance of several loss functions and algorithms are discussed.
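The central computational idea in the abstract, computing an inexact Newton direction from a Hessian that is only available as an operator (a matrix-vector product), can be sketched as follows. This is an illustrative toy, not the paper's implementation: the objective is a stand-in quadratic rather than the loss-plus-LogDet formulation, and all names (`A`, `b`, `H`) are hypothetical.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Toy convex objective standing in for the paper's loss + LogDet term:
#   f(x) = 0.5 x'Ax - b'x,  with A symmetric positive definite,
# so the gradient is Ax - b and the Hessian is A.
rng = np.random.default_rng(0)
n = 200
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)   # SPD Hessian
b = rng.standard_normal(n)

x = np.zeros(n)
grad = A @ x - b              # gradient at the current iterate

# The Hessian is exposed only as an operator (a matvec); it is never
# formed, stored, or factorized explicitly -- the key to handling
# O(n^2) optimization variables in the kernel-learning setting.
H = LinearOperator((n, n), matvec=lambda v: A @ v)

# A few conjugate gradient steps on H d = -grad give an inexact
# Newton direction d, using only Hessian-vector products.
d, info = cg(H, -grad, maxiter=10)
x_new = x + d

# For this quadratic the exact minimizer solves Ax = b, so the
# residual norm should shrink after the inexact Newton step.
print(np.linalg.norm(A @ x_new - b) < np.linalg.norm(A @ x - b))
```

A full trust-region variant would additionally cap the step length and truncate CG on negative curvature (Steihaug's method); the sketch keeps only the operator-equation idea.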

Cite this Paper


BibTeX
@InProceedings{pmlr-v5-li09a,
  title     = {Kernel Learning by Unconstrained Optimization},
  author    = {Fuxin Li and Yunshan Fu and Yu-Hong Dai and Cristian Sminchisescu and Jue Wang},
  booktitle = {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics},
  pages     = {328--335},
  year      = {2009},
  editor    = {David van Dyk and Max Welling},
  volume    = {5},
  series    = {Proceedings of Machine Learning Research},
  address   = {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v5/li09a/li09a.pdf},
  url       = {http://proceedings.mlr.press/v5/li09a.html},
  abstract  = {We study the problem of learning a kernel matrix from an a priori kernel and training data. In this context, we propose a new unconstrained convex optimization formulation, with an arbitrary convex second-order differentiable loss function on kernel entries and a LogDet divergence term for regularization. Since the number of variables is of order $O(n^2)$, the computational cost of standard Newton and quasi-Newton methods for learning a kernel matrix is prohibitive. Here an operator-form Hessian is used to develop an $O(n^3)$ trust-region inexact Newton method, where the Newton direction is computed using several conjugate gradient steps on the Hessian operator equation. On the uspst dataset, our algorithm can handle 2 million optimization variables within one hour. Experiments are shown for both linear (Mahalanobis) metric learning and for kernel learning. The convergence rate, speed, and performance of several loss functions and algorithms are discussed.}
}
Endnote
%0 Conference Paper
%T Kernel Learning by Unconstrained Optimization
%A Fuxin Li
%A Yunshan Fu
%A Yu-Hong Dai
%A Cristian Sminchisescu
%A Jue Wang
%B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling
%F pmlr-v5-li09a
%I PMLR
%J Proceedings of Machine Learning Research
%P 328--335
%U http://proceedings.mlr.press
%V 5
%W PMLR
%X We study the problem of learning a kernel matrix from an a priori kernel and training data. In this context, we propose a new unconstrained convex optimization formulation, with an arbitrary convex second-order differentiable loss function on kernel entries and a LogDet divergence term for regularization. Since the number of variables is of order $O(n^2)$, the computational cost of standard Newton and quasi-Newton methods for learning a kernel matrix is prohibitive. Here an operator-form Hessian is used to develop an $O(n^3)$ trust-region inexact Newton method, where the Newton direction is computed using several conjugate gradient steps on the Hessian operator equation. On the uspst dataset, our algorithm can handle 2 million optimization variables within one hour. Experiments are shown for both linear (Mahalanobis) metric learning and for kernel learning. The convergence rate, speed, and performance of several loss functions and algorithms are discussed.
RIS
TY  - CPAPER
TI  - Kernel Learning by Unconstrained Optimization
AU  - Fuxin Li
AU  - Yunshan Fu
AU  - Yu-Hong Dai
AU  - Cristian Sminchisescu
AU  - Jue Wang
BT  - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
PY  - 2009/04/15
DA  - 2009/04/15
ED  - David van Dyk
ED  - Max Welling
ID  - pmlr-v5-li09a
PB  - PMLR
DP  - PMLR
SP  - 328
EP  - 335
L1  - http://proceedings.mlr.press/v5/li09a/li09a.pdf
UR  - http://proceedings.mlr.press/v5/li09a.html
AB  - We study the problem of learning a kernel matrix from an a priori kernel and training data. In this context, we propose a new unconstrained convex optimization formulation, with an arbitrary convex second-order differentiable loss function on kernel entries and a LogDet divergence term for regularization. Since the number of variables is of order $O(n^2)$, the computational cost of standard Newton and quasi-Newton methods for learning a kernel matrix is prohibitive. Here an operator-form Hessian is used to develop an $O(n^3)$ trust-region inexact Newton method, where the Newton direction is computed using several conjugate gradient steps on the Hessian operator equation. On the uspst dataset, our algorithm can handle 2 million optimization variables within one hour. Experiments are shown for both linear (Mahalanobis) metric learning and for kernel learning. The convergence rate, speed, and performance of several loss functions and algorithms are discussed.
ER  -
APA
Li, F., Fu, Y., Dai, Y., Sminchisescu, C., & Wang, J. (2009). Kernel Learning by Unconstrained Optimization. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in PMLR 5:328-335.