A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning

Arash Afkanpour; András György; Csaba Szepesvari; Michael Bowling

A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning

Arash Afkanpour, András György, Csaba Szepesvari, Michael Bowling

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(1):374-382, 2013.

Abstract

We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel. When the number of kernels d to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in d are intractable. We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk. The key to achieve the required exponential speed-up is the computationally efficient construction of low-variance estimates of the gradient. We propose importance sampling based estimates, and find that the ideal distribution samples a coordinate with a probability proportional to the magnitude of the corresponding gradient. We show that in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows sampling from this distribution in O(\log(d)) time, making the total computational cost of the method to achieve an epsilon-optimal solution to be O(\log(d)/epsilon^2), thereby allowing our method to operate for very large values of d. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-afkanpour13,
  title = 	 {A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning},
  author = 	 {Afkanpour, Arash and György, András and Szepesvari, Csaba and Bowling, Michael},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {374--382},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/afkanpour13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/afkanpour13.html},
  abstract = 	 {We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel. When the number of kernels d to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in d are intractable. We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk. The key to achieve the required exponential speed-up is the computationally efficient construction of low-variance estimates of the gradient. We propose importance sampling based estimates, and find that the ideal distribution samples a coordinate with a probability proportional to the magnitude of the corresponding gradient. We show that in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows sampling from this distribution in O(\log(d)) time, making the total computational cost of the method to achieve an epsilon-optimal solution to be O(\log(d)/epsilon^2), thereby allowing our method to operate for very large values of d. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.}
}

Endnote

%0 Conference Paper
%T A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning
%A Arash Afkanpour
%A András György
%A Csaba Szepesvari
%A Michael Bowling
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester	
%F pmlr-v28-afkanpour13
%I PMLR
%P 374--382
%U https://proceedings.mlr.press/v28/afkanpour13.html
%V 28
%N 1
%X We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel. When the number of kernels d to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in d are intractable. We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk. The key to achieve the required exponential speed-up is the computationally efficient construction of low-variance estimates of the gradient. We propose importance sampling based estimates, and find that the ideal distribution samples a coordinate with a probability proportional to the magnitude of the corresponding gradient. We show that in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows sampling from this distribution in O(\log(d)) time, making the total computational cost of the method to achieve an epsilon-optimal solution to be O(\log(d)/epsilon^2), thereby allowing our method to operate for very large values of d. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.

RIS


TY  - CPAPER
TI  - A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning
AU  - Arash Afkanpour
AU  - András György
AU  - Csaba Szepesvari
AU  - Michael Bowling
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/02/13
ED  - Sanjoy Dasgupta
ED  - David McAllester	
ID  - pmlr-v28-afkanpour13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 1
SP  - 374
EP  - 382
L1  - http://proceedings.mlr.press/v28/afkanpour13.pdf
UR  - https://proceedings.mlr.press/v28/afkanpour13.html
AB  - We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel. When the number of kernels d to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in d are intractable. We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk. The key to achieve the required exponential speed-up is the computationally efficient construction of low-variance estimates of the gradient. We propose importance sampling based estimates, and find that the ideal distribution samples a coordinate with a probability proportional to the magnitude of the corresponding gradient. We show that in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows sampling from this distribution in O(\log(d)) time, making the total computational cost of the method to achieve an epsilon-optimal solution to be O(\log(d)/epsilon^2), thereby allowing our method to operate for very large values of d. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.
ER  -

APA


Afkanpour, A., György, A., Szepesvari, C. & Bowling, M.. (2013). A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(1):374-382 Available from https://proceedings.mlr.press/v28/afkanpour13.html.

Related Material

Download PDF