Large-Scale Data-Dependent Kernel Approximation
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:19-27, 2017.
Learning a computationally efficient kernel from data is an important machine learning problem. The majority of kernels in the literature do not leverage the geometry of the data, and those that do are computationally infeasible for contemporary datasets. Recent advances in approximation techniques have expanded the applicability of the kernel methodology to scale linearly with the data size. Data-dependent kernels, which could leverage this computational advantage, have however not yet seen the benefit. Here we derive an approximate large-scale learning procedure for data-dependent kernels that is efficient and performs well in practice. We provide a Lemma that can be used to derive the asymptotic convergence of the approximation in the limit of infinite random features, and, under certain conditions, an estimate of the convergence speed. We empirically prove that our construction represents a valid, yet efficient approximation of the data-dependent kernel. For large-scale datasets of millions of datapoints, where the proposed method is now applicable for the first time, we notice a significant performance boost over both baselines consisting of data independent kernels and of kernel approximations, at comparable computational cost.