Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:21520-21547, 2022.
Abstract
A coreset (or core-set) is a small weighted subset $Q$ of an input set $P$ with respect to a given monotonic function $f:\mathbb{R}\to\mathbb{R}$, which provably approximates the fitting loss $\sum_{p\in P} f(p\cdot x)$ for every given $x\in\mathbb{R}^d$. Using $Q$, we can obtain an approximation of the minimizer $x^*$ of this loss by running existing optimization algorithms on $Q$. In this work we provide: (i) A lower bound which proves that there are sets with no coreset smaller than $n=|P|$ for general monotonic loss functions. (ii) A proof that, with an additional common regularization term and under a natural assumption that holds, e.g., for logistic regression and the sigmoid activation function, a small coreset exists for any input $P$. (iii) A generic coreset construction algorithm that computes such a small coreset $Q$ in $O(nd+n\log n)$ time. (iv) Experimental results with open-source code which demonstrate that our coresets are effective and are much smaller in practice than predicted in theory.
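To make the approximation guarantee concrete, the standard multiplicative $\varepsilon$-coreset requirement in this literature is that, for every $x\in\mathbb{R}^d$,
$$(1-\varepsilon)\sum_{p\in P} f(p\cdot x)\;\le\;\sum_{q\in Q} w(q)\, f(q\cdot x)\;\le\;(1+\varepsilon)\sum_{p\in P} f(p\cdot x).$$
The sketch below illustrates the general idea on the logistic loss $f(z)=\log(1+e^{-z})$ using plain uniform importance sampling with inverse-probability weights. This is a minimal illustrative stand-in, not the paper's construction: the function names, the sampler, and its weighting scheme are assumptions made for the example, whereas the paper's algorithm uses a more careful (sensitivity-style) sampling to obtain its provable bounds.

```python
import numpy as np

def logistic_loss(P, x, weights=None):
    """Weighted fitting loss sum_p w(p) * f(p . x) with f(z) = log(1 + exp(-z))."""
    if weights is None:
        weights = np.ones(len(P))
    z = P @ x
    # log(1 + exp(-z)) computed stably as logaddexp(0, -z).
    return float(weights @ np.logaddexp(0.0, -z))

def uniform_coreset(P, m, rng):
    """Sample m points uniformly; a hypothetical stand-in for the paper's
    sampling-based construction, kept only to illustrate the coreset idea."""
    n = len(P)
    idx = rng.choice(n, size=m, replace=True)
    # Inverse-probability weights make the weighted loss an unbiased
    # estimator of the full loss for any fixed x.
    weights = np.full(m, n / m)
    return P[idx], weights

rng = np.random.default_rng(0)
n, d = 100_000, 10
P = rng.standard_normal((n, d))
x = rng.standard_normal(d)

Q, w = uniform_coreset(P, m=1_000, rng=rng)
full = logistic_loss(P, x)
approx = logistic_loss(Q, x, w)
print(f"full loss: {full:.1f}, coreset estimate: {approx:.1f}, "
      f"relative error: {abs(full - approx) / full:.3f}")
```

Uniform sampling gives an unbiased estimate for each fixed $x$ but no worst-case guarantee over all $x$; replacing the uniform probabilities with ones proportional to each point's maximal relative contribution (its sensitivity) is what allows constructions like the paper's to bound the error uniformly.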