Two-temperature logistic regression based on the Tsallis divergence
Proceedings of Machine Learning Research, PMLR 89:2388-2396, 2019.
Abstract
We develop a variant of multiclass logistic regression that is significantly more robust to noise. The algorithm has one weight vector per class, and the surrogate loss is a function of the linear activations (one per class). The surrogate loss of an example with linear activation vector $\mathbf{a}$ and class $c$ has the form $-\log_{t_1} \exp_{t_2} (a_c - G_{t_2}(\mathbf{a}))$, where the two temperatures $t_1$ and $t_2$ “temper” the $\log$ and $\exp$, respectively, and $G_{t_2}(\mathbf{a})$ is a scalar value that generalizes the log-partition function. We motivate this loss using the Tsallis divergence. Our method allows transitioning between non-convex and convex losses through the choice of the temperature parameters. As the temperature $t_1$ of the logarithm becomes smaller than the temperature $t_2$ of the exponential, the surrogate loss becomes “quasi-convex”. Various tunings of the temperatures recover previous methods, and tuning the degree of non-convexity is crucial in the experiments. In particular, quasi-convexity and boundedness of the loss provide significant robustness to outliers. We explain this by showing that $t_1 < 1$ caps the surrogate loss and $t_2 > 1$ makes the predictive distribution have a heavy tail. We show that the surrogate loss is Bayes-consistent, even in the non-convex case. Additionally, we provide efficient iterative algorithms that calculate the log-partition value in only a few iterations. Our compelling experimental results on large real-world datasets show the advantage of using the two-temperature variant in the noisy as well as the noise-free case.
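The form of the loss can be illustrated with a minimal NumPy sketch. It assumes the standard Tsallis definitions $\log_t(x) = (x^{1-t} - 1)/(1-t)$ and $\exp_t(x) = [1 + (1-t)x]_+^{1/(1-t)}$, which the abstract does not spell out, and it takes $G_{t_2}(\mathbf{a})$ to be the scalar for which $\sum_i \exp_{t_2}(a_i - G_{t_2}(\mathbf{a})) = 1$. The normalizer is found here by plain bisection, which is a stand-in and not the iterative algorithm of the paper. The function names `log_t`, `exp_t`, `log_partition_t` and `two_temperature_loss` are hypothetical.

```python
import numpy as np


def log_t(x, t):
    """Tempered logarithm (assumed Tsallis form); reduces to log at t = 1."""
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)


def exp_t(x, t):
    """Tempered exponential (assumed Tsallis form); reduces to exp at t = 1."""
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))


def log_partition_t(a, t, n_iter=100):
    """Find G with sum_i exp_t(a_i - G) = 1 by bisection.

    Stand-in for the paper's iterative algorithm. The bracket is valid
    because the sum is >= 1 at G = max(a) and <= 1 at
    G = max(a) - log_t(1/k), where k is the number of classes.
    """
    a = np.asarray(a, dtype=float)
    lo = a.max()
    hi = a.max() - log_t(1.0 / a.size, t)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if exp_t(a - mid, t).sum() > 1.0:
            lo = mid  # probabilities still sum to more than 1: raise G
        else:
            hi = mid
    return 0.5 * (lo + hi)


def two_temperature_loss(a, c, t1, t2):
    """Surrogate loss -log_{t1} exp_{t2}(a_c - G_{t2}(a)) for true class c."""
    a = np.asarray(a, dtype=float)
    g = log_partition_t(a, t2)
    return -log_t(exp_t(a[c] - g, t2), t1)
```

Under these assumptions, `two_temperature_loss(a, c, 1.0, 1.0)` is the ordinary softmax cross-entropy. Choosing `t1 < 1` bounds the loss by $1/(1 - t_1)$, and choosing `t2 > 1` assigns more probability to classes with low activations than the softmax does, which matches the capping and heavy-tail behavior described in the abstract.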