Characterizing the implicit bias via a primal-dual analysis

Ziwei Ji; Matus Telgarsky

Characterizing the implicit bias via a primal-dual analysis

Ziwei Ji, Matus Telgarsky

Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:772-804, 2021.

Abstract

This paper shows that the implicit bias of gradient descent on linearly separable data is exactly characterized by the optimal solution of a dual optimization problem given by a smoothed margin, even for general losses. This is in contrast to prior results, which are often tailored to exponentially-tailed losses. For the exponential loss specifically, with $n$ training examples and $t$ gradient descent steps, our dual analysis further allows us to prove an $O\del{\ln(n)/\ln(t)}$ convergence rate to the $\ell_2$ maximum margin direction, when a constant step size is used. This rate is tight in both $n$ and $t$, which has not been presented by prior work. On the other hand, with a properly chosen but aggressive step size schedule, we prove $O(1/t)$ rates for both $\ell_2$ margin maximization and implicit bias, whereas prior work (including all first-order methods for the general hard-margin linear SVM problem) proved $\widetilde{O}(1/\sqrt{t})$ margin rates, or $O(1/t)$ margin rates to a suboptimal margin, with an implied (slower) bias rate. Our key observations include that gradient descent on the primal variable naturally induces a mirror descent update on the dual variable, and that the dual objective in this setting is smooth enough to give a faster rate.

Cite this Paper

BibTeX


@InProceedings{pmlr-v132-ji21a,
  title = 	 {Characterizing the implicit bias via a primal-dual analysis},
  author =       {Ji, Ziwei and Telgarsky, Matus},
  booktitle = 	 {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  pages = 	 {772--804},
  year = 	 {2021},
  editor = 	 {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan},
  volume = 	 {132},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--19 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v132/ji21a/ji21a.pdf},
  url = 	 {https://proceedings.mlr.press/v132/ji21a.html},
  abstract = 	 { This paper shows that the implicit bias of gradient descent on linearly separable data is exactly characterized by the optimal solution of a dual optimization problem given by a smoothed margin, even for general losses. This is in contrast to prior results, which are often tailored to exponentially-tailed losses. For the exponential loss specifically, with $n$ training examples and $t$ gradient descent steps, our dual analysis further allows us to prove an $O\del{\ln(n)/\ln(t)}$ convergence rate to the $\ell_2$ maximum margin direction, when a constant step size is used. This rate is tight in both $n$ and $t$, which has not been presented by prior work. On the other hand, with a properly chosen but aggressive step size schedule, we prove $O(1/t)$ rates for both $\ell_2$ margin maximization and implicit bias, whereas prior work (including all first-order methods for the general hard-margin linear SVM problem) proved $\widetilde{O}(1/\sqrt{t})$ margin rates, or $O(1/t)$ margin rates to a suboptimal margin, with an implied (slower) bias rate. Our key observations include that gradient descent on the primal variable naturally induces a mirror descent update on the dual variable, and that the dual objective in this setting is smooth enough to give a faster rate. }
}

Endnote

%0 Conference Paper
%T Characterizing the implicit bias via a primal-dual analysis
%A Ziwei Ji
%A Matus Telgarsky
%B Proceedings of the 32nd International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Vitaly Feldman
%E Katrina Ligett
%E Sivan Sabato	
%F pmlr-v132-ji21a
%I PMLR
%P 772--804
%U https://proceedings.mlr.press/v132/ji21a.html
%V 132
%X  This paper shows that the implicit bias of gradient descent on linearly separable data is exactly characterized by the optimal solution of a dual optimization problem given by a smoothed margin, even for general losses. This is in contrast to prior results, which are often tailored to exponentially-tailed losses. For the exponential loss specifically, with $n$ training examples and $t$ gradient descent steps, our dual analysis further allows us to prove an $O\del{\ln(n)/\ln(t)}$ convergence rate to the $\ell_2$ maximum margin direction, when a constant step size is used. This rate is tight in both $n$ and $t$, which has not been presented by prior work. On the other hand, with a properly chosen but aggressive step size schedule, we prove $O(1/t)$ rates for both $\ell_2$ margin maximization and implicit bias, whereas prior work (including all first-order methods for the general hard-margin linear SVM problem) proved $\widetilde{O}(1/\sqrt{t})$ margin rates, or $O(1/t)$ margin rates to a suboptimal margin, with an implied (slower) bias rate. Our key observations include that gradient descent on the primal variable naturally induces a mirror descent update on the dual variable, and that the dual objective in this setting is smooth enough to give a faster rate.

APA


Ji, Z. & Telgarsky, M.. (2021). Characterizing the implicit bias via a primal-dual analysis. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:772-804 Available from https://proceedings.mlr.press/v132/ji21a.html.

Related Material

Download PDF