A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization

Zhize Li, Jian Li
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1047-1057, 2020.

Abstract

\emph{Anderson acceleration} (or Anderson mixing) is an efficient acceleration method for fixed point iterations $x_{t+1}=G(x_t)$, e.g., gradient descent can be viewed as iteratively applying the operation $G(x) \triangleq x-\alphaabla f(x)$. It is known that Anderson acceleration is quite efficient in practice and can be viewed as an extension of Krylov subspace methods for nonlinear problems. In this paper, we show that Anderson acceleration with Chebyshev polynomial can achieve the optimal convergence rate $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, which improves the previous result $O(\kappa\ln\frac{1}{\epsilon})$ provided by (Toth & Kelley, 2015) for quadratic functions. Moreover, we provide a convergence analysis for minimizing general nonlinear problems. Besides, if the hyperparameters (e.g., the Lipschitz smooth parameter $L$) are not available, we propose a \emph{guessing algorithm} for guessing them dynamically and also prove a similar convergence rate. Finally, the experimental results demonstrate that the proposed Anderson-Chebyshev acceleration method converges significantly faster than other algorithms, e.g., vanilla gradient descent (GD), Nesterov’s Accelerated GD. Also, these algorithms combined with the proposed guessing algorithm (guessing the hyperparameters dynamically) achieve much better performance.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-li20d, title = {A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization}, author = {Li, Zhize and Li, Jian}, booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics}, pages = {1047--1057}, year = {2020}, editor = {Silvia Chiappa and Roberto Calandra}, volume = {108}, series = {Proceedings of Machine Learning Research}, month = {26--28 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v108/li20d/li20d.pdf}, url = { http://proceedings.mlr.press/v108/li20d.html }, abstract = {\emph{Anderson acceleration} (or Anderson mixing) is an efficient acceleration method for fixed point iterations $x_{t+1}=G(x_t)$, e.g., gradient descent can be viewed as iteratively applying the operation $G(x) \triangleq x-\alphaabla f(x)$. It is known that Anderson acceleration is quite efficient in practice and can be viewed as an extension of Krylov subspace methods for nonlinear problems. In this paper, we show that Anderson acceleration with Chebyshev polynomial can achieve the optimal convergence rate $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, which improves the previous result $O(\kappa\ln\frac{1}{\epsilon})$ provided by (Toth & Kelley, 2015) for quadratic functions. Moreover, we provide a convergence analysis for minimizing general nonlinear problems. Besides, if the hyperparameters (e.g., the Lipschitz smooth parameter $L$) are not available, we propose a \emph{guessing algorithm} for guessing them dynamically and also prove a similar convergence rate. Finally, the experimental results demonstrate that the proposed Anderson-Chebyshev acceleration method converges significantly faster than other algorithms, e.g., vanilla gradient descent (GD), Nesterov’s Accelerated GD. Also, these algorithms combined with the proposed guessing algorithm (guessing the hyperparameters dynamically) achieve much better performance.} }
Endnote
%0 Conference Paper %T A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization %A Zhize Li %A Jian Li %B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2020 %E Silvia Chiappa %E Roberto Calandra %F pmlr-v108-li20d %I PMLR %P 1047--1057 %U http://proceedings.mlr.press/v108/li20d.html %V 108 %X \emph{Anderson acceleration} (or Anderson mixing) is an efficient acceleration method for fixed point iterations $x_{t+1}=G(x_t)$, e.g., gradient descent can be viewed as iteratively applying the operation $G(x) \triangleq x-\alphaabla f(x)$. It is known that Anderson acceleration is quite efficient in practice and can be viewed as an extension of Krylov subspace methods for nonlinear problems. In this paper, we show that Anderson acceleration with Chebyshev polynomial can achieve the optimal convergence rate $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, which improves the previous result $O(\kappa\ln\frac{1}{\epsilon})$ provided by (Toth & Kelley, 2015) for quadratic functions. Moreover, we provide a convergence analysis for minimizing general nonlinear problems. Besides, if the hyperparameters (e.g., the Lipschitz smooth parameter $L$) are not available, we propose a \emph{guessing algorithm} for guessing them dynamically and also prove a similar convergence rate. Finally, the experimental results demonstrate that the proposed Anderson-Chebyshev acceleration method converges significantly faster than other algorithms, e.g., vanilla gradient descent (GD), Nesterov’s Accelerated GD. Also, these algorithms combined with the proposed guessing algorithm (guessing the hyperparameters dynamically) achieve much better performance.
APA
Li, Z. & Li, J.. (2020). A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1047-1057 Available from http://proceedings.mlr.press/v108/li20d.html .

Related Material