Hierarchical partition of unity networks: fast multilevel training

Nathaniel Trask, Amelia Henriksen, Carianne Martinez, Eric Cyr
Proceedings of Mathematical and Scientific Machine Learning, PMLR 190:271-286, 2022.

Abstract

We present a probabilistic mixture of experts framework to perform nonparametric piecewise polynomial approximation without the need for an underlying mesh partitioning space. Deep neural networks traditionally used for classification provide a means of localizing polynomial approximation, and the probabilistic formulation admits a trivially parallelizable expectation maximization (EM) strategy. We then introduce a hierarchical architecture whose EM loss naturally decomposes into coarse- and fine-scale terms and small decoupled least squares problems. We exploit this hierarchical structure to formulate a V-cycle multigrid-inspired training algorithm. A suite of benchmarks demonstrates the ability of the scheme to: realize, for smooth data, algebraic convergence with respect to the number of partitions and exponential convergence with respect to polynomial order; exactly reproduce piecewise polynomial functions; and, through an application to data-driven semiconductor modeling, accurately treat data spanning several orders of magnitude.
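To make the construction concrete, the sketch below illustrates the partition-of-unity idea the abstract describes: a softmax gating network localizes polynomial experts, and with the gating parameters held fixed the polynomial coefficients follow from a linear least-squares solve. This is an illustrative simplification, not the authors' implementation: it assumes a 1D toy target, a single-layer random (untrained) gating network, and one global least-squares problem; the EM updates, the hierarchical coarse/fine decomposition, and the V-cycle training are all omitted. The helper names pou_weights and design_matrix are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def pou_weights(x, W, b):
    # Softmax gating: each row sums to one, so the partitions form a partition of unity.
    logits = x[:, None] * W + b                   # shape (N, n_parts)
    logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def design_matrix(x, phi, order):
    # Columns are phi_i(x) * x^k: each partition carries its own low-order polynomial expert.
    mon = np.stack([x**k for k in range(order + 1)], axis=1)   # (N, order+1)
    return (phi[:, :, None] * mon[:, None, :]).reshape(len(x), -1)

# Toy 1D regression target.
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3.0 * x) + 0.1 * np.abs(x)

n_parts, order = 8, 2
W = rng.normal(scale=5.0, size=n_parts)   # fixed (untrained) gating parameters
b = rng.normal(scale=2.0, size=n_parts)

phi = pou_weights(x, W, b)
A = design_matrix(x, phi, order)
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)   # all expert coefficients from one least-squares solve
y_hat = A @ coeffs
print("RMS error:", np.sqrt(np.mean((y_hat - y) ** 2)))

Because the model is linear in the polynomial coefficients once the partitions are fixed, this inner solve is cheap and exact; in the paper's setting the gating network itself is also trained, and the hierarchy of partitions is what enables the multilevel (V-cycle) training.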

Cite this Paper


BibTeX
@InProceedings{pmlr-v190-trask22a,
  title     = {Hierarchical partition of unity networks: fast multilevel training},
  author    = {Trask, Nathaniel and Henriksen, Amelia and Martinez, Carianne and Cyr, Eric},
  booktitle = {Proceedings of Mathematical and Scientific Machine Learning},
  pages     = {271--286},
  year      = {2022},
  editor    = {Dong, Bin and Li, Qianxiao and Wang, Lei and Xu, Zhi-Qin John},
  volume    = {190},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v190/trask22a/trask22a.pdf},
  url       = {https://proceedings.mlr.press/v190/trask22a.html},
  abstract  = {We present a probabilistic mixture of experts framework to perform nonparametric piecewise polynomial approximation without the need for an underlying mesh partitioning space. Deep neural networks traditionally used for classification provide a means of localizing polynomial approximation, and the probabilistic formulation admits a trivially parallelizable expectation maximization (EM) strategy. We then introduce a hierarchical architecture whose EM loss naturally decomposes into coarse and fine scale terms and small decoupled least squares problems. We exploit this hierarchical structure to formulate a V-cycle multigrid-inspired training algorithm. A suite of benchmarks demonstrate the ability of the scheme to: realize for smooth data algebraic convergence with respect to number of partitions, exponential convergence with respect to polynomial order; exactly reproduce piecewise polynomial functions; and demonstrate through an application to data-driven semiconductor modeling the ability to accurately treat data spanning several orders of magnitude.}
}
Endnote
%0 Conference Paper
%T Hierarchical partition of unity networks: fast multilevel training
%A Nathaniel Trask
%A Amelia Henriksen
%A Carianne Martinez
%A Eric Cyr
%B Proceedings of Mathematical and Scientific Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Bin Dong
%E Qianxiao Li
%E Lei Wang
%E Zhi-Qin John Xu
%F pmlr-v190-trask22a
%I PMLR
%P 271--286
%U https://proceedings.mlr.press/v190/trask22a.html
%V 190
%X We present a probabilistic mixture of experts framework to perform nonparametric piecewise polynomial approximation without the need for an underlying mesh partitioning space. Deep neural networks traditionally used for classification provide a means of localizing polynomial approximation, and the probabilistic formulation admits a trivially parallelizable expectation maximization (EM) strategy. We then introduce a hierarchical architecture whose EM loss naturally decomposes into coarse and fine scale terms and small decoupled least squares problems. We exploit this hierarchical structure to formulate a V-cycle multigrid-inspired training algorithm. A suite of benchmarks demonstrate the ability of the scheme to: realize for smooth data algebraic convergence with respect to number of partitions, exponential convergence with respect to polynomial order; exactly reproduce piecewise polynomial functions; and demonstrate through an application to data-driven semiconductor modeling the ability to accurately treat data spanning several orders of magnitude.
APA
Trask, N., Henriksen, A., Martinez, C., & Cyr, E. (2022). Hierarchical partition of unity networks: fast multilevel training. Proceedings of Mathematical and Scientific Machine Learning, in Proceedings of Machine Learning Research 190:271-286. Available from https://proceedings.mlr.press/v190/trask22a.html.
