Near-optimal Sketchy Natural Gradients for Physics-Informed Neural Networks

Maricela Best Mckay, Avleen Kaur, Chen Greif, Brian Wetton
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:4005-4019, 2025.

Abstract

Natural gradient methods for PINNs have achieved state-of-the-art performance with errors several orders of magnitude smaller than those achieved by standard optimizers such as ADAM or L-BFGS. However, computing natural gradients for PINNs is prohibitively computationally costly and memory-intensive for all but small neural network architectures. We develop a randomized algorithm for natural gradient descent for PINNs that uses sketching to approximate the natural gradient descent direction. We prove that the change of coordinate Gram matrix used in a natural gradient descent update has rapidly-decaying eigenvalues for a one-layer, one-dimensional neural network and empirically demonstrate that this structure holds for four different example problems. Under this structure, our sketching algorithm is guaranteed to provide a near-optimal low-rank approximation of the Gramian. Our algorithm dramatically speeds up computation time and reduces memory overhead. Additionally, in our experiments, the sketched natural gradient outperforms the original natural gradient in terms of accuracy, often achieving an error that is an order of magnitude smaller. Training time for a network with around 5,000 parameters is reduced from several hours to under two minutes. Training can be practically scaled to large network sizes; we optimize a PINN for a network with over a million parameters within a few minutes, a task for which the full Gram matrix does not fit in memory.
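To make the central idea concrete, the following is a minimal, illustrative sketch of how a randomized low-rank approximation of a positive semidefinite Gram matrix, accessed only through matrix-vector products, can be used to apply a damped inverse to the gradient. It is not the authors' implementation: the randomized Nystrom construction, the function names nystrom_sketch and sketched_natural_direction, and the damping parameter are assumptions made for illustration only.

import numpy as np

def nystrom_sketch(G_matvec, n, sketch_dim, seed=0):
    # Randomized Nystrom approximation of a PSD matrix G: G is approximately
    # U @ diag(lam) @ U.T. G is touched only through block products
    # G_matvec(X) = G @ X, so the full n x n Gram matrix is never formed.
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, sketch_dim))    # Gaussian test matrix
    Omega, _ = np.linalg.qr(Omega)                  # orthonormalize for stability
    Y = G_matvec(Omega)                             # sketch Y = G @ Omega, shape (n, k)
    nu = np.sqrt(n) * np.finfo(Y.dtype).eps * np.linalg.norm(Y, 2)
    Y_nu = Y + nu * Omega                           # small shift keeps the core PSD
    L = np.linalg.cholesky(Omega.T @ Y_nu)          # core Cholesky factor, shape (k, k)
    B = np.linalg.solve(L, Y_nu.T).T                # B = Y_nu @ L^{-T}
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(s ** 2 - nu, 0.0)              # undo the shift
    return U, lam

def sketched_natural_direction(U, lam, grad, damping=1e-6):
    # Apply (G + damping * I)^{-1} to grad using the rank-k factors, splitting
    # grad into its component in range(U) and the orthogonal remainder.
    coeffs = U.T @ grad
    return U @ (coeffs / (lam + damping)) + (grad - U @ coeffs) / damping

# Hypothetical usage with a synthetic low-rank Gram matrix:
# n = 2000
# J = np.random.default_rng(1).standard_normal((50, n))
# G = J.T @ J
# U, lam = nystrom_sketch(lambda X: G @ X, n, sketch_dim=80)
# direction = sketched_natural_direction(U, lam, np.ones(n))

Because the sketch has only k columns with k much smaller than the number of parameters n, storage drops from O(n^2) for the full Gram matrix to O(nk), and applying the damped inverse costs O(nk) per step. A low-rank approximation of this kind is only accurate when the Gramian's eigenvalues decay rapidly, which is the structure the paper proves for a one-layer, one-dimensional network and observes empirically in its examples.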

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-best-mckay25a,
  title     = {Near-optimal Sketchy Natural Gradients for Physics-Informed Neural Networks},
  author    = {Best Mckay, Maricela and Kaur, Avleen and Greif, Chen and Wetton, Brian},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {4005--4019},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/best-mckay25a/best-mckay25a.pdf},
  url       = {https://proceedings.mlr.press/v267/best-mckay25a.html},
  abstract  = {Natural gradient methods for PINNs have achieved state-of-the-art performance with errors several orders of magnitude smaller than those achieved by standard optimizers such as ADAM or L-BFGS. However, computing natural gradients for PINNs is prohibitively computationally costly and memory-intensive for all but small neural network architectures. We develop a randomized algorithm for natural gradient descent for PINNs that uses sketching to approximate the natural gradient descent direction. We prove that the change of coordinate Gram matrix used in a natural gradient descent update has rapidly-decaying eigenvalues for a one-layer, one-dimensional neural network and empirically demonstrate that this structure holds for four different example problems. Under this structure, our sketching algorithm is guaranteed to provide a near-optimal low-rank approximation of the Gramian. Our algorithm dramatically speeds up computation time and reduces memory overhead. Additionally, in our experiments, the sketched natural gradient outperforms the original natural gradient in terms of accuracy, often achieving an error that is an order of magnitude smaller. Training time for a network with around 5,000 parameters is reduced from several hours to under two minutes. Training can be practically scaled to large network sizes; we optimize a PINN for a network with over a million parameters within a few minutes, a task for which the full Gram matrix does not fit in memory.}
}
Endnote
%0 Conference Paper
%T Near-optimal Sketchy Natural Gradients for Physics-Informed Neural Networks
%A Maricela Best Mckay
%A Avleen Kaur
%A Chen Greif
%A Brian Wetton
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-best-mckay25a
%I PMLR
%P 4005--4019
%U https://proceedings.mlr.press/v267/best-mckay25a.html
%V 267
%X Natural gradient methods for PINNs have achieved state-of-the-art performance with errors several orders of magnitude smaller than those achieved by standard optimizers such as ADAM or L-BFGS. However, computing natural gradients for PINNs is prohibitively computationally costly and memory-intensive for all but small neural network architectures. We develop a randomized algorithm for natural gradient descent for PINNs that uses sketching to approximate the natural gradient descent direction. We prove that the change of coordinate Gram matrix used in a natural gradient descent update has rapidly-decaying eigenvalues for a one-layer, one-dimensional neural network and empirically demonstrate that this structure holds for four different example problems. Under this structure, our sketching algorithm is guaranteed to provide a near-optimal low-rank approximation of the Gramian. Our algorithm dramatically speeds up computation time and reduces memory overhead. Additionally, in our experiments, the sketched natural gradient outperforms the original natural gradient in terms of accuracy, often achieving an error that is an order of magnitude smaller. Training time for a network with around 5,000 parameters is reduced from several hours to under two minutes. Training can be practically scaled to large network sizes; we optimize a PINN for a network with over a million parameters within a few minutes, a task for which the full Gram matrix does not fit in memory.
APA
Best Mckay, M., Kaur, A., Greif, C., & Wetton, B. (2025). Near-optimal Sketchy Natural Gradients for Physics-Informed Neural Networks. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:4005-4019. Available from https://proceedings.mlr.press/v267/best-mckay25a.html.
