Optimization for Neural Operators can Benefit from Width

Pedro Cisneros-Velarde, Bhavesh Shrimali, Arindam Banerjee
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:10994-11041, 2025.

Abstract

Neural Operators that directly learn mappings between function spaces, such as Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs), have received considerable attention. Despite the universal approximation guarantees for DONs and FNOs, there is currently no optimization convergence guarantee for learning such networks using gradient descent (GD). In this paper, we address this open problem by presenting a unified framework for GD-based optimization and applying it to establish convergence guarantees for both DONs and FNOs. In particular, we show that the losses associated with both of these neural operators satisfy two conditions—restricted strong convexity (RSC) and smoothness—that guarantee a decrease in their loss values under GD. Remarkably, these two conditions are satisfied for each neural operator for different reasons tied to the architectural differences of the respective models. One takeaway that emerges from the theory is that greater width improves the optimization convergence guarantees for both DONs and FNOs. We present empirical results on canonical operator learning problems that support our theoretical results and show that larger widths benefit training.
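
To make the mechanism in the abstract concrete, here is a minimal sketch of the standard smoothness-plus-gradient-dominance argument that RSC-style conditions enable; the symbols α, β, θ_t, and L* are generic placeholders and are not the paper's model-specific quantities or exact theorem statements. For GD with step size 1/β on a β-smooth loss whose gradient admits an RSC-type lower bound along the iterates, the loss gap contracts geometrically:

\[
\mathcal{L}(\theta_{t+1}) \;\le\; \mathcal{L}(\theta_t) - \tfrac{1}{2\beta}\,\|\nabla \mathcal{L}(\theta_t)\|_2^2
\quad \text{(β-smoothness, step size } \eta = 1/\beta\text{)},
\]
\[
\|\nabla \mathcal{L}(\theta_t)\|_2^2 \;\ge\; 2\alpha\,\bigl(\mathcal{L}(\theta_t) - \mathcal{L}^*\bigr)
\quad \text{(RSC-type gradient lower bound along the GD path)},
\]
\[
\Longrightarrow\quad \mathcal{L}(\theta_{t+1}) - \mathcal{L}^* \;\le\; \Bigl(1 - \tfrac{\alpha}{\beta}\Bigr)\bigl(\mathcal{L}(\theta_t) - \mathcal{L}^*\bigr).
\]

In this reading, the paper's width takeaway can be understood as larger width keeping the effective ratio α/β favorable for both DONs and FNOs, with the two architectures satisfying the conditions via different, architecture-specific arguments.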

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-cisneros-velarde25a,
  title = {Optimization for Neural Operators can Benefit from Width},
  author = {Cisneros-Velarde, Pedro and Shrimali, Bhavesh and Banerjee, Arindam},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {10994--11041},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/cisneros-velarde25a/cisneros-velarde25a.pdf},
  url = {https://proceedings.mlr.press/v267/cisneros-velarde25a.html},
  abstract = {Neural Operators that directly learn mappings between function spaces, such as Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs), have received considerable attention. Despite the universal approximation guarantees for DONs and FNOs, there is currently no optimization convergence guarantee for learning such networks using gradient descent (GD). In this paper, we address this open problem by presenting a unified framework for optimization based on GD and applying it to establish convergence guarantees for both DONs and FNOs. In particular, we show that the losses associated with both of these neural operators satisfy two conditions—restricted strong convexity (RSC) and smoothness—that guarantee a decrease on their loss values due to GD. Remarkably, these two conditions are satisfied for each neural operator due to different reasons associated with the architectural differences of the respective models. One takeaway that emerges from the theory is that wider networks benefit optimization convergence guarantees for both DONs and FNOs. We present empirical results on canonical operator learning problems to support our theoretical results and find that larger widths benefit training.}
}
Endnote
%0 Conference Paper
%T Optimization for Neural Operators can Benefit from Width
%A Pedro Cisneros-Velarde
%A Bhavesh Shrimali
%A Arindam Banerjee
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-cisneros-velarde25a
%I PMLR
%P 10994--11041
%U https://proceedings.mlr.press/v267/cisneros-velarde25a.html
%V 267
%X Neural Operators that directly learn mappings between function spaces, such as Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs), have received considerable attention. Despite the universal approximation guarantees for DONs and FNOs, there is currently no optimization convergence guarantee for learning such networks using gradient descent (GD). In this paper, we address this open problem by presenting a unified framework for optimization based on GD and applying it to establish convergence guarantees for both DONs and FNOs. In particular, we show that the losses associated with both of these neural operators satisfy two conditions—restricted strong convexity (RSC) and smoothness—that guarantee a decrease on their loss values due to GD. Remarkably, these two conditions are satisfied for each neural operator due to different reasons associated with the architectural differences of the respective models. One takeaway that emerges from the theory is that wider networks benefit optimization convergence guarantees for both DONs and FNOs. We present empirical results on canonical operator learning problems to support our theoretical results and find that larger widths benefit training.
APA
Cisneros-Velarde, P., Shrimali, B., & Banerjee, A. (2025). Optimization for Neural Operators can Benefit from Width. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:10994-11041. Available from https://proceedings.mlr.press/v267/cisneros-velarde25a.html.
