A Convex Relaxation Approach to Generalization Analysis for Parallel Positively Homogeneous Networks

Uday Kiran Reddy Tadipatri, Benjamin David Haeffele, Joshua Agterberg, Rene Vidal
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:5239-5247, 2025.

Abstract

We propose a general framework for deriving generalization bounds for parallel positively homogeneous neural networks, a class of neural networks whose input-output map decomposes as the sum of positively homogeneous maps. Examples of such networks include matrix factorization and sensing, single-layer multi-head attention mechanisms, tensor factorization, deep linear and ReLU networks, and more. Our framework links the non-convex empirical risk minimization (ERM) problem to a closely related convex optimization problem over prediction functions, which provides a global, achievable lower bound on the ERM problem. We exploit this convex lower bound to perform generalization analysis in the convex space while controlling the discrepancy between the convex model and its non-convex counterpart. We apply our framework to a wide variety of models, ranging from low-rank and structured matrix sensing to two-layer linear networks, two-layer ReLU networks, and single-layer multi-head attention mechanisms, achieving generalization bounds whose sample complexity scales almost linearly with the network width.
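A minimal LaTeX sketch of the setup described above, in our own notation (the symbols f_Theta, phi, p, R, and Omega, as well as the schematic form of the regularizers, are our assumptions; the paper's precise definitions may differ):

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sketch (our notation): a parallel positively homogeneous network is a sum of
% r parallel maps, each positively homogeneous of degree p in its own parameters.
\[
  f_{\Theta}(x) = \sum_{i=1}^{r} \phi(x;\theta_i),
  \qquad
  \phi(x;\alpha\theta_i) = \alpha^{p}\,\phi(x;\theta_i) \quad \text{for all } \alpha > 0 .
\]
% Example from the abstract's list: a width-r two-layer ReLU network,
% with \theta_i = (w_i, v_i) and homogeneity degree p = 2.
\[
  \phi(x;\theta_i) = v_i \max\!\bigl(0,\, w_i^{\top} x\bigr).
\]
% Schematic of the convex relaxation: the regularized non-convex ERM over the
% weights Theta is globally (and achievably) lower-bounded by a convex problem
% over prediction functions f, with an induced convex regularizer Omega.
\[
  \min_{\Theta}\; \frac{1}{n}\sum_{j=1}^{n} \ell\bigl(f_{\Theta}(x_j), y_j\bigr)
    + \lambda\, R(\Theta)
  \;\ge\;
  \min_{f}\; \frac{1}{n}\sum_{j=1}^{n} \ell\bigl(f(x_j), y_j\bigr)
    + \lambda\, \Omega(f).
\]
\end{document}

Because the lower bound is achievable, generalization analysis can be carried out for the convex problem on the right and then transferred back to the non-convex network by controlling the gap between the two formulations.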

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-tadipatri25a,
  title = {A Convex Relaxation Approach to Generalization Analysis for Parallel Positively Homogeneous Networks},
  author = {Tadipatri, Uday Kiran Reddy and Haeffele, Benjamin David and Agterberg, Joshua and Vidal, Rene},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {5239--5247},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/tadipatri25a/tadipatri25a.pdf},
  url = {https://proceedings.mlr.press/v258/tadipatri25a.html},
  abstract = {We propose a general framework for deriving generalization bounds for parallel positively homogeneous neural networks–a class of neural networks whose input-output map decomposes as the sum of positively homogeneous maps. Examples of such networks include matrix factorization and sensing, single-layer multi-head attention mechanisms, tensor factorization, deep linear and ReLU networks, and more. Our general framework is based on linking the non-convex empirical risk minimization (ERM) problem to a closely related convex optimization problem over prediction functions, which provides a global, achievable lower-bound to the ERM problem. We exploit this convex lower-bound to perform generalization analysis in the convex space while controlling the discrepancy between the convex model and its non-convex counterpart. We apply our general framework to a wide variety of models ranging from low-rank matrix sensing, to structured matrix sensing, two-layer linear networks, two-layer ReLU networks, and single-layer multi-head attention mechanisms, achieving generalization bounds with a sample complexity that scales almost linearly with the network width.}
}
Endnote
%0 Conference Paper
%T A Convex Relaxation Approach to Generalization Analysis for Parallel Positively Homogeneous Networks
%A Uday Kiran Reddy Tadipatri
%A Benjamin David Haeffele
%A Joshua Agterberg
%A Rene Vidal
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-tadipatri25a
%I PMLR
%P 5239--5247
%U https://proceedings.mlr.press/v258/tadipatri25a.html
%V 258
%X We propose a general framework for deriving generalization bounds for parallel positively homogeneous neural networks–a class of neural networks whose input-output map decomposes as the sum of positively homogeneous maps. Examples of such networks include matrix factorization and sensing, single-layer multi-head attention mechanisms, tensor factorization, deep linear and ReLU networks, and more. Our general framework is based on linking the non-convex empirical risk minimization (ERM) problem to a closely related convex optimization problem over prediction functions, which provides a global, achievable lower-bound to the ERM problem. We exploit this convex lower-bound to perform generalization analysis in the convex space while controlling the discrepancy between the convex model and its non-convex counterpart. We apply our general framework to a wide variety of models ranging from low-rank matrix sensing, to structured matrix sensing, two-layer linear networks, two-layer ReLU networks, and single-layer multi-head attention mechanisms, achieving generalization bounds with a sample complexity that scales almost linearly with the network width.
APA
Tadipatri, U.K.R., Haeffele, B.D., Agterberg, J. & Vidal, R. (2025). A Convex Relaxation Approach to Generalization Analysis for Parallel Positively Homogeneous Networks. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:5239-5247. Available from https://proceedings.mlr.press/v258/tadipatri25a.html.