Conic Activation Functions
Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, PMLR 285:296-309, 2024.
Abstract
Most activation functions operate component-wise, which restricts the equivariance of neural networks to permutations. We introduce Conic Linear Units (CoLU) and generalize the symmetry of neural networks to continuous orthogonal groups. By interpreting ReLU as a projection onto its invariant set, the positive orthant, we propose a conic activation function that projects onto a Lorentz cone instead. Its performance can be further improved through multi-head structures, soft scaling, and axis sharing. CoLU with low-dimensional cones outperforms component-wise ReLU across a wide range of models, including MLPs, ResNets, and UNets, achieving better loss values and faster convergence. It significantly improves the training and performance of diffusion models. CoLU originates from a first-principles approach to various forms of neural networks and fundamentally changes their algebraic structure.
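To make the core idea concrete, the sketch below implements the operation the abstract describes: a closed-form Euclidean projection onto a second-order (Lorentz) cone, used in place of ReLU's projection onto the positive orthant. It assumes a PyTorch setting and treats the first feature of each vector as the cone axis; the class name `LorentzConeProjection` is illustrative, and the paper's multi-head, soft-scaling, and axis-sharing variants are not reproduced here.

```python
import torch
import torch.nn as nn


class LorentzConeProjection(nn.Module):
    """Sketch of a conic activation: project onto the Lorentz cone
    {(t, v) : ||v||_2 <= t}, analogous to ReLU projecting onto the
    positive orthant. Not the paper's exact CoLU formulation."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split each feature vector into a cone axis t and the remaining part v.
        t, v = x[..., :1], x[..., 1:]
        v_norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-12)

        inside = v_norm <= t    # already inside the cone: keep x unchanged
        polar = v_norm <= -t    # inside the polar cone: project to the origin
        scale = (t + v_norm) / 2  # otherwise: project onto the cone boundary

        zeros_t = torch.zeros_like(t)
        zeros_v = torch.zeros_like(v)
        t_proj = torch.where(inside, t, torch.where(polar, zeros_t, scale))
        v_proj = torch.where(inside, v, torch.where(polar, zeros_v, scale * v / v_norm))
        return torch.cat([t_proj, v_proj], dim=-1)


# Usage: apply the activation to a batch of 16-dimensional features.
act = LorentzConeProjection()
y = act(torch.randn(8, 16))
```

When the cone dimension equals the full feature width this replaces one component-wise nonlinearity with a single rotation-aware one; the abstract's "low-dimensional cones" correspond to splitting the features into small groups and projecting each group separately.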