Conic Activation Functions

Changqing Fu, Laurent D. Cohen
Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, PMLR 285:296-309, 2024.

Abstract

Most activation functions operate component-wise, which restricts the equivariance of neural networks to permutations. We introduce Conic Linear Units (CoLU) and generalize the symmetry of neural networks to continuous orthogonal groups. By interpreting ReLU as a projection onto its invariant set, the positive orthant, we propose a conic activation function that projects onto a Lorentz cone instead. Its performance can be further improved with multi-head structures, soft scaling, and axis sharing. CoLU with low-dimensional cones outperforms component-wise ReLU across a wide range of models, including MLPs, ResNets, and UNets, achieving lower loss values and faster convergence. It significantly improves the training and performance of diffusion models. CoLU originates from a first-principles approach to various forms of neural networks and fundamentally changes their algebraic structure.
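The abstract describes CoLU as replacing ReLU's projection onto the positive orthant with a projection onto a Lorentz (second-order) cone, applied over low-dimensional groups of channels. Below is a minimal sketch of that idea in PyTorch; the function names, the choice of the first channel of each group as the cone axis, and the multi-head layout are assumptions made for illustration, not the authors' exact implementation.

# Sketch of a conic activation: project each low-dimensional head onto the
# Lorentz cone {(t, x) : ||x||_2 <= t}, analogous to how ReLU projects onto
# the positive orthant. Illustrative only; conventions are assumptions.
import torch

def lorentz_cone_projection(z: torch.Tensor) -> torch.Tensor:
    # Split each group z = (t, x) into the axis coordinate t and the rest x.
    t, x = z[..., :1], z[..., 1:]
    norm_x = x.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    inside = norm_x <= t           # already inside the cone: keep unchanged
    polar = norm_x <= -t           # inside the polar cone: project to the origin
    s = (norm_x + t) / 2           # otherwise: project onto the cone boundary
    boundary = torch.cat([s, s * x / norm_x], dim=-1)
    out = torch.where(inside, z, boundary)
    return torch.where(polar, torch.zeros_like(z), out)

def colu(z: torch.Tensor, head_dim: int = 4) -> torch.Tensor:
    # Apply the cone projection independently to groups of `head_dim` channels.
    *lead, d = z.shape
    assert d % head_dim == 0, "feature dimension must split into equal heads"
    heads = z.reshape(*lead, d // head_dim, head_dim)
    return lorentz_cone_projection(heads).reshape(*lead, d)

# Usage: a drop-in replacement for a component-wise activation in an MLP layer.
h = torch.randn(8, 64)
print(colu(h, head_dim=4).shape)  # torch.Size([8, 64])

Because each head is projected as a whole, the activation commutes with rotations of the non-axis coordinates within a head, which is the sense in which the symmetry group is enlarged from permutations to continuous orthogonal groups.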

Cite this Paper


BibTeX
@InProceedings{pmlr-v285-fu24a,
  title     = {Conic Activation Functions},
  author    = {Fu, Changqing and Cohen, Laurent D.},
  booktitle = {Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models},
  pages     = {296--309},
  year      = {2024},
  editor    = {Fumero, Marco and Domine, Clementine and Lähner, Zorah and Crisostomi, Donato and Moschella, Luca and Stachenfeld, Kimberly},
  volume    = {285},
  series    = {Proceedings of Machine Learning Research},
  month     = {14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v285/main/assets/fu24a/fu24a.pdf},
  url       = {https://proceedings.mlr.press/v285/fu24a.html}
}
APA
Fu, C. & Cohen, L. D. (2024). Conic Activation Functions. Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 285:296-309. Available from https://proceedings.mlr.press/v285/fu24a.html.
