Conic Activation Functions
Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, PMLR 285:296-309, 2024.
Abstract
Most activation functions operate component-wise, which restricts the equivariance of neural networks to permutations. We introduce Conic Linear Units (CoLU) and generalize the symmetry of neural networks to continuous orthogonal groups. By interpreting ReLU as a projection onto its invariant set, the positive orthant, we propose a conic activation function that projects onto a Lorentz cone instead. Its performance can be further improved through multi-head structures, soft scaling, and axis sharing. CoLU with low-dimensional cones outperforms component-wise ReLU across a wide range of models, including MLPs, ResNets, and UNets, achieving better loss values and faster convergence. It significantly improves the training and performance of diffusion models. CoLU originates from a first-principles approach to various forms of neural networks and fundamentally changes their algebraic structure.
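To make the core idea concrete, the sketch below implements the operation the abstract describes: a closed-form Euclidean projection onto a second-order (Lorentz) cone, used in place of ReLU's projection onto the positive orthant. It assumes a PyTorch setting and treats the first feature of each vector as the cone axis; the class name `LorentzConeProjection` is illustrative, and the paper's multi-head, soft-scaling, and axis-sharing variants are not reproduced here.

```python
import torch
import torch.nn as nn


class LorentzConeProjection(nn.Module):
    """Sketch of a conic activation: project onto the Lorentz cone
    {(t, v) : ||v||_2 <= t}, analogous to ReLU projecting onto the
    positive orthant. Not the paper's exact CoLU formulation."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split each feature vector into a cone axis t and the remaining part v.
        t, v = x[..., :1], x[..., 1:]
        v_norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-12)

        inside = v_norm <= t    # already inside the cone: keep x unchanged
        polar = v_norm <= -t    # inside the polar cone: project to the origin
        scale = (t + v_norm) / 2  # otherwise: project onto the cone boundary

        zeros_t = torch.zeros_like(t)
        zeros_v = torch.zeros_like(v)
        t_proj = torch.where(inside, t, torch.where(polar, zeros_t, scale))
        v_proj = torch.where(inside, v, torch.where(polar, zeros_v, scale * v / v_norm))
        return torch.cat([t_proj, v_proj], dim=-1)


# Usage: apply the activation to a batch of 16-dimensional features.
act = LorentzConeProjection()
y = act(torch.randn(8, 16))
```

When the cone dimension equals the full feature width this replaces one component-wise nonlinearity with a single rotation-aware one; the abstract's "low-dimensional cones" correspond to splitting the features into small groups and projecting each group separately.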