Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning

Rafał Surdej, Michał Bortkiewicz, Alex Lewandowski, Mateusz Ostaszewski, Clare Lyle
Proceedings of The 4th Conference on Lifelong Learning Agents, PMLR 330:64-88, 2026.

Abstract

Trainable activation functions, whose parameters are optimized alongside network weights, offer increased expressivity compared to fixed activation functions. Specifically, trainable activation functions defined as ratios of polynomials (rational functions) have been proposed to enhance plasticity in reinforcement learning (RL). However, their impact on training stability remains unclear. In this work, we study trainable rational activations in both reinforcement and continual learning settings. We find that while their flexibility enhances adaptability, it can also introduce instability, leading to overestimation in RL and feature collapse in longer continual learning scenarios. Our main result demonstrates a trade-off between expressivity and plasticity in rational activations. To address this, we propose a constrained variant that structurally limits excessive output scaling while preserving adaptability. Experiments across MetaWorld and DeepMind Control Suite (DMC) environments show that our approach improves training stability and performance. On continual learning benchmarks, including MNIST with reshuffled labels and Split CIFAR-100, we reveal how different constraints affect the balance between expressivity and long-term retention. Preliminary experiments in discrete-action domains (e.g., Atari) did not show similar instability, suggesting that the trade-off is particularly relevant to continuous control. Together, our findings provide actionable design principles for robust and adaptable trainable activations in dynamic, non-stationary environments. Code available at: https://github.com/special114/rl_rational_plasticity.
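
To make the objects under study concrete, below is a minimal PyTorch sketch of a trainable rational activation and one possible constrained variant. It assumes the widely used "safe" Padé-style parameterization, in which the denominator is kept positive to avoid poles. The ConstrainedRational class and its soft output bound are illustrative assumptions only, not the paper's exact formulation; see the linked repository for the authors' implementation.

    import torch
    import torch.nn as nn

    class RationalActivation(nn.Module):
        """Trainable rational activation R(x) = P(x) / Q(x).

        Uses a safe Pade-style parameterization common in the literature:
            P(x) = a0 + a1*x + ... + am*x^m
            Q(x) = 1 + |b1*x + ... + bn*x^n|
        so the denominator is >= 1 and the function has no poles.
        """

        def __init__(self, num_degree: int = 5, den_degree: int = 4):
            super().__init__()
            self.a = nn.Parameter(0.1 * torch.randn(num_degree + 1))
            self.b = nn.Parameter(0.1 * torch.randn(den_degree))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Horner's rule for P(x) = a0 + a1*x + ... + am*x^m.
            p = torch.zeros_like(x)
            for coeff in self.a.flip(0):
                p = p * x + coeff
            # Horner's rule for b1*x + ... + bn*x^n (no constant term).
            q = torch.zeros_like(x)
            for coeff in self.b.flip(0):
                q = (q + coeff) * x
            return p / (1.0 + q.abs())

    class ConstrainedRational(RationalActivation):
        """Hypothetical constrained variant: softly bounds the output to
        [-bound, bound] so the trainable coefficients cannot blow up the
        feature scale. Illustrative only; the paper's constraint may differ.
        """

        def __init__(self, bound: float = 4.0, **kwargs):
            super().__init__(**kwargs)
            self.bound = bound

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            r = super().forward(x)
            # tanh is near-identity around zero and saturates at +/- bound.
            return self.bound * torch.tanh(r / self.bound)

Dropped into a network in place of a fixed nonlinearity (e.g., nn.Sequential(nn.Linear(64, 64), ConstrainedRational(), ...)), the coefficients a and b are trained jointly with the weights, while the bound caps how far drifting coefficients can scale the features.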

Cite this Paper


BibTeX
@InProceedings{pmlr-v330-surdej26a,
  title     = {Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning},
  author    = {Surdej, Rafa{\l} and Bortkiewicz, Micha{\l} and Lewandowski, Alex and Ostaszewski, Mateusz and Lyle, Clare},
  booktitle = {Proceedings of The 4th Conference on Lifelong Learning Agents},
  pages     = {64--88},
  year      = {2026},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Eaton, Eric and Liu, Bing and Mahmood, Rupam and Rannen-Triki, Amal},
  volume    = {330},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v330/main/assets/surdej26a/surdej26a.pdf},
  url       = {https://proceedings.mlr.press/v330/surdej26a.html}
}
Endnote
%0 Conference Paper
%T Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning
%A Rafał Surdej
%A Michał Bortkiewicz
%A Alex Lewandowski
%A Mateusz Ostaszewski
%A Clare Lyle
%B Proceedings of The 4th Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2026
%E Sarath Chandar
%E Razvan Pascanu
%E Eric Eaton
%E Bing Liu
%E Rupam Mahmood
%E Amal Rannen-Triki
%F pmlr-v330-surdej26a
%I PMLR
%P 64--88
%U https://proceedings.mlr.press/v330/surdej26a.html
%V 330
APA
Surdej, R., Bortkiewicz, M., Lewandowski, A., Ostaszewski, M. & Lyle, C. (2026). Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning. Proceedings of The 4th Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 330:64-88. Available from https://proceedings.mlr.press/v330/surdej26a.html.