PICAB: A Permutation-Invariant Contextual Attention Bandit for Energy-Constrained Edge AI

Sayed Saminur Rahman, Yaser Al Mtawa, Victor Balogun
Proceedings of The 39th Canadian Conference on Artificial Intelligence, PMLR 318:236-247, 2026.

Abstract

The deployment of Deep Neural Networks (DNNs) on resource-constrained edge devices presents a fundamental challenge: high-accuracy models often exceed the compute and energy budgets of local hardware, while full cloud offloading incurs unpredictable network latency. Inference splitting has emerged as a promising solution to this trade-off, enabling a DNN to be partitioned layer-wise across the edge-cloud continuum. However, optimizing these split decisions is non-trivial: the action space, comprising valid cut points and available target nodes, fluctuates dynamically with each request, rendering standard fixed-output Reinforcement Learning (RL) architectures ineffective. In this paper, we propose a Permutation-Invariant Contextual Attention Bandit (PICAB), a lightweight deep learning framework designed for real-time DNN partitioning. Our architecture employs a Multi-Head Attention mechanism to encode variable-sized sets of candidate execution plans, allowing the agent to generalize across diverse network environments without retraining. By incorporating the target node’s battery state into the attention mechanism, the agent learns energy-aware and robust offloading decisions. We evaluate our approach using heterogeneous workloads with periodic IoT inference streams. Experimental results demonstrate that our algorithm achieves a 23% reduction in makespan compared to the metaheuristic baselines while maintaining a similar Energy-Delay Product (EDP), effectively balancing the trade-off between inference latency and energy sustainability.
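To make the idea of scoring a variable-sized set of candidate execution plans concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: it uses a single attention head rather than multi-head attention, the weights are random and untrained, and the class name `SetAttentionScorer` and the four-element feature layout (cut-point position, estimated latency, estimated energy, target battery level) are assumptions made purely for illustration. It only shows why attention over a set yields one score per candidate regardless of how many candidates there are or in what order they arrive.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SetAttentionScorer:
    """Hypothetical single-head self-attention scorer over a set of candidates."""

    def __init__(self, feat_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        # Untrained random projections; a real agent would learn these.
        self.Wq = random_matrix(hidden_dim, feat_dim, rng)
        self.Wk = random_matrix(hidden_dim, feat_dim, rng)
        self.Wv = random_matrix(hidden_dim, feat_dim, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.scale = math.sqrt(hidden_dim)

    def score(self, candidates):
        """Return one score per candidate; works for any number of candidates."""
        qs = [matvec(self.Wq, c) for c in candidates]
        ks = [matvec(self.Wk, c) for c in candidates]
        vs = [matvec(self.Wv, c) for c in candidates]
        scores = []
        for q in qs:
            # Each candidate attends over the whole set (including itself).
            weights = softmax([dot(q, k) / self.scale for k in ks])
            context = [
                sum(w * v[d] for w, v in zip(weights, vs))
                for d in range(len(self.w_out))
            ]
            scores.append(dot(self.w_out, context))
        return scores

    def select(self, candidates):
        """Greedy choice: index of the highest-scoring candidate."""
        scores = self.score(candidates)
        return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Assumed feature layout: [cut position, est. latency, est. energy, battery]
    plans = [
        [0.2, 0.5, 0.3, 0.9],
        [0.6, 0.3, 0.5, 0.4],
        [1.0, 0.8, 0.1, 0.7],
    ]
    scorer = SetAttentionScorer(feat_dim=4)
    print(scorer.score(plans))
    print(scorer.select(plans))
```

Because no positional information is added, reordering the candidates reorders the scores in the same way (up to floating-point rounding), so the plan that is chosen does not depend on the order in which plans are listed. The same weights can also score two candidates or twenty, which is the property a fixed-output policy head lacks.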

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-rahman26a,
  title = {PICAB: A Permutation-Invariant Contextual Attention Bandit for Energy-Constrained Edge AI},
  author = {Rahman, Sayed Saminur and Mtawa, Yaser Al and Balogun, Victor},
  booktitle = {Proceedings of The 39th Canadian Conference on Artificial Intelligence},
  pages = {236--247},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/rahman26a/rahman26a.pdf},
  url = {https://proceedings.mlr.press/v318/rahman26a.html},
  abstract = {The deployment of Deep Neural Networks (DNNs) on resource-constrained edge devices presents a fundamental challenge: high-accuracy models often exceed the compute and energy budgets of local hardware, while full cloud offloading incurs unpredictable network latency. Inference splitting has emerged as a promising solution to this trade-off, enabling a DNN to be partitioned layer-wise across the edge-cloud continuum. However, optimizing these split decisions is non-trivial: the action space, comprising valid cut points and available target nodes, fluctuates dynamically with each request, rendering standard fixed-output Reinforcement Learning (RL) architectures ineffective. In this paper, we propose a Permutation-Invariant Contextual Attention Bandit (PICAB), a lightweight deep learning framework designed for real-time DNN partitioning. Our architecture employs a Multi-Head Attention mechanism to encode variable-sized sets of candidate execution plans, allowing the agent to generalize across diverse network environments without retraining. By incorporating the target node’s battery state into the attention mechanism, the agent learns energy-aware and robust offloading decisions. We evaluate our approach using heterogeneous workloads with periodic IoT inference streams. Experimental results demonstrate that our algorithm achieves a 23% reduction in makespan compared to the metaheuristic baselines while maintaining a similar Energy-Delay Product (EDP), effectively balancing the trade-off between inference latency and energy sustainability.}
}
Endnote
%0 Conference Paper
%T PICAB: A Permutation-Invariant Contextual Attention Bandit for Energy-Constrained Edge AI
%A Sayed Saminur Rahman
%A Yaser Al Mtawa
%A Victor Balogun
%B Proceedings of The 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-rahman26a
%I PMLR
%P 236--247
%U https://proceedings.mlr.press/v318/rahman26a.html
%V 318
%X The deployment of Deep Neural Networks (DNNs) on resource-constrained edge devices presents a fundamental challenge: high-accuracy models often exceed the compute and energy budgets of local hardware, while full cloud offloading incurs unpredictable network latency. Inference splitting has emerged as a promising solution to this trade-off, enabling a DNN to be partitioned layer-wise across the edge-cloud continuum. However, optimizing these split decisions is non-trivial: the action space, comprising valid cut points and available target nodes, fluctuates dynamically with each request, rendering standard fixed-output Reinforcement Learning (RL) architectures ineffective. In this paper, we propose a Permutation-Invariant Contextual Attention Bandit (PICAB), a lightweight deep learning framework designed for real-time DNN partitioning. Our architecture employs a Multi-Head Attention mechanism to encode variable-sized sets of candidate execution plans, allowing the agent to generalize across diverse network environments without retraining. By incorporating the target node’s battery state into the attention mechanism, the agent learns energy-aware and robust offloading decisions. We evaluate our approach using heterogeneous workloads with periodic IoT inference streams. Experimental results demonstrate that our algorithm achieves a 23% reduction in makespan compared to the metaheuristic baselines while maintaining a similar Energy-Delay Product (EDP), effectively balancing the trade-off between inference latency and energy sustainability.
APA
Rahman, S.S., Mtawa, Y.A. & Balogun, V. (2026). PICAB: A Permutation-Invariant Contextual Attention Bandit for Energy-Constrained Edge AI. Proceedings of The 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:236-247. Available from https://proceedings.mlr.press/v318/rahman26a.html.

Related Material