PICAB: A Permutation-Invariant Contextual Attention Bandit for Energy-Constrained Edge AI
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:236-247, 2026.
Abstract
The deployment of Deep Neural Networks (DNNs) on resource-constrained edge devices presents a fundamental challenge: high-accuracy models often exceed the compute and energy budgets of local hardware, while full cloud offloading incurs unpredictable network latency. Inference splitting has emerged as a promising solution to this trade-off, enabling a DNN to be partitioned layer-wise across the edge-cloud continuum. However, optimizing these split decisions is non-trivial: the action space, comprising valid cut points and available target nodes, fluctuates dynamically with each request, rendering standard fixed-output Reinforcement Learning (RL) architectures ineffective. In this paper, we propose a Permutation-Invariant Contextual Attention Bandit (PICAB), a lightweight deep learning framework designed for real-time DNN partitioning. Our architecture employs a Multi-Head Attention mechanism to encode variable-sized sets of candidate execution plans, allowing the agent to generalize across diverse network environments without retraining. By incorporating the target node’s battery state into the attention mechanism, the agent learns energy-aware and robust offloading decisions. We evaluate our approach using heterogeneous workloads with periodic IoT inference streams. Experimental results demonstrate that our algorithm achieves a 23% reduction in makespan compared to the metaheuristic baselines while maintaining a similar Energy-Delay Product (EDP), effectively balancing the trade-off between inference latency and energy sustainability.
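To illustrate why attention suits a per-request action set of varying size, the following is a minimal sketch and not the authors' implementation. It uses a single attention head in plain Python for brevity (the paper describes a Multi-Head Attention mechanism), and the feature layout `[latency, energy, battery]`, the function names, and the random weights are all assumptions made for illustration. Because no positional encoding is used, reordering the candidates only reorders the scores, and the same parameters handle any number of candidates.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(zs):
    """Numerically stable softmax over a list of floats."""
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]

def make_params(d_in, d_k, seed=0):
    """Random, untrained weights; a hypothetical stand-in for a learned model."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wq": mat(d_k, d_in),
        "Wk": mat(d_k, d_in),
        "Wv": mat(d_k, d_in),
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(d_k)],
    }

def score_candidates(cands, p):
    """Return one score per candidate plan using self-attention over the set.

    cands: list of feature vectors, one per candidate execution plan
           (assumed layout: [latency, energy, battery]).
    """
    qs = [matvec(p["Wq"], c) for c in cands]
    ks = [matvec(p["Wk"], c) for c in cands]
    vs = [matvec(p["Wv"], c) for c in cands]
    scale = math.sqrt(len(p["w_out"]))
    scores = []
    for q in qs:
        # Attention weights of this candidate over the whole candidate set.
        attn = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in ks])
        # Context vector: attention-weighted sum of value vectors.
        ctx = [sum(a * v[j] for a, v in zip(attn, vs)) for j in range(len(vs[0]))]
        # Linear read-out to a scalar score for this candidate.
        scores.append(sum(w * c for w, c in zip(p["w_out"], ctx)))
    return scores

def select_plan(cands, p):
    """Greedy choice: index of the highest-scoring candidate."""
    scores = score_candidates(cands, p)
    return max(range(len(scores)), key=scores.__getitem__)
```

In a bandit setting the greedy `select_plan` would normally be combined with an exploration strategy and trained from observed rewards; both are omitted here.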