Multilevel Position-aware Attention Enhanced Network for Skeleton-Based Action Recognition

Dandan Zhang, Jia Wang, Sicong Zhan
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:921-935, 2025.

Abstract

Effectively capturing the spatiotemporal dependencies between joints is crucial for skeleton-based action recognition. However, existing methods do not consider the sparsity of skeleton data, which hinders the accurate capture of complex posture information and subtle action variations. Moreover, the locality of temporal features requires the model to focus on certain key features. Yet, most methods overlook the impact of temporal redundancy on feature focus, resulting in ineffective capture of significant temporal features. To address the issue of skeleton sparsity, we propose a Multilevel Position-aware Attention module (MPA) that explicitly leverages the relative positional information of the input data to enrich spatial information. To achieve a more effective focus on local temporal features, we develop a Multi-scale Temporal Excitation module (MTE). By scaling temporal features, the MTE module elevates the prominence of salient features and facilitates the capture of multi-scale features. Furthermore, we propose a Part Partition Encoding module (PPE) to aggregate joint data into part data, thereby providing the model with high-level information carried by the interactions between body parts. The MPA, MTE, and PPE are integrated into a unified framework called MPAE-Net. Extensive experimental results demonstrate that the MPAE-Net achieves state-of-the-art performance on two large-scale datasets, NTU RGB+D and NTU RGB+D 120.

Cite this Paper


BibTeX
@InProceedings{pmlr-v260-zhang25b,
  title = {Multilevel Position-aware Attention Enhanced Network for Skeleton-Based Action Recognition},
  author = {Zhang, Dandan and Wang, Jia and Zhan, Sicong},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  pages = {921--935},
  year = {2025},
  editor = {Nguyen, Vu and Lin, Hsuan-Tien},
  volume = {260},
  series = {Proceedings of Machine Learning Research},
  month = {05--08 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v260/main/assets/zhang25b/zhang25b.pdf},
  url = {https://proceedings.mlr.press/v260/zhang25b.html},
  abstract = {Effectively capturing the spatiotemporal dependencies between joints is crucial for skeleton-based action recognition. However, existing methods do not consider the sparsity of skeleton data, which hinders the accurate capture of complex posture information and subtle action variations. Moreover, the locality of temporal features requires the model to focus on certain key features. Yet, most methods overlook the impact of temporal redundancy on feature focus, resulting in ineffective capture of significant temporal features. To address the issue of skeleton sparsity, we propose a Multilevel Position-aware Attention module (MPA) that explicitly leverages the relative positional information of the input data to enrich spatial information. To achieve a more effective focus on local temporal features, we develop a Multi-scale Temporal Excitation module (MTE). By scaling temporal features, the MTE module elevates the prominence of salient features and facilitates the capture of multi-scale features. Furthermore, we propose a Part Partition Encoding module (PPE) to aggregate joint data into part data, thereby providing the model with high-level information carried by the interactions between body parts. The MPA, MTE, and PPE are integrated into a unified framework called MPAE-Net. Extensive experimental results demonstrate that the MPAE-Net achieves state-of-the-art performance on two large-scale datasets, NTU RGB+D and NTU RGB+D 120.}
}
Endnote
%0 Conference Paper
%T Multilevel Position-aware Attention Enhanced Network for Skeleton-Based Action Recognition
%A Dandan Zhang
%A Jia Wang
%A Sicong Zhan
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-zhang25b
%I PMLR
%P 921--935
%U https://proceedings.mlr.press/v260/zhang25b.html
%V 260
%X Effectively capturing the spatiotemporal dependencies between joints is crucial for skeleton-based action recognition. However, existing methods do not consider the sparsity of skeleton data, which hinders the accurate capture of complex posture information and subtle action variations. Moreover, the locality of temporal features requires the model to focus on certain key features. Yet, most methods overlook the impact of temporal redundancy on feature focus, resulting in ineffective capture of significant temporal features. To address the issue of skeleton sparsity, we propose a Multilevel Position-aware Attention module (MPA) that explicitly leverages the relative positional information of the input data to enrich spatial information. To achieve a more effective focus on local temporal features, we develop a Multi-scale Temporal Excitation module (MTE). By scaling temporal features, the MTE module elevates the prominence of salient features and facilitates the capture of multi-scale features. Furthermore, we propose a Part Partition Encoding module (PPE) to aggregate joint data into part data, thereby providing the model with high-level information carried by the interactions between body parts. The MPA, MTE, and PPE are integrated into a unified framework called MPAE-Net. Extensive experimental results demonstrate that the MPAE-Net achieves state-of-the-art performance on two large-scale datasets, NTU RGB+D and NTU RGB+D 120.
APA
Zhang, D., Wang, J. & Zhan, S. (2025). Multilevel Position-aware Attention Enhanced Network for Skeleton-Based Action Recognition. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:921-935. Available from https://proceedings.mlr.press/v260/zhang25b.html.