Temporal Relation based Attentive Prototype Network for Few-shot Action Recognition

Guangge Wang, Haihui Ye, Xiao Wang, Weirong Ye, Hanzi Wang
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:406-421, 2021.

Abstract

Few-shot action recognition aims at recognizing novel action classes with only a small number of labeled video samples. We propose a temporal relation based attentive prototype network (TRAPN) for few-shot action recognition. Concretely, we tackle this challenging task from three aspects. First, we propose a spatio-temporal motion enhancement (STME) module to highlight object motions in videos. The STME module exploits cues from content displacements across frames to enhance features in motion-related regions. Second, our temporal relation (TR) module learns the core common action transformations by capturing temporal relations at both short-term and long-term time scales. The learned temporal relations are encoded into descriptors that constitute sample-level features, so that abstract action transformations are described by multiple groups of temporal relation descriptors. Third, since a vanilla prototype for a support class (e.g., the mean of the support samples) cannot fit all query samples well, we construct an attentive prototype from the temporal relation descriptors of the support samples, which gives more weight to discriminative samples. We evaluate TRAPN on the Kinetics, UCF101, and HMDB51 few-shot datasets. Results show that our network achieves state-of-the-art performance.
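To make the pipeline concrete, below is a minimal PyTorch sketch of two of the steps the abstract describes: multi-scale temporal relation descriptors and a query-conditioned attentive prototype. All module names, shapes, and design details here (the TRN-style ordered-tuple encoding, the cosine-similarity attention) are our assumptions for illustration, not the authors' released implementation; the STME module and the backbone feature extractor are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F
from itertools import combinations

class TemporalRelationDescriptors(nn.Module):
    """Encodes ordered frame-feature tuples at several time scales.

    A TRN-style stand-in for the paper's TR module: small scales capture
    short-term relations, large scales capture long-term ones.
    """

    def __init__(self, feat_dim, scales=(2, 3, 4), out_dim=256):
        super().__init__()
        self.scales = scales
        # One small MLP per scale, applied to concatenated n-frame features.
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(n * feat_dim, out_dim), nn.ReLU(),
                          nn.Linear(out_dim, out_dim))
            for n in scales
        ])

    def forward(self, frame_feats):
        # frame_feats: [B, T, D] -> one descriptor per scale: [B, len(scales), out_dim]
        B, T, _ = frame_feats.shape
        descriptors = []
        for n, mlp in zip(self.scales, self.mlps):
            rel = frame_feats.new_zeros(B, mlp[-1].out_features)
            # combinations() yields index tuples in ascending order, so the
            # temporal ordering of frames within each tuple is preserved.
            for idx in combinations(range(T), n):
                rel = rel + mlp(frame_feats[:, list(idx), :].flatten(1))
            descriptors.append(rel)
        return torch.stack(descriptors, dim=1)

def attentive_prototype(support, query):
    """Builds a query-specific prototype from support descriptors.

    support: [K, R, D] (K shots, R relation scales), query: [R, D].
    Support samples more similar to the query receive larger weights,
    instead of the uniform 1/K weights of a mean prototype.
    """
    sim = F.cosine_similarity(support, query.unsqueeze(0), dim=-1)  # [K, R]
    weights = sim.softmax(dim=0)                # normalize over the K shots
    return (weights.unsqueeze(-1) * support).sum(dim=0)             # [R, D]

Under these assumptions, classification reduces to comparing a query's descriptors against each class's attentive prototype, e.g., with a cosine or Euclidean distance averaged over the R scales.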

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-wang21b,
  title     = {Temporal Relation based Attentive Prototype Network for Few-shot Action Recognition},
  author    = {Wang, Guangge and Ye, Haihui and Wang, Xiao and Ye, Weirong and Wang, Hanzi},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {406--421},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/wang21b/wang21b.pdf},
  url       = {https://proceedings.mlr.press/v157/wang21b.html}
}
Endnote
%0 Conference Paper
%T Temporal Relation based Attentive Prototype Network for Few-shot Action Recognition
%A Guangge Wang
%A Haihui Ye
%A Xiao Wang
%A Weirong Ye
%A Hanzi Wang
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-wang21b
%I PMLR
%P 406--421
%U https://proceedings.mlr.press/v157/wang21b.html
%V 157
APA
Wang, G., Ye, H., Wang, X., Ye, W. & Wang, H. (2021). Temporal Relation based Attentive Prototype Network for Few-shot Action Recognition. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:406-421. Available from https://proceedings.mlr.press/v157/wang21b.html.
