Embedding Adaptation Network with Transformer for Few-Shot Action Recognition

Rongrong Jin, Xiao Wang, Guangge Wang, Yang Lu, Hai-Miao Hu, Hanzi Wang
Proceedings of The 14th Asian Conference on Machine Learning, PMLR 189:515-530, 2023.

Abstract

Few-shot action recognition aims to classify novel action categories from only a few training samples. Most current few-shot action recognition methods, trained with an episodic strategy, apply the same normalization to all feature embeddings, which limits performance when the batch size is small. Moreover, some methods learn feature embeddings individually without considering the whole task, neglecting the interactive information between videos in the current episode. To address these problems, we propose a novel embedding adaptation network with Transformer (EANT) for few-shot action recognition. Specifically, we first propose an improved self-guided instance normalization (SGIN) module to adaptively learn class-specific feature embeddings in an input-dependent manner. Built upon the learned feature embeddings, we design a Transformer-based embedding learning (TEL) module to learn task-specific feature embeddings by fully capturing rich information across videos in each episodic task. Furthermore, we utilize semantic knowledge among all sampled training classes as additional supervisory information to improve the generalization ability of the network. In this way, the proposed EANT is highly effective and informative for few-shot action recognition. Extensive experiments conducted on several challenging few-shot action recognition benchmarks show that EANT outperforms several state-of-the-art methods by a large margin.
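
To make the episode-level idea concrete, below is a minimal, hypothetical sketch of Transformer-based embedding adaptation in the spirit of the TEL module: per-video backbone embeddings of the support and query sets are jointly encoded so that each video attends to every other video in the task, and queries are then scored against class prototypes. All names, shapes, and hyperparameters (EpisodeEmbeddingAdapter, dim=512, a single encoder layer, Euclidean-distance scoring) are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of episode-level embedding adaptation with a Transformer.
# Names and hyperparameters are assumptions; this is not the paper's code.
import torch
import torch.nn as nn

class EpisodeEmbeddingAdapter(nn.Module):
    """Jointly adapts per-video embeddings over an episode so that each
    embedding attends to every other video in the task (support + query)."""
    def __init__(self, dim=512, heads=8, layers=1):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)

    def forward(self, support, query):
        # support: (n_way * k_shot, dim), query: (n_query, dim)
        tokens = torch.cat([support, query], dim=0).unsqueeze(0)  # (1, N, dim)
        adapted = self.encoder(tokens).squeeze(0)                 # (N, dim)
        return adapted[: support.size(0)], adapted[support.size(0):]

def prototype_logits(support, query, n_way, k_shot):
    # Class prototypes are the mean of adapted support embeddings per class;
    # queries are scored by negative Euclidean distance to each prototype.
    protos = support.view(n_way, k_shot, -1).mean(dim=1)          # (n_way, dim)
    return -torch.cdist(query, protos)                            # (n_query, n_way)

if __name__ == "__main__":
    n_way, k_shot, n_query, dim = 5, 1, 10, 512
    adapter = EpisodeEmbeddingAdapter(dim=dim)
    s = torch.randn(n_way * k_shot, dim)   # backbone features of support videos
    q = torch.randn(n_query, dim)          # backbone features of query videos
    s_adapt, q_adapt = adapter(s, q)
    logits = prototype_logits(s_adapt, q_adapt, n_way, k_shot)
    print(logits.shape)  # torch.Size([10, 5])

Encoding the support and query embeddings together is what makes the adapted features task-specific rather than per-video, which is the property the abstract emphasizes.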

Cite this Paper


BibTeX
@InProceedings{pmlr-v189-jin23a,
  title     = {Embedding Adaptation Network with Transformer for Few-Shot Action Recognition},
  author    = {Jin, Rongrong and Wang, Xiao and Wang, Guangge and Lu, Yang and Hu, Hai-Miao and Wang, Hanzi},
  booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
  pages     = {515--530},
  year      = {2023},
  editor    = {Khan, Emtiyaz and Gonen, Mehmet},
  volume    = {189},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--14 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v189/jin23a/jin23a.pdf},
  url       = {https://proceedings.mlr.press/v189/jin23a.html},
  abstract  = {Few-shot action recognition aims to classify novel action categories using a few training samples. Most current few-shot action recognition methods via episodic training strategy mainly use the same normalization method to normalize feature embeddings, leading to limited performance when the batch size is small. And some methods learn feature embeddings individually without considering the whole task, neglecting important interactive information between videos in the current episode. To address these problems, we propose a novel embedding adaptation network with Transformer (EANT) for few-shot action recognition. Specifically, we first propose an improved self-guided instance normalization (SGIN) module to adaptively learn class-specific feature embeddings in an input-dependent manner. Built upon the learned feature embeddings, we design a Transformer-based embedding learning (TEL) module to learn task-specific feature embeddings by fully capturing rich information cross videos in each episodic task. Furthermore, we utilize semantic knowledge among all sampled training classes as additional supervisory information to improve the generalization ability of the network. By this means, the proposed EANT can be highly effective and informative for few-shot action recognition. Extensive experiments conducted on several challenging few-shot action recognition benchmarks show that the proposed EANT outperforms several state-of-the-art methods by a large margin.}
}
Endnote
%0 Conference Paper
%T Embedding Adaptation Network with Transformer for Few-Shot Action Recognition
%A Rongrong Jin
%A Xiao Wang
%A Guangge Wang
%A Yang Lu
%A Hai-Miao Hu
%A Hanzi Wang
%B Proceedings of The 14th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Emtiyaz Khan
%E Mehmet Gonen
%F pmlr-v189-jin23a
%I PMLR
%P 515--530
%U https://proceedings.mlr.press/v189/jin23a.html
%V 189
%X Few-shot action recognition aims to classify novel action categories using a few training samples. Most current few-shot action recognition methods via episodic training strategy mainly use the same normalization method to normalize feature embeddings, leading to limited performance when the batch size is small. And some methods learn feature embeddings individually without considering the whole task, neglecting important interactive information between videos in the current episode. To address these problems, we propose a novel embedding adaptation network with Transformer (EANT) for few-shot action recognition. Specifically, we first propose an improved self-guided instance normalization (SGIN) module to adaptively learn class-specific feature embeddings in an input-dependent manner. Built upon the learned feature embeddings, we design a Transformer-based embedding learning (TEL) module to learn task-specific feature embeddings by fully capturing rich information cross videos in each episodic task. Furthermore, we utilize semantic knowledge among all sampled training classes as additional supervisory information to improve the generalization ability of the network. By this means, the proposed EANT can be highly effective and informative for few-shot action recognition. Extensive experiments conducted on several challenging few-shot action recognition benchmarks show that the proposed EANT outperforms several state-of-the-art methods by a large margin.
APA
Jin, R., Wang, X., Wang, G., Lu, Y., Hu, H.-M., & Wang, H. (2023). Embedding Adaptation Network with Transformer for Few-Shot Action Recognition. Proceedings of The 14th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 189:515-530. Available from https://proceedings.mlr.press/v189/jin23a.html.