Embedding Adaptation Network with Transformer for Few-Shot Action Recognition
Proceedings of The 14th Asian Conference on Machine
Learning, PMLR 189:515-530, 2023.
Abstract
Few-shot action recognition aims to classify novel
action categories using a few training samples. Most
current few-shot action recognition methods trained with an
episodic strategy apply the same normalization to all feature
embeddings, which limits performance when the batch size is
small. Moreover, some methods learn the feature embedding of
each video individually, without considering the whole task,
and thus neglect important interactive information between
videos in the current episode. To address these problems, we propose a
novel embedding adaptation network with Transformer
(EANT) for few-shot action
recognition. Specifically, we first propose an
improved self-guided instance normalization (SGIN)
module to adaptively learn class-specific feature
embeddings in an input-dependent manner. Building on
the learned feature embeddings, we design a
Transformer-based embedding learning (TEL) module to
learn task-specific feature embeddings by fully
capturing rich information across videos in each
episodic task. Furthermore, we utilize semantic
knowledge among all sampled training classes as
additional supervisory information to improve the
generalization ability of the network. In this way,
the proposed EANT learns feature embeddings that are
highly effective and informative for few-shot action
recognition. Extensive experiments conducted on
several challenging few-shot action recognition
benchmarks show that the proposed EANT outperforms
several state-of-the-art methods by a large margin.
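
The abstract gives no implementation details, so the following PyTorch-style sketch is only one plausible reading of an input-dependent ("self-guided") instance normalization: a gate predicted from the feature itself decides, per channel, how strongly the instance-normalized embedding replaces the original one. All module and parameter names here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SelfGuidedInstanceNorm(nn.Module):
    """Illustrative sketch (not the paper's code): instance normalization
    whose strength is gated by a signal predicted from the input itself."""

    def __init__(self, dim: int):
        super().__init__()
        self.inorm = nn.InstanceNorm1d(dim, affine=True)
        # Gate predicted from the input embedding (the "self-guided" part).
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) frame-level video embeddings.
        normed = self.inorm(x.transpose(1, 2)).transpose(1, 2)
        g = self.gate(x.mean(dim=1, keepdim=True))  # (batch, 1, dim)
        # Blend normalized and original features per channel.
        return g * normed + (1.0 - g) * x
```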
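Likewise, a minimal sketch of Transformer-based embedding adaptation over an episode could let a standard Transformer encoder attend across all video embeddings sampled in the current task, so that each embedding is refined with cross-video context. The layer sizes and interface below are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn


class TransformerEmbeddingLearner(nn.Module):
    """Illustrative sketch: a Transformer encoder refines every video
    embedding in an episode using context from all other videos."""

    def __init__(self, dim: int, num_heads: int = 4, num_layers: int = 1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, episode_embeddings: torch.Tensor) -> torch.Tensor:
        # episode_embeddings: (1, n_videos, dim), i.e. the support and
        # query videos of one episode treated as a single token sequence.
        return self.encoder(episode_embeddings)
```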