Video Action Recognition with Neural Architecture Search
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1675-1690, 2021.
Recently, deep convolutional neural networks have been widely used in the field of video action recognition. Current approaches tend to concentrate on the structural design of different backbone networks, but despite encouraging progress, it remains an open question which network structures can process video both effectively and quickly. With the help of neural architecture search (NAS), we search over three hyperparameters of the video processing network: the number of frames, the number of layers per residual stage, and the channel number for all layers. We relax the entire search space into a continuous search space and search for a set of network architectures that balance accuracy and computational efficiency, treating accuracy as the primary optimization goal and computational complexity as the secondary optimization goal. We conduct experiments on the UCF101 and Kinetics400 datasets, where the proposed NAS-based scheme achieves new state-of-the-art results for video action recognition.
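
The continuous relaxation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the candidate values in `SEARCH_SPACE`, the toy cost proxy in `relative_cost`, and the weighted-penalty form of `search_objective` (with the hypothetical `cost_weight` parameter) are all assumptions chosen for illustration. Each discrete choice is given a vector of architecture logits, a softmax turns the logits into mixing weights, and a small cost penalty is added to the task loss so that accuracy remains the dominant term.

```python
import math

# Hypothetical candidate values: the abstract names the three searched
# hyperparameters but does not list their candidate sets.
SEARCH_SPACE = {
    "frames": [8, 16, 32],
    "layers_per_stage": [2, 3, 4],
    "channel_scale": [0.5, 0.75, 1.0],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_choice(options, logits):
    """Softmax-weighted mixture of the discrete options (the relaxed value)."""
    probs = softmax(logits)
    return sum(p * o for p, o in zip(probs, options))


def relative_cost(frames, layers, channel_scale):
    """Toy cost proxy (assumption): linear in frames and depth,
    quadratic in channel width."""
    return frames * layers * channel_scale ** 2


def relaxed_cost(alphas):
    """Cost proxy evaluated at the softmax-weighted configuration."""
    soft = {
        name: soft_choice(options, alphas[name])
        for name, options in SEARCH_SPACE.items()
    }
    return relative_cost(
        soft["frames"], soft["layers_per_stage"], soft["channel_scale"]
    )


def search_objective(task_loss, alphas, cost_weight=1e-3):
    """Task loss plus a small cost penalty (assumed form): a small
    cost_weight keeps accuracy as the dominant term."""
    return task_loss + cost_weight * relaxed_cost(alphas)


def discretize(alphas):
    """Pick the highest-logit option for each hyperparameter."""
    return {
        name: options[max(range(len(options)), key=lambda i: alphas[name][i])]
        for name, options in SEARCH_SPACE.items()
    }
```

In an actual search, the logits would be updated by gradient descent together with the network weights, and `discretize` would be applied at the end to read off a concrete architecture. Sweeping `cost_weight` is one plausible way to obtain a set of architectures with different accuracy and efficiency trade-offs.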