Video Action Recognition with Neural Architecture Search

Yuanding Zhou, Baopu Li, Zhihui Wang, Haojie Li
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1675-1690, 2021.

Abstract

Recently, deep convolutional neural networks have been widely used in the field of video action recognition. Current approaches tend to concentrate on the structure design for different backbone networks, but what kind of network structures can process video both effectively and quickly still remains to be solved despite the encouraging progress. With the help of neural architecture search (NAS), we search for three hyperparameters in the video processing network, which are the number of frames, the number of layers per residual stage and the channel number for all layers. We relax the entire search space into a continuous search space, and search for a set of network architectures that balance accuracy and computational efficiency by considering accuracy as the primary optimization goal and computational complexity as the secondary optimization goal. We conduct experiments on UCF101 and Kinetics400 datasets, validating new state-of-the-art results of the proposed NAS based scheme for video action recognition.

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-zhou21a,
  title = {Video Action Recognition with Neural Architecture Search},
  author = {Zhou, Yuanding and Li, Baopu and Wang, Zhihui and Li, Haojie},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages = {1675--1690},
  year = {2021},
  editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume = {157},
  series = {Proceedings of Machine Learning Research},
  month = {17--19 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v157/zhou21a/zhou21a.pdf},
  url = {https://proceedings.mlr.press/v157/zhou21a.html},
  abstract = {Recently, deep convolutional neural networks have been widely used in the field of video action recognition. Current approaches tend to concentrate on the structure design for different backbone networks, but what kind of network structures can process video both effectively and quickly still remains to be solved despite the encouraging progress. With the help of neural architecture search (NAS), we search for three hyperparameters in the video processing network, which are the number of frames, the number of layers per residual stage and the channel number for all layers. We relax the entire search space into a continuous search space, and search for a set of network architectures that balance accuracy and computational efficiency by considering accuracy as the primary optimization goal and computational complexity as the secondary optimization goal. We conduct experiments on UCF101 and Kinetics400 datasets, validating new state-of-the-art results of the proposed NAS based scheme for video action recognition.}
}
Endnote
%0 Conference Paper
%T Video Action Recognition with Neural Architecture Search
%A Yuanding Zhou
%A Baopu Li
%A Zhihui Wang
%A Haojie Li
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-zhou21a
%I PMLR
%P 1675--1690
%U https://proceedings.mlr.press/v157/zhou21a.html
%V 157
%X Recently, deep convolutional neural networks have been widely used in the field of video action recognition. Current approaches tend to concentrate on the structure design for different backbone networks, but what kind of network structures can process video both effectively and quickly still remains to be solved despite the encouraging progress. With the help of neural architecture search (NAS), we search for three hyperparameters in the video processing network, which are the number of frames, the number of layers per residual stage and the channel number for all layers. We relax the entire search space into a continuous search space, and search for a set of network architectures that balance accuracy and computational efficiency by considering accuracy as the primary optimization goal and computational complexity as the secondary optimization goal. We conduct experiments on UCF101 and Kinetics400 datasets, validating new state-of-the-art results of the proposed NAS based scheme for video action recognition.
APA
Zhou, Y., Li, B., Wang, Z. & Li, H. (2021). Video Action Recognition with Neural Architecture Search. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:1675-1690. Available from https://proceedings.mlr.press/v157/zhou21a.html.
