Video Action Recognition with Neural Architecture Search

Yuanding Zhou; Baopu Li; Zhihui Wang; Haojie Li

Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1675-1690, 2021.

Abstract

Recently, deep convolutional neural networks have been widely used in the field of video action recognition. Current approaches tend to concentrate on the structure design for different backbone networks, but what kind of network structures can process video both effectively and quickly still remains to be solved despite the encouraging progress. With the help of neural architecture search (NAS), we search for three hyperparameters in the video processing network, which are the number of frames, the number of layers per residual stage and the channel number for all layers. We relax the entire search space into a continuous search space, and search for a set of network architectures that balance accuracy and computational efficiency by considering accuracy as the primary optimization goal and computational complexity as the secondary optimization goal. We conduct experiments on UCF101 and Kinetics400 datasets, validating new state-of-the-art results of the proposed NAS based scheme for video action recognition.

Cite this Paper

BibTeX


@InProceedings{pmlr-v157-zhou21a,
  title = 	 {Video Action Recognition with Neural Architecture Search},
  author =       {Zhou, Yuanding and Li, Baopu and Wang, Zhihui and Li, Haojie},
  booktitle = 	 {Proceedings of The 13th Asian Conference on Machine Learning},
  pages = 	 {1675--1690},
  year = 	 {2021},
  editor = 	 {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume = 	 {157},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--19 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v157/zhou21a/zhou21a.pdf},
  url = 	 {https://proceedings.mlr.press/v157/zhou21a.html},
  abstract = 	 {Recently, deep convolutional neural networks have been widely used in the field of video action recognition. Current approaches tend to concentrate on the structure design for different backbone networks, but what kind of network structures can process video both effectively and quickly still remains to be solved despite the encouraging progress. With the help of neural architecture search (NAS), we search for three hyperparameters in the video processing network, which are the number of frames, the number of layers per residual stage and the channel number for all layers. We relax the entire search space into a continuous search space, and search for a set of network architectures that balance accuracy and computational efficiency by considering accuracy as the primary optimization goal and computational complexity as the secondary optimization goal. We conduct experiments on UCF101 and Kinetics400 datasets, validating new state-of-the-art results of the proposed NAS based scheme for video action recognition.}
}

Endnote

%0 Conference Paper
%T Video Action Recognition with Neural Architecture Search
%A Yuanding Zhou
%A Baopu Li
%A Zhihui Wang
%A Haojie Li
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-zhou21a
%I PMLR
%P 1675--1690
%U https://proceedings.mlr.press/v157/zhou21a.html
%V 157
%X Recently, deep convolutional neural networks have been widely used in the field of video action recognition. Current approaches tend to concentrate on the structure design for different backbone networks, but what kind of network structures can process video both effectively and quickly still remains to be solved despite the encouraging progress. With the help of neural architecture search (NAS), we search for three hyperparameters in the video processing network, which are the number of frames, the number of layers per residual stage and the channel number for all layers. We relax the entire search space into a continuous search space, and search for a set of network architectures that balance accuracy and computational efficiency by considering accuracy as the primary optimization goal and computational complexity as the secondary optimization goal. We conduct experiments on UCF101 and Kinetics400 datasets, validating new state-of-the-art results of the proposed NAS based scheme for video action recognition.

APA


Zhou, Y., Li, B., Wang, Z., & Li, H. (2021). Video Action Recognition with Neural Architecture Search. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:1675-1690. Available from https://proceedings.mlr.press/v157/zhou21a.html.
