Temporal RPN Learning for Weakly-Supervised Temporal Action Localization

Jing Huang; Ming Kong; Luyuan Chen; Tian Liang; Qiang Zhu

Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:470-485, 2024.

Abstract

Weakly-Supervised Temporal Action Localization (WSTAL) aims to train a model that localizes action instances in untrimmed videos using only video-level labels, a task closely related to Object Detection (OD). Existing Top-k Multiple Instance Learning (MIL)-based WSTAL methods cannot flexibly define the learning space, which limits the model's learning efficiency and performance. Faster R-CNN is a classic two-stage object detection architecture built around an efficient Region Proposal Network (RPN). This paper adapts a Faster R-CNN-like two-stage architecture to the WSTAL task in two steps: first, it builds a Temporal RPN (T-RPN) and integrates it with the traditional WSTAL framework; second, it proposes a pseudo-label generation mechanism that enables the T-RPN to be trained without temporal annotations. Our new framework achieves breakthrough performance on the THUMOS-14 and ActivityNet-v1.2 datasets, and comprehensive ablation experiments verify the effectiveness of the proposed components. Code will be available at: https://github.com/ZJUHJ/TRPN.
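For readers unfamiliar with the Top-k MIL baseline the abstract refers to, the following is a minimal, generic sketch of it and is not the paper's T-RPN or its pseudo-label mechanism. It assumes a class activation sequence `cas[t][c]` (score of class `c` at snippet `t`), averages the k highest snippet scores per class to obtain video-level scores, and trains against the video-level label. The helper `threshold_segments` illustrates, as an assumption, one simple way snippet scores could be turned into temporal segments that might serve as pseudo labels. All function names, the value of `k`, and the threshold are hypothetical.

```python
import math
from typing import List, Tuple


def topk_mil_scores(cas: List[List[float]], k: int) -> List[float]:
    """Video-level class scores via Top-k MIL pooling (generic sketch).

    cas[t][c] is the score of class c at snippet t; k is a hypothetical
    hyperparameter. For each class, average its k highest snippet scores.
    """
    num_classes = len(cas[0])
    scores = []
    for c in range(num_classes):
        column = sorted((row[c] for row in cas), reverse=True)
        scores.append(sum(column[:k]) / k)
    return scores


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over classes."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mil_loss(cas: List[List[float]], video_labels: List[int], k: int) -> float:
    """Cross-entropy between softmaxed Top-k scores and the normalized multi-hot label.

    Only video-level labels are used: no snippet-level (temporal) annotation.
    """
    probs = softmax(topk_mil_scores(cas, k))
    n_pos = sum(video_labels)
    target = [y / n_pos for y in video_labels]
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def threshold_segments(scores: List[float], thr: float) -> List[Tuple[int, int]]:
    """Group consecutive snippets with score >= thr into [start, end) segments.

    Hypothetical illustration of deriving candidate pseudo segments from
    one class's snippet scores; not the paper's pseudo-label generator.
    """
    segments = []
    start = None
    for t, s in enumerate(scores):
        if s >= thr and start is None:
            start = t  # a segment opens
        elif s < thr and start is not None:
            segments.append((start, t))  # the segment closes before t
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # segment runs to the end
    return segments
```

For example, with six snippets and two classes, `topk_mil_scores(cas, 2)` averages the two strongest snippets per class, and `threshold_segments([row[0] for row in cas], 1.0)` returns the runs of snippets where class 0 exceeds the threshold. Because every snippet outside the top k receives no gradient from the pooled score, the choice of k fixes which part of the video is learned from, which is one way to read the abstract's remark that Top-k MIL cannot flexibly define the learning space.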

Cite this Paper

BibTeX


@InProceedings{pmlr-v222-huang24a,
  title = 	 {Temporal RPN Learning for Weakly-Supervised Temporal Action Localization},
  author =       {Huang, Jing and Kong, Ming and Chen, Luyuan and Liang, Tian and Zhu, Qiang},
  booktitle = 	 {Proceedings of the 15th Asian Conference on Machine Learning},
  pages = 	 {470--485},
  year = 	 {2024},
  editor = 	 {Yanıkoğlu, Berrin and Buntine, Wray},
  volume = 	 {222},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {11--14 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v222/huang24a/huang24a.pdf},
  url = 	 {https://proceedings.mlr.press/v222/huang24a.html},
  abstract = 	 {Weakly-Supervised Temporal Action Localization (WSTAL) aims to train an action instance localization model from untrimmed videos with only video-level labels, similar to the Object Detection (OD) task. Existing Top-k MIL-based WSTAL methods cannot flexibly define the learning space, which limits the model’s learning efficiency and performance. Faster R-CNN is a classic two-stage object detection architecture with an efficient Region Proposal Network. This paper successfully migrates the Faster R-CNN liked two-stage architecture to the WSTAL task: first to build a T-RPN and integrate it with the traditional WSTAL framework; and then to propose a pseudo label generation mechanism to enable the T-RPN learning without temporal annotations. Our new framework has achieved breakthrough performances on THUMOS-14 and ActivityNet-v1.2 datasets, and comprehensive ablation experiments have verified the effectiveness of the innovations. Code will be available at: \href{https://github.com/ZJUHJ/TRPN}{https://github.com/ZJUHJ/TRPN}.}
}

Endnote

%0 Conference Paper
%T Temporal RPN Learning for Weakly-Supervised Temporal Action Localization
%A Jing Huang
%A Ming Kong
%A Luyuan Chen
%A Tian Liang
%A Qiang Zhu
%B Proceedings of the 15th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Berrin Yanıkoğlu
%E Wray Buntine
%F pmlr-v222-huang24a
%I PMLR
%P 470--485
%U https://proceedings.mlr.press/v222/huang24a.html
%V 222
%X Weakly-Supervised Temporal Action Localization (WSTAL) aims to train an action instance localization model from untrimmed videos with only video-level labels, similar to the Object Detection (OD) task. Existing Top-k MIL-based WSTAL methods cannot flexibly define the learning space, which limits the model’s learning efficiency and performance. Faster R-CNN is a classic two-stage object detection architecture with an efficient Region Proposal Network. This paper successfully migrates the Faster R-CNN liked two-stage architecture to the WSTAL task: first to build a T-RPN and integrate it with the traditional WSTAL framework; and then to propose a pseudo label generation mechanism to enable the T-RPN learning without temporal annotations. Our new framework has achieved breakthrough performances on THUMOS-14 and ActivityNet-v1.2 datasets, and comprehensive ablation experiments have verified the effectiveness of the innovations. Code will be available at: \href{https://github.com/ZJUHJ/TRPN}{https://github.com/ZJUHJ/TRPN}.

APA


Huang, J., Kong, M., Chen, L., Liang, T. & Zhu, Q. (2024). Temporal RPN Learning for Weakly-Supervised Temporal Action Localization. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:470-485. Available from https://proceedings.mlr.press/v222/huang24a.html.