Towards a Complete Benchmark on Video Moment Localization

Jinyeong Chae, Donghwa Kim, Kwanseok Kim, Doyeon Lee, Sangho Lee, Seongsu Ha, Jonghwan Mun, Wooyoung Kang, Byungseok Roh, Joonseok Lee
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4168-4176, 2024.

Abstract

In this paper, we propose and conduct a comprehensive benchmark on the moment localization task, which aims to retrieve the segment of a single untrimmed video that corresponds to a text query. Our study starts from the observation that most moment localization papers report experimental results on only a few datasets, despite the availability of far more benchmarks. Thus, we conduct an extensive benchmark study to measure the performance of representative methods on 7 widely used datasets. Looking further into the details, we pose additional research questions and empirically verify them, including whether models rely on unintended biases introduced by specific training data, whether advanced visual features trained on a classification task transfer well to this task, and whether the computational cost of each model pays off. With this series of experiments, we provide a multi-faceted evaluation of state-of-the-art moment localization models. Code is available at \url{https://github.com/snuviplab/MoLEF}.
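For context, moment localization benchmarks are typically scored with Recall@K at temporal IoU (tIoU) thresholds: the fraction of text queries for which at least one of the top-K predicted segments overlaps the ground-truth moment with tIoU above a threshold. The following is a minimal Python sketch of that metric; the helper names are illustrative and this is not the MoLEF evaluation code.

# Minimal sketch of the standard moment localization metric:
# Recall@K at temporal IoU (tIoU) thresholds. Function names and the
# exact protocol are illustrative, not taken from the MoLEF codebase.

def temporal_iou(pred, gt):
    """tIoU between two (start, end) segments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_k(predictions, ground_truths, k=1, threshold=0.5):
    """Fraction of queries whose top-k ranked segments contain at
    least one segment with tIoU >= threshold vs. the ground truth."""
    hits = 0
    for preds, gt in zip(predictions, ground_truths):
        if any(temporal_iou(p, gt) >= threshold for p in preds[:k]):
            hits += 1
    return hits / len(ground_truths)

# Example: one query, two ranked (start, end) candidates vs. ground truth.
preds = [[(10.0, 25.0), (40.0, 55.0)]]
gts = [(12.0, 24.0)]
print(recall_at_k(preds, gts, k=1, threshold=0.5))  # 1.0 (tIoU = 0.8)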

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-chae24a,
  title     = {Towards a Complete Benchmark on Video Moment Localization},
  author    = {Chae, Jinyeong and Kim, Donghwa and Kim, Kwanseok and Lee, Doyeon and Lee, Sangho and Ha, Seongsu and Mun, Jonghwan and Kang, Wooyoung and Roh, Byungseok and Lee, Joonseok},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {4168--4176},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/chae24a/chae24a.pdf},
  url       = {https://proceedings.mlr.press/v238/chae24a.html}
}
Endnote
%0 Conference Paper
%T Towards a Complete Benchmark on Video Moment Localization
%A Jinyeong Chae
%A Donghwa Kim
%A Kwanseok Kim
%A Doyeon Lee
%A Sangho Lee
%A Seongsu Ha
%A Jonghwan Mun
%A Wooyoung Kang
%A Byungseok Roh
%A Joonseok Lee
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-chae24a
%I PMLR
%P 4168--4176
%U https://proceedings.mlr.press/v238/chae24a.html
%V 238
APA
Chae, J., Kim, D., Kim, K., Lee, D., Lee, S., Ha, S., Mun, J., Kang, W., Roh, B. & Lee, J. (2024). Towards a Complete Benchmark on Video Moment Localization. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:4168-4176. Available from https://proceedings.mlr.press/v238/chae24a.html.
