Direct Motion Models for Assessing Generated Videos

Kelsey R Allen, Carl Doersch, Guangyao Zhou, Mohammed Suhail, Danny Driess, Ignacio Rocco, Yulia Rubanova, Thomas Kipf, Mehdi S. M. Sajjadi, Kevin Patrick Murphy, Joao Carreira, Sjoerd Van Steenkiste
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:1159-1183, 2025.

Abstract

A current limitation of generative video models is that they produce plausible-looking frames but poor motion, an issue that is not well captured by FVD and other popular methods for evaluating generated videos. Here we go beyond FVD by developing a metric that better measures plausible object interactions and motion. Our novel approach is based on auto-encoding point tracks and yields motion features that can be used not only to compare distributions of videos (as few as one generated and one ground truth, or as many as two datasets), but also to evaluate the motion of single videos. We show that using point tracks instead of pixel reconstruction or action-recognition features results in a metric that is markedly more sensitive to temporal distortions in synthetic data, and that better predicts human evaluations of temporal consistency and realism in videos generated by open-source models than a wide range of alternatives. We also show that the point-track representation lets us spatiotemporally localize inconsistencies in generated videos, making generation errors more interpretable than with prior methods. An overview of the results and a link to the code can be found on the project page: trajan-paper.github.io.
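The exact metric is defined by the code linked on the project page; the sketch below is only a rough illustration, in Python, of the two use cases described above. It assumes per-video motion features have already been extracted (e.g. pooled embeddings from a point-track auto-encoder) and compares two sets of them with a standard Gaussian-fit Fréchet distance, as in FID/FVD; it also shows a hypothetical per-video score based on track reconstruction error. Function names, array shapes, and the reconstruct callable are illustrative assumptions, not the paper's API, and the Fréchet step needs more than a handful of videos per set to estimate covariances.

import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fit to two sets of motion features.

    feats_real, feats_gen: arrays of shape (num_videos, feature_dim), e.g.
    pooled per-video embeddings from a point-track auto-encoder (assumed to
    be extracted beforehand; illustrative, not the paper's API).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; small imaginary parts
    # can appear from numerical error, so keep only the real component.
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    covmean = np.real(covmean)
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

def single_video_motion_score(tracks, reconstruct):
    """Hypothetical per-video score: point-track reconstruction error.

    tracks: (num_points, num_frames, 2) trajectories from an off-the-shelf
    point tracker; reconstruct: callable that auto-encodes the tracks and
    returns an array of the same shape. Higher error suggests motion that
    the auto-encoder finds atypical, i.e. less plausible.
    """
    recon = reconstruct(tracks)
    return float(np.mean(np.linalg.norm(recon - tracks, axis=-1)))

For example, feats_real and feats_gen could each be a (200, 512) array of features from 200 real and 200 generated videos. The Gaussian-fit Fréchet distance mirrors common FID/FVD practice; the paper's metric may pool and compare point-track features differently.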

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-allen25a,
  title = {Direct Motion Models for Assessing Generated Videos},
  author = {Allen, Kelsey R and Doersch, Carl and Zhou, Guangyao and Suhail, Mohammed and Driess, Danny and Rocco, Ignacio and Rubanova, Yulia and Kipf, Thomas and Sajjadi, Mehdi S. M. and Murphy, Kevin Patrick and Carreira, Joao and Steenkiste, Sjoerd Van},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {1159--1183},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/allen25a/allen25a.pdf},
  url = {https://proceedings.mlr.press/v267/allen25a.html},
  abstract = {A current limitation of generative video models is that they produce plausible-looking frames but poor motion, an issue that is not well captured by FVD and other popular methods for evaluating generated videos. Here we go beyond FVD by developing a metric that better measures plausible object interactions and motion. Our novel approach is based on auto-encoding point tracks and yields motion features that can be used not only to compare distributions of videos (as few as one generated and one ground truth, or as many as two datasets), but also to evaluate the motion of single videos. We show that using point tracks instead of pixel reconstruction or action-recognition features results in a metric that is markedly more sensitive to temporal distortions in synthetic data, and that better predicts human evaluations of temporal consistency and realism in videos generated by open-source models than a wide range of alternatives. We also show that the point-track representation lets us spatiotemporally localize inconsistencies in generated videos, making generation errors more interpretable than with prior methods. An overview of the results and a link to the code can be found on the project page: trajan-paper.github.io.}
}
Endnote
%0 Conference Paper
%T Direct Motion Models for Assessing Generated Videos
%A Kelsey R Allen
%A Carl Doersch
%A Guangyao Zhou
%A Mohammed Suhail
%A Danny Driess
%A Ignacio Rocco
%A Yulia Rubanova
%A Thomas Kipf
%A Mehdi S. M. Sajjadi
%A Kevin Patrick Murphy
%A Joao Carreira
%A Sjoerd Van Steenkiste
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-allen25a
%I PMLR
%P 1159--1183
%U https://proceedings.mlr.press/v267/allen25a.html
%V 267
%X A current limitation of generative video models is that they produce plausible-looking frames but poor motion, an issue that is not well captured by FVD and other popular methods for evaluating generated videos. Here we go beyond FVD by developing a metric that better measures plausible object interactions and motion. Our novel approach is based on auto-encoding point tracks and yields motion features that can be used not only to compare distributions of videos (as few as one generated and one ground truth, or as many as two datasets), but also to evaluate the motion of single videos. We show that using point tracks instead of pixel reconstruction or action-recognition features results in a metric that is markedly more sensitive to temporal distortions in synthetic data, and that better predicts human evaluations of temporal consistency and realism in videos generated by open-source models than a wide range of alternatives. We also show that the point-track representation lets us spatiotemporally localize inconsistencies in generated videos, making generation errors more interpretable than with prior methods. An overview of the results and a link to the code can be found on the project page: trajan-paper.github.io.
APA
Allen, K.R., Doersch, C., Zhou, G., Suhail, M., Driess, D., Rocco, I., Rubanova, Y., Kipf, T., Sajjadi, M.S.M., Murphy, K.P., Carreira, J. & Steenkiste, S.V. (2025). Direct Motion Models for Assessing Generated Videos. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:1159-1183. Available from https://proceedings.mlr.press/v267/allen25a.html.