Video Pixel Networks

Nal Kalchbrenner, Aäron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1771-1779, 2017.

Abstract

We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
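The four-dimensional dependency chain described above corresponds to an autoregressive factorization of the joint pixel distribution over time, space, and color. As a hedged sketch (the index names and ordering below are our notation, not taken verbatim from the paper), the factorization can be written as:

```latex
% Autoregressive factorization of a video tensor x with
% frames t = 1..T, rows i = 1..H, columns j = 1..W, and
% color channels c in {R, G, B}.
% x_{<(t,i,j,c)} denotes all pixel values that precede
% position (t, i, j, c) in this raster-scan ordering.
p(\mathbf{x}) \;=\;
  \prod_{t=1}^{T} \prod_{i=1}^{H} \prod_{j=1}^{W}
  \prod_{c \in \{R,G,B\}}
  p\!\left(x_{t,i,j,c} \,\middle|\, \mathbf{x}_{<(t,i,j,c)}\right)
```

Each factor is a discrete distribution over raw pixel intensities, which is what allows the model to be trained by maximizing exact log-likelihood.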

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-kalchbrenner17a,
  title     = {Video Pixel Networks},
  author    = {Nal Kalchbrenner and A{\"a}ron van den Oord and Karen Simonyan and Ivo Danihelka and Oriol Vinyals and Alex Graves and Koray Kavukcuoglu},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {1771--1779},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/kalchbrenner17a/kalchbrenner17a.pdf},
  url       = {https://proceedings.mlr.press/v70/kalchbrenner17a.html},
  abstract  = {We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.}
}
Endnote
%0 Conference Paper
%T Video Pixel Networks
%A Nal Kalchbrenner
%A Aäron van den Oord
%A Karen Simonyan
%A Ivo Danihelka
%A Oriol Vinyals
%A Alex Graves
%A Koray Kavukcuoglu
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-kalchbrenner17a
%I PMLR
%P 1771--1779
%U https://proceedings.mlr.press/v70/kalchbrenner17a.html
%V 70
%X We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
APA
Kalchbrenner, N., van den Oord, A., Simonyan, K., Danihelka, I., Vinyals, O., Graves, A., & Kavukcuoglu, K. (2017). Video Pixel Networks. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:1771-1779. Available from https://proceedings.mlr.press/v70/kalchbrenner17a.html.

Related Material