Video Pixel Networks

Nal Kalchbrenner; Aäron van den Oord; Karen Simonyan; Ivo Danihelka; Oriol Vinyals; Alex Graves; Koray Kavukcuoglu

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1771-1779, 2017.

Abstract

We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
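One way to read the "four-dimensional dependency chain" in the abstract is as a chain-rule factorization of the joint distribution over the time, row, column and color-channel axes of the video tensor. The sketch below is an illustration under assumptions: the symbols T, N, M, C and the exact ordering (frames first, then pixels in raster order, then channels) are notation introduced here, not taken from this page.

```latex
% Assumed notation: a video x is a tensor with T frames, N rows,
% M columns and C color channels; each entry x_{t,i,j,c} is a
% discrete pixel value.
\[
  p(\mathbf{x})
  = \prod_{t=1}^{T} \prod_{i=1}^{N} \prod_{j=1}^{M} \prod_{c=1}^{C}
    p\bigl(x_{t,i,j,c} \,\big|\, \mathbf{x}_{<(t,i,j,c)}\bigr)
\]
% Here \mathbf{x}_{<(t,i,j,c)} denotes every value that precedes
% position (t,i,j,c) in the assumed ordering: all earlier frames,
% earlier pixels of the same frame, and earlier channels of the
% same pixel.
```

Because every factor is a distribution over a discrete pixel value, a product of this form defines an exact likelihood for the whole video, which is consistent with the abstract's description of estimating a discrete joint distribution.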

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-kalchbrenner17a,
  title = 	 {Video Pixel Networks},
  author =       {Nal Kalchbrenner and A{\"a}ron van den Oord and Karen Simonyan and Ivo Danihelka and Oriol Vinyals and Alex Graves and Koray Kavukcuoglu},
  booktitle = 	 {Proceedings of the 34th International Conference on Machine Learning},
  pages = 	 {1771--1779},
  year = 	 {2017},
  editor = 	 {Precup, Doina and Teh, Yee Whye},
  volume = 	 {70},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--11 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v70/kalchbrenner17a/kalchbrenner17a.pdf},
  url = 	 {https://proceedings.mlr.press/v70/kalchbrenner17a.html},
  abstract = 	 {We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.}
}

Endnote

%0 Conference Paper
%T Video Pixel Networks
%A Nal Kalchbrenner
%A Aäron van den Oord
%A Karen Simonyan
%A Ivo Danihelka
%A Oriol Vinyals
%A Alex Graves
%A Koray Kavukcuoglu
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-kalchbrenner17a
%I PMLR
%P 1771--1779
%U https://proceedings.mlr.press/v70/kalchbrenner17a.html
%V 70
%X We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.

APA


Kalchbrenner, N., van den Oord, A., Simonyan, K., Danihelka, I., Vinyals, O., Graves, A. & Kavukcuoglu, K. (2017). Video Pixel Networks. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:1771-1779. Available from https://proceedings.mlr.press/v70/kalchbrenner17a.html.
