Static Automatic Batching In TensorFlow

Ashish Agarwal
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:92-101, 2019.

Abstract

Dynamic neural networks are becoming increasingly common, and yet it is hard to implement them efficiently. On-the-fly operation batching for such models is sub-optimal and suffers from runtime overheads, while writing manually batched versions can be hard and error-prone. To address this, we extend TensorFlow with pfor, a parallel-for loop optimized using static loop vectorization. With pfor, users can express computation using nested loops and conditional constructs, but get performance resembling that of a manually batched version. Benchmarks demonstrate speedups of one to two orders of magnitude on a range of tasks, from Jacobian computation to Graph Neural Networks.
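
For context, here is a minimal sketch of the user-facing workflow in current TensorFlow releases, where tf.vectorized_map exposes the pfor machinery and GradientTape.jacobian uses it for Jacobian computation; the particular computation below is illustrative only, not code from the paper.

    import tensorflow as tf

    # Per-example computation written as if for a single (unbatched) input.
    def per_example(x):
        # x has shape [4]; returns a vector of shape [4].
        return tf.sin(x) * tf.reduce_sum(x * x)

    batch = tf.random.normal([8, 4])

    # tf.vectorized_map traces the loop body once and rewrites it into batched
    # ops (pfor-style static vectorization) instead of running it 8 times.
    out = tf.vectorized_map(per_example, batch)   # shape [8, 4]

    # GradientTape.jacobian computes full Jacobians with the same machinery
    # (experimental_use_pfor defaults to True), avoiding a Python loop over rows.
    x = tf.random.normal([4])
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = per_example(x)
    jac = tape.jacobian(y, x)                     # shape [4, 4]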

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-agarwal19a,
  title     = {Static Automatic Batching In {T}ensor{F}low},
  author    = {Agarwal, Ashish},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {92--101},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/agarwal19a/agarwal19a.pdf},
  url       = {https://proceedings.mlr.press/v97/agarwal19a.html},
  abstract  = {Dynamic neural networks are becoming increasingly common, and yet it is hard to implement them efficiently. On-the-fly operation batching for such models is sub-optimal and suffers from runtime overheads, while writing manually batched versions can be hard and error-prone. To address this, we extend TensorFlow with pfor, a parallel-for loop optimized using static loop vectorization. With pfor, users can express computation using nested loops and conditional constructs, but get performance resembling that of a manually batched version. Benchmarks demonstrate speedups of one to two orders of magnitude on a range of tasks, from Jacobian computation to Graph Neural Networks.}
}
Endnote
%0 Conference Paper
%T Static Automatic Batching In TensorFlow
%A Ashish Agarwal
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-agarwal19a
%I PMLR
%P 92--101
%U https://proceedings.mlr.press/v97/agarwal19a.html
%V 97
%X Dynamic neural networks are becoming increasingly common, and yet it is hard to implement them efficiently. On-the-fly operation batching for such models is sub-optimal and suffers from runtime overheads, while writing manually batched versions can be hard and error-prone. To address this, we extend TensorFlow with pfor, a parallel-for loop optimized using static loop vectorization. With pfor, users can express computation using nested loops and conditional constructs, but get performance resembling that of a manually batched version. Benchmarks demonstrate speedups of one to two orders of magnitude on a range of tasks, from Jacobian computation to Graph Neural Networks.
APA
Agarwal, A. (2019). Static Automatic Batching In TensorFlow. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:92-101. Available from https://proceedings.mlr.press/v97/agarwal19a.html.
