Differentiable Dynamic Programming for Structured Prediction and Attention

Arthur Mensch; Mathieu Blondel

Differentiable Dynamic Programming for Structured Prediction and Attention

Arthur Mensch, Mathieu Blondel

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3462-3471, 2018.

Abstract

Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, many DP algorithms are non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation.

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-mensch18a,
  title = 	 {Differentiable Dynamic Programming for Structured Prediction and Attention},
  author =       {Mensch, Arthur and Blondel, Mathieu},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {3462--3471},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/mensch18a/mensch18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/mensch18a.html},
  abstract = 	 {Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, many DP algorithms are non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation.}
}

Endnote

%0 Conference Paper
%T Differentiable Dynamic Programming for Structured Prediction and Attention
%A Arthur Mensch
%A Mathieu Blondel
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause	
%F pmlr-v80-mensch18a
%I PMLR
%P 3462--3471
%U https://proceedings.mlr.press/v80/mensch18a.html
%V 80
%X Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, many DP algorithms are non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation.

APA


Mensch, A. & Blondel, M.. (2018). Differentiable Dynamic Programming for Structured Prediction and Attention. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3462-3471 Available from https://proceedings.mlr.press/v80/mensch18a.html.

Differentiable Dynamic Programming for Structured Prediction and Attention

Abstract

Cite this Paper

Related Material