Trainable Decoding of Sets of Sequences for Neural Sequence Models

Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3211-3221, 2019.

Abstract

Many sequence prediction tasks admit multiple correct outputs, so it is often useful to decode a set of outputs that maximizes some task-specific set-level metric. However, retooling standard sequence prediction procedures, which are tailored towards predicting the single best output, leads to decoded sets of very similar sequences that fail to capture the variation in the output space. To address this, we propose $\nabla$BS, a trainable decoding procedure that outputs a set of sequences that is highly valued according to the metric. Our method tightly integrates the training and decoding phases and further allows optimizing the task-specific metric directly, addressing the shortcomings of standard sequence prediction. Further, we discuss the trade-offs of commonly used set-level metrics and motivate a new set-level metric that naturally evaluates the notion of “capturing the variation in the output space”. Finally, we show results on the image captioning task and find that our model outperforms standard techniques and natural ablations.
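
As a rough illustration of the idea behind decoding a set of sequences under a diversity-aware objective, the sketch below greedily grows a small set of hypotheses, discounting candidate extensions that overlap with sequences already selected. This is a minimal sketch in the spirit of the approach, not the paper's actual $\nabla$BS procedure: the toy step_logprobs stands in for a trained neural language model, overlap for whatever redundancy the set-level metric penalizes, and the hand-set gamma for the trade-off that a trainable decoder would learn end-to-end. All of these names are hypothetical.

import numpy as np

def step_logprobs(prefix, vocab_size=20):
    # Hypothetical stand-in for a trained neural LM: deterministic
    # pseudo-random next-token log-probabilities given a prefix.
    rng = np.random.default_rng(abs(hash(tuple(prefix))) % (2**32))
    logits = rng.normal(size=vocab_size)
    return logits - np.log(np.exp(logits).sum())

def overlap(a, b):
    # Crude token-overlap similarity; a placeholder for whatever
    # notion of redundancy the set-level metric penalizes.
    if not a or not b:
        return 0.0
    return len(set(a) & set(b)) / len(set(a) | set(b))

def set_decode(beam_size=3, max_len=5, gamma=0.5):
    # Greedily grow beam_size sequences; at each step, each sequence
    # picks its best next token after discounting candidates similar
    # to sequences already extended this step. gamma plays the role
    # of the diversity trade-off a trainable decoder would learn.
    beams = [([], 0.0) for _ in range(beam_size)]
    for _ in range(max_len):
        chosen = []
        for seq, score in beams:
            lp = step_logprobs(seq)
            best_tok, best_val = 0, -np.inf
            for tok in range(len(lp)):
                cand = seq + [tok]
                pen = gamma * max((overlap(cand, c) for c, _ in chosen), default=0.0)
                val = score + lp[tok] - pen
                if val > best_val:
                    best_tok, best_val = tok, val
            chosen.append((seq + [best_tok], score + lp[best_tok]))
        beams = chosen
    return beams

if __name__ == "__main__":
    for seq, score in set_decode():
        print(seq, round(float(score), 3))

Running set_decode() yields beam_size sequences whose selection trades off model log-probability against mutual overlap. In the trainable setting the paper describes, the scoring function (here reduced to a fixed gamma) would instead be optimized against the task-specific set-level metric.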

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-kalyan19a,
  title     = {Trainable Decoding of Sets of Sequences for Neural Sequence Models},
  author    = {Kalyan, Ashwin and Anderson, Peter and Lee, Stefan and Batra, Dhruv},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {3211--3221},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/kalyan19a/kalyan19a.pdf},
  url       = {https://proceedings.mlr.press/v97/kalyan19a.html},
  abstract  = {Many sequence prediction tasks admit multiple correct outputs, so it is often useful to decode a set of outputs that maximizes some task-specific set-level metric. However, retooling standard sequence prediction procedures, which are tailored towards predicting the single best output, leads to decoded sets of very similar sequences that fail to capture the variation in the output space. To address this, we propose $\nabla$BS, a trainable decoding procedure that outputs a set of sequences that is highly valued according to the metric. Our method tightly integrates the training and decoding phases and further allows optimizing the task-specific metric directly, addressing the shortcomings of standard sequence prediction. Further, we discuss the trade-offs of commonly used set-level metrics and motivate a new set-level metric that naturally evaluates the notion of “capturing the variation in the output space”. Finally, we show results on the image captioning task and find that our model outperforms standard techniques and natural ablations.}
}
Endnote
%0 Conference Paper
%T Trainable Decoding of Sets of Sequences for Neural Sequence Models
%A Ashwin Kalyan
%A Peter Anderson
%A Stefan Lee
%A Dhruv Batra
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-kalyan19a
%I PMLR
%P 3211--3221
%U https://proceedings.mlr.press/v97/kalyan19a.html
%V 97
%X Many sequence prediction tasks admit multiple correct outputs, so it is often useful to decode a set of outputs that maximizes some task-specific set-level metric. However, retooling standard sequence prediction procedures, which are tailored towards predicting the single best output, leads to decoded sets of very similar sequences that fail to capture the variation in the output space. To address this, we propose $\nabla$BS, a trainable decoding procedure that outputs a set of sequences that is highly valued according to the metric. Our method tightly integrates the training and decoding phases and further allows optimizing the task-specific metric directly, addressing the shortcomings of standard sequence prediction. Further, we discuss the trade-offs of commonly used set-level metrics and motivate a new set-level metric that naturally evaluates the notion of “capturing the variation in the output space”. Finally, we show results on the image captioning task and find that our model outperforms standard techniques and natural ablations.
APA
Kalyan, A., Anderson, P., Lee, S. & Batra, D. (2019). Trainable Decoding of Sets of Sequences for Neural Sequence Models. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3211-3221. Available from https://proceedings.mlr.press/v97/kalyan19a.html.