Trainable Decoding of Sets of Sequences for Neural Sequence Models

Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3211-3221, 2019.

Abstract

Many sequence prediction tasks admit multiple correct outputs, so it is often useful to decode a set of outputs that maximizes some task-specific set-level metric. However, retooling standard sequence prediction procedures, which are tailored towards predicting the single best output, leads to decoded sets containing very similar sequences that fail to capture the variation in the output space. To address this, we propose $\nabla$BS, a trainable decoding procedure that outputs a set of sequences that is highly valued according to the metric. Our method tightly integrates the training and decoding phases and further allows for the optimization of the task-specific metric, addressing the shortcomings of standard sequence prediction. Further, we discuss the trade-offs of commonly used set-level metrics and motivate a new set-level metric that naturally evaluates the notion of “capturing the variation in the output space”. Finally, we show results on the image captioning task and find that our model outperforms standard techniques and natural ablations.