SoftSort: A Continuous Relaxation for the argsort Operator

Sebastian Prillo, Julian Eisenschlos
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:7793-7802, 2020.

Abstract

While sorting is an important procedure in computer science, the argsort operator - which takes as input a vector and returns its sorting permutation - has a discrete image and thus zero gradients almost everywhere. This prohibits end-to-end, gradient-based learning of models that rely on the argsort operator. A natural way to overcome this problem is to replace the argsort operator with a continuous relaxation. Recent work has shown a number of ways to do this, but the relaxations proposed so far are computationally complex. In this work we propose a simple continuous relaxation for the argsort operator which has the following qualities: it can be implemented in three lines of code, achieves state-of-the-art performance, is easy to reason about mathematically - substantially simplifying proofs - and is faster than competing approaches. We open source the code to reproduce all of the experiments and results.
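The exact operator is defined in the paper itself; as a rough illustration of the abstract's "three lines of code" claim, the following is a minimal PyTorch-style sketch of a softmax-based relaxation of the sorting permutation matrix. The function name soft_sort, the temperature parameter tau, and the use of pairwise absolute differences are assumptions made for illustration here, not details taken from this page.

    import torch

    def soft_sort(scores, tau=1.0):
        # scores: tensor of shape (batch, n).
        # Returns a (batch, n, n) row-stochastic matrix that becomes increasingly
        # peaked around each row's argsort permutation matrix as tau -> 0.
        sorted_scores = scores.sort(dim=-1, descending=True).values
        # Negative pairwise absolute differences between sorted and original entries.
        logits = -(sorted_scores.unsqueeze(-1) - scores.unsqueeze(-2)).abs() / tau
        return logits.softmax(dim=-1)

In a sketch like this, multiplying the resulting matrix by the input vector yields an approximately sorted vector, and the whole computation is differentiable, which is what permits end-to-end gradient-based learning through the (relaxed) argsort.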

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-prillo20a,
  title     = {{S}oft{S}ort: A Continuous Relaxation for the argsort Operator},
  author    = {Prillo, Sebastian and Eisenschlos, Julian},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {7793--7802},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/prillo20a/prillo20a.pdf},
  url       = {https://proceedings.mlr.press/v119/prillo20a.html}
}
Endnote
%0 Conference Paper
%T SoftSort: A Continuous Relaxation for the argsort Operator
%A Sebastian Prillo
%A Julian Eisenschlos
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-prillo20a
%I PMLR
%P 7793--7802
%U https://proceedings.mlr.press/v119/prillo20a.html
%V 119
APA
Prillo, S. & Eisenschlos, J. (2020). SoftSort: A Continuous Relaxation for the argsort Operator. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:7793-7802. Available from https://proceedings.mlr.press/v119/prillo20a.html.