The Catalog Problem: Clustering and Ordering Variable-Sized Sets

Mateusz Maria Jurewicz, Graham W. Taylor, Leon Derczynski
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:15528-15545, 2023.

Abstract

Prediction of a $\textbf{varying number}$ of $\textbf{ordered clusters}$ from sets of $\textbf{any cardinality}$ is a challenging task for neural networks, combining elements of set representation, clustering and learning to order. This task arises in many diverse areas, ranging from medical triage and early discharge, through machine part management and multi-channel signal analysis for petroleum exploration to product catalog structure prediction. This paper focuses on that last area, which exemplifies a number of challenges inherent to adaptive ordered clustering, referred to further as the eponymous $\textit{Catalog Problem}$. These include learning variable cluster constraints, exhibiting relational reasoning and managing combinatorial complexity. Despite progress in both neural clustering and set-to-sequence methods, no joint, fully differentiable model exists to-date. We develop such a modular architecture, referred to further as Neural Ordered Clusters (NOC), enhance it with a specific mechanism for learning cluster-level cardinality constraints, and provide a robust comparison of its performance in relation to alternative models. We test our method on three datasets, including synthetic catalog structures and PROCAT, a dataset of real-world catalogs consisting of over 1.5M products, achieving state-of-the-art results on a new, more challenging formulation of the underlying problem, which has not been addressed before. Additionally, we examine the network’s ability to learn higher-order interactions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-jurewicz23a, title = {The Catalog Problem: Clustering and Ordering Variable-Sized Sets}, author = {Jurewicz, Mateusz Maria and Taylor, Graham W. and Derczynski, Leon}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {15528--15545}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/jurewicz23a/jurewicz23a.pdf}, url = {https://proceedings.mlr.press/v202/jurewicz23a.html}, abstract = {Prediction of a $\textbf{varying number}$ of $\textbf{ordered clusters}$ from sets of $\textbf{any cardinality}$ is a challenging task for neural networks, combining elements of set representation, clustering and learning to order. This task arises in many diverse areas, ranging from medical triage and early discharge, through machine part management and multi-channel signal analysis for petroleum exploration to product catalog structure prediction. This paper focuses on that last area, which exemplifies a number of challenges inherent to adaptive ordered clustering, referred to further as the eponymous $\textit{Catalog Problem}$. These include learning variable cluster constraints, exhibiting relational reasoning and managing combinatorial complexity. Despite progress in both neural clustering and set-to-sequence methods, no joint, fully differentiable model exists to-date. We develop such a modular architecture, referred to further as Neural Ordered Clusters (NOC), enhance it with a specific mechanism for learning cluster-level cardinality constraints, and provide a robust comparison of its performance in relation to alternative models. We test our method on three datasets, including synthetic catalog structures and PROCAT, a dataset of real-world catalogs consisting of over 1.5M products, achieving state-of-the-art results on a new, more challenging formulation of the underlying problem, which has not been addressed before. Additionally, we examine the network’s ability to learn higher-order interactions.} }
Endnote
%0 Conference Paper %T The Catalog Problem: Clustering and Ordering Variable-Sized Sets %A Mateusz Maria Jurewicz %A Graham W. Taylor %A Leon Derczynski %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-jurewicz23a %I PMLR %P 15528--15545 %U https://proceedings.mlr.press/v202/jurewicz23a.html %V 202 %X Prediction of a $\textbf{varying number}$ of $\textbf{ordered clusters}$ from sets of $\textbf{any cardinality}$ is a challenging task for neural networks, combining elements of set representation, clustering and learning to order. This task arises in many diverse areas, ranging from medical triage and early discharge, through machine part management and multi-channel signal analysis for petroleum exploration to product catalog structure prediction. This paper focuses on that last area, which exemplifies a number of challenges inherent to adaptive ordered clustering, referred to further as the eponymous $\textit{Catalog Problem}$. These include learning variable cluster constraints, exhibiting relational reasoning and managing combinatorial complexity. Despite progress in both neural clustering and set-to-sequence methods, no joint, fully differentiable model exists to-date. We develop such a modular architecture, referred to further as Neural Ordered Clusters (NOC), enhance it with a specific mechanism for learning cluster-level cardinality constraints, and provide a robust comparison of its performance in relation to alternative models. We test our method on three datasets, including synthetic catalog structures and PROCAT, a dataset of real-world catalogs consisting of over 1.5M products, achieving state-of-the-art results on a new, more challenging formulation of the underlying problem, which has not been addressed before. Additionally, we examine the network’s ability to learn higher-order interactions.
APA
Jurewicz, M.M., Taylor, G.W. & Derczynski, L.. (2023). The Catalog Problem: Clustering and Ordering Variable-Sized Sets. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:15528-15545 Available from https://proceedings.mlr.press/v202/jurewicz23a.html.

Related Material