Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence

Sascha Xu, Nils Philipp Walter, Janis Kalofolias, Jilles Vreeken
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:55267-55285, 2024.

Abstract

Finding and describing sub-populations that are exceptional in terms of a target property has important applications in many scientific disciplines, from identifying disadvantaged demographic groups in census data to finding conductive molecules within gold nanoparticles. Current approaches to finding such subgroups require pre-discretized predictive variables, do not permit non-trivial target distributions, do not scale to large datasets, and struggle to find diverse results. To address these limitations, we propose SYFLOW, an end-to-end optimizable approach in which we leverage normalizing flows to model arbitrary target distributions and introduce a novel neural layer that results in easily interpretable subgroup descriptions. We demonstrate on synthetic data, real-world data, and via a case study, that SYFLOW reliably finds highly exceptional subgroups accompanied by insightful descriptions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-xu24w,
  title     = {Learning Exceptional Subgroups by End-to-End Maximizing {KL}-Divergence},
  author    = {Xu, Sascha and Walter, Nils Philipp and Kalofolias, Janis and Vreeken, Jilles},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {55267--55285},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xu24w/xu24w.pdf},
  url       = {https://proceedings.mlr.press/v235/xu24w.html},
  abstract  = {Finding and describing sub-populations that are exceptional in terms of a target property has important applications in many scientific disciplines, from identifying disadvantaged demographic groups in census data to finding conductive molecules within gold nanoparticles. Current approaches to finding such subgroups require pre-discretized predictive variables, do not permit non-trivial target distributions, do not scale to large datasets, and struggle to find diverse results. To address these limitations, we propose SYFLOW, an end-to-end optimizable approach in which we leverage normalizing flows to model arbitrary target distributions and introduce a novel neural layer that results in easily interpretable subgroup descriptions. We demonstrate on synthetic data, real-world data, and via a case study, that SYFLOW reliably finds highly exceptional subgroups accompanied by insightful descriptions.}
}
Endnote
%0 Conference Paper
%T Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence
%A Sascha Xu
%A Nils Philipp Walter
%A Janis Kalofolias
%A Jilles Vreeken
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-xu24w
%I PMLR
%P 55267--55285
%U https://proceedings.mlr.press/v235/xu24w.html
%V 235
%X Finding and describing sub-populations that are exceptional in terms of a target property has important applications in many scientific disciplines, from identifying disadvantaged demographic groups in census data to finding conductive molecules within gold nanoparticles. Current approaches to finding such subgroups require pre-discretized predictive variables, do not permit non-trivial target distributions, do not scale to large datasets, and struggle to find diverse results. To address these limitations, we propose SYFLOW, an end-to-end optimizable approach in which we leverage normalizing flows to model arbitrary target distributions and introduce a novel neural layer that results in easily interpretable subgroup descriptions. We demonstrate on synthetic data, real-world data, and via a case study, that SYFLOW reliably finds highly exceptional subgroups accompanied by insightful descriptions.
APA
Xu, S., Walter, N. P., Kalofolias, J. & Vreeken, J. (2024). Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:55267-55285. Available from https://proceedings.mlr.press/v235/xu24w.html.
