A Geometric Explanation of the Likelihood OOD Detection Paradox

Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul Krishnan, Gabriel Loaiza-Ganem
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:22908-22935, 2024.

Abstract

Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at our GitHub repository.
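
Concretely, the pairing of likelihoods and LID estimates described in the abstract admits a simple decision rule. The sketch below assumes a dual-threshold combination of the two scores; the function, parameter names, and numeric values are illustrative assumptions, not taken from the paper's codebase.

    def is_ood(log_likelihood: float, lid_estimate: float,
               likelihood_threshold: float, lid_threshold: float) -> bool:
        # Hypothetical dual-threshold rule sketched from the abstract.
        # Low likelihood: the ordinary likelihood-based OOD rejection.
        if log_likelihood < likelihood_threshold:
            return True
        # High likelihood but low LID: the paradoxical regime the paper
        # identifies -- large density concentrated near a low-dimensional
        # region, hence little probability mass. Flag as OOD.
        if lid_estimate < lid_threshold:
            return True
        # High likelihood and sufficiently high LID: in-distribution.
        return False

    # Illustrative usage (all numbers are made up): a point with high
    # likelihood but low LID is still flagged as OOD.
    print(is_ood(log_likelihood=-850.0, lid_estimate=4.2,
                 likelihood_threshold=-1200.0, lid_threshold=10.0))  # True

Under this reading, in-distribution data must clear both thresholds, which resolves the paradox: OOD points from simpler sources may pass the likelihood test but fail the LID test.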

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-kamkari24a,
  title     = {A Geometric Explanation of the Likelihood {OOD} Detection Paradox},
  author    = {Kamkari, Hamidreza and Ross, Brendan Leigh and Cresswell, Jesse C. and Caterini, Anthony L. and Krishnan, Rahul and Loaiza-Ganem, Gabriel},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {22908--22935},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/kamkari24a/kamkari24a.pdf},
  url       = {https://proceedings.mlr.press/v235/kamkari24a.html},
  abstract  = {Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at our GitHub repository.}
}
Endnote
%0 Conference Paper
%T A Geometric Explanation of the Likelihood OOD Detection Paradox
%A Hamidreza Kamkari
%A Brendan Leigh Ross
%A Jesse C. Cresswell
%A Anthony L. Caterini
%A Rahul Krishnan
%A Gabriel Loaiza-Ganem
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-kamkari24a
%I PMLR
%P 22908--22935
%U https://proceedings.mlr.press/v235/kamkari24a.html
%V 235
%X Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at our GitHub repository.
APA
Kamkari, H., Ross, B.L., Cresswell, J.C., Caterini, A.L., Krishnan, R. & Loaiza-Ganem, G. (2024). A Geometric Explanation of the Likelihood OOD Detection Paradox. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:22908-22935. Available from https://proceedings.mlr.press/v235/kamkari24a.html.