Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference

Luca Masserano, Alexander Shen, Michele Doro, Tommaso Dorigo, Rafael Izbicki, Ann B. Lee
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:34987-35012, 2024.

Abstract

An open scientific challenge is how to classify events with reliable measures of uncertainty, when we have a mechanistic model of the data-generating process but the distribution over both labels and latent nuisance parameters is different between train and target data. We refer to this type of distributional shift as generalized label shift (GLS). Direct classification using observed data $\mathbf{X}$ as covariates leads to biased predictions and invalid uncertainty estimates of labels $Y$. We overcome these biases by proposing a new method for robust uncertainty quantification that casts classification as a hypothesis testing problem under nuisance parameters. The key idea is to estimate the classifier’s receiver operating characteristic (ROC) across the entire nuisance parameter space, which allows us to devise cutoffs that are invariant under GLS. Our method effectively endows a pre-trained classifier with domain adaptation capabilities and returns valid prediction sets while maintaining high power. We demonstrate its performance on two challenging scientific problems in biology and astroparticle physics with data from realistic mechanistic models.
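The sketch below is a minimal, hypothetical illustration of the idea summarized in the abstract, not the authors' implementation: classifying an observation is cast as a family of hypothesis tests H0: Y = y with a pre-trained classifier score as the test statistic, and the rejection cutoff for each label is chosen conservatively over a grid of nuisance-parameter values so that the resulting prediction set stays valid under generalized label shift. The simulator simulate(y, nu, n) and the score score(X, y) are assumed placeholders for a mechanistic forward model and any pre-trained probabilistic classifier.

import numpy as np

def gls_invariant_cutoff(score, simulate, y, nu_grid, alpha=0.05, n_sim=10_000):
    """Cutoff c_y such that P(score(X, y) < c_y | Y = y, nu) <= alpha for every nu in nu_grid."""
    quantiles = []
    for nu in nu_grid:
        X_sim = simulate(y, nu, n_sim)                        # draws from the simulator at (y, nu)
        quantiles.append(np.quantile(score(X_sim, y), alpha)) # alpha-quantile of the null score at this nu
    return min(quantiles)                                     # most conservative cutoff over the nuisance space

def prediction_set(x_obs, score, cutoffs):
    """All labels y whose test of H0: Y = y is not rejected at x_obs."""
    x = np.atleast_2d(np.asarray(x_obs, dtype=float))
    return {y for y, c_y in cutoffs.items() if score(x, y).item() >= c_y}

# Toy usage with a hypothetical 1-D simulator (nu shifts the class means) and a fixed
# surrogate score; a real application would plug in a mechanistic simulator and a
# trained classifier instead.
rng = np.random.default_rng(0)
simulate = lambda y, nu, n: rng.normal(loc=y + nu, scale=1.0, size=(n, 1))
score = lambda X, y: 1.0 / (1.0 + np.exp(-(2 * y - 1) * X[:, 0]))   # stand-in for P(Y = y | X)

nu_grid = np.linspace(-0.5, 0.5, 11)
cutoffs = {y: gls_invariant_cutoff(score, simulate, y, nu_grid) for y in (0, 1)}
print(prediction_set(np.array([0.2]), score, cutoffs))               # ambiguous x: may return {0, 1}

Taking the worst-case cutoff over the nuisance grid is one simple way to obtain nuisance-invariant thresholds at the cost of some power; the paper instead estimates the classifier's ROC across the entire nuisance parameter space to set cutoffs that remain valid while maintaining high power, as described in the abstract.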

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-masserano24a,
  title     = {Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference},
  author    = {Masserano, Luca and Shen, Alexander and Doro, Michele and Dorigo, Tommaso and Izbicki, Rafael and Lee, Ann B.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {34987--35012},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/masserano24a/masserano24a.pdf},
  url       = {https://proceedings.mlr.press/v235/masserano24a.html},
  abstract  = {An open scientific challenge is how to classify events with reliable measures of uncertainty, when we have a mechanistic model of the data-generating process but the distribution over both labels and latent nuisance parameters is different between train and target data. We refer to this type of distributional shift as generalized label shift (GLS). Direct classification using observed data $\mathbf{X}$ as covariates leads to biased predictions and invalid uncertainty estimates of labels $Y$. We overcome these biases by proposing a new method for robust uncertainty quantification that casts classification as a hypothesis testing problem under nuisance parameters. The key idea is to estimate the classifier’s receiver operating characteristic (ROC) across the entire nuisance parameter space, which allows us to devise cutoffs that are invariant under GLS. Our method effectively endows a pre-trained classifier with domain adaptation capabilities and returns valid prediction sets while maintaining high power. We demonstrate its performance on two challenging scientific problems in biology and astroparticle physics with data from realistic mechanistic models.}
}
Endnote
%0 Conference Paper
%T Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference
%A Luca Masserano
%A Alexander Shen
%A Michele Doro
%A Tommaso Dorigo
%A Rafael Izbicki
%A Ann B. Lee
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-masserano24a
%I PMLR
%P 34987--35012
%U https://proceedings.mlr.press/v235/masserano24a.html
%V 235
%X An open scientific challenge is how to classify events with reliable measures of uncertainty, when we have a mechanistic model of the data-generating process but the distribution over both labels and latent nuisance parameters is different between train and target data. We refer to this type of distributional shift as generalized label shift (GLS). Direct classification using observed data $\mathbf{X}$ as covariates leads to biased predictions and invalid uncertainty estimates of labels $Y$. We overcome these biases by proposing a new method for robust uncertainty quantification that casts classification as a hypothesis testing problem under nuisance parameters. The key idea is to estimate the classifier’s receiver operating characteristic (ROC) across the entire nuisance parameter space, which allows us to devise cutoffs that are invariant under GLS. Our method effectively endows a pre-trained classifier with domain adaptation capabilities and returns valid prediction sets while maintaining high power. We demonstrate its performance on two challenging scientific problems in biology and astroparticle physics with data from realistic mechanistic models.
APA
Masserano, L., Shen, A., Doro, M., Dorigo, T., Izbicki, R. & Lee, A. B. (2024). Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:34987-35012. Available from https://proceedings.mlr.press/v235/masserano24a.html.