Robust Probabilistic Modeling with Bayesian Data Reweighting

Yixin Wang, Alp Kucukelbir, David M. Blei
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3646-3655, 2017.

Abstract

Probabilistic models analyze data by relying on a set of assumptions. Data that exhibit deviations from these assumptions can undermine inference and prediction quality. Robust models offer protection against mismatch between a model’s assumptions and reality. We propose a way to systematically detect and mitigate mismatch of a large class of probabilistic models. The idea is to raise the likelihood of each observation to a weight and then to infer both the latent variables and the weights from data. Inferring the weights allows a model to identify observations that match its assumptions and down-weight others. This enables robust inference and improves predictive accuracy. We study four different forms of mismatch with reality, ranging from missing latent groups to structure misspecification. A Poisson factorization analysis of the Movielens 1M dataset shows the benefits of this approach in a practical scenario.
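The reweighting idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration and is not taken from the paper's experiments: a unit-variance Gaussian likelihood, a Beta(a, 1) prior on each weight, a flat prior on the mean, and MAP estimation by coordinate ascent in place of full posterior inference. The function name `reweighted_gaussian_mean` is hypothetical.

```python
import numpy as np


def reweighted_gaussian_mean(y, a=2.0, iters=100):
    """MAP sketch of a Gaussian mean with per-observation likelihood weights.

    Assumed toy model (illustrative only):
      y_n ~ Normal(mu, 1), with each likelihood term raised to the power w_n,
      w_n ~ Beta(a, 1) with a > 1, and a flat prior on mu.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y)  # start from the unweighted model
    for _ in range(iters):
        # Update mu given the weights: a weighted average of the data.
        mu = np.sum(w * y) / np.sum(w)
        # Per-observation log-likelihood under Normal(mu, 1); always negative.
        loglik = -0.5 * np.log(2.0 * np.pi) - 0.5 * (y - mu) ** 2
        # Update each weight given mu: maximize w * loglik + (a - 1) * log(w)
        # over (0, 1], which gives w = (a - 1) / (-loglik), capped at 1.
        w = np.minimum(1.0, (a - 1.0) / (-loglik))
    return mu, w


# Usage: 95 well-behaved points plus 5 gross outliers at 20.
y = np.concatenate([np.linspace(-2.0, 2.0, 95), np.full(5, 20.0)])
mu, w = reweighted_gaussian_mean(y)
print("plain mean:", y.mean(), "reweighted mean:", mu)
print("largest outlier weight:", w[95:].max())
```

In this toy setting the observations that fit the Gaussian assumption keep weights near 1, while the outliers receive weights near 0 and so contribute little to the estimate of the mean.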

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-wang17g,
  title = {Robust Probabilistic Modeling with {B}ayesian Data Reweighting},
  author = {Yixin Wang and Alp Kucukelbir and David M. Blei},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages = {3646--3655},
  year = {2017},
  editor = {Doina Precup and Yee Whye Teh},
  volume = {70},
  series = {Proceedings of Machine Learning Research},
  address = {International Convention Centre, Sydney, Australia},
  month = {06--11 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v70/wang17g/wang17g.pdf},
  url = {http://proceedings.mlr.press/v70/wang17g.html},
  abstract = {Probabilistic models analyze data by relying on a set of assumptions. Data that exhibit deviations from these assumptions can undermine inference and prediction quality. Robust models offer protection against mismatch between a model’s assumptions and reality. We propose a way to systematically detect and mitigate mismatch of a large class of probabilistic models. The idea is to raise the likelihood of each observation to a weight and then to infer both the latent variables and the weights from data. Inferring the weights allows a model to identify observations that match its assumptions and down-weight others. This enables robust inference and improves predictive accuracy. We study four different forms of mismatch with reality, ranging from missing latent groups to structure misspecification. A Poisson factorization analysis of the Movielens 1M dataset shows the benefits of this approach in a practical scenario.}
}
Endnote
%0 Conference Paper
%T Robust Probabilistic Modeling with Bayesian Data Reweighting
%A Yixin Wang
%A Alp Kucukelbir
%A David M. Blei
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-wang17g
%I PMLR
%J Proceedings of Machine Learning Research
%P 3646--3655
%U http://proceedings.mlr.press
%V 70
%W PMLR
%X Probabilistic models analyze data by relying on a set of assumptions. Data that exhibit deviations from these assumptions can undermine inference and prediction quality. Robust models offer protection against mismatch between a model’s assumptions and reality. We propose a way to systematically detect and mitigate mismatch of a large class of probabilistic models. The idea is to raise the likelihood of each observation to a weight and then to infer both the latent variables and the weights from data. Inferring the weights allows a model to identify observations that match its assumptions and down-weight others. This enables robust inference and improves predictive accuracy. We study four different forms of mismatch with reality, ranging from missing latent groups to structure misspecification. A Poisson factorization analysis of the Movielens 1M dataset shows the benefits of this approach in a practical scenario.
APA
Wang, Y., Kucukelbir, A. & Blei, D.M. (2017). Robust Probabilistic Modeling with Bayesian Data Reweighting. Proceedings of the 34th International Conference on Machine Learning, in PMLR 70:3646-3655

Related Material