Actively Avoiding Nonsense in Generative Models

Steve Hanneke; Adam Tauman Kalai; Gautam Kamath; Christos Tzamos

Proceedings of the 31st Conference On Learning Theory, PMLR 75:209-227, 2018.

Abstract

A generative model may generate utter nonsense when it is fit to maximize the likelihood of observed data. This happens due to “model error,” i.e., when the true data generating distribution does not fit within the class of generative models being learned. To address this, we propose a model of active distribution learning using a binary invalidity oracle that identifies some examples as clearly invalid, together with random positive examples sampled from the true distribution. The goal is to maximize the likelihood of the positive examples subject to the constraint of (almost) never generating examples labeled invalid by the oracle. Guarantees are agnostic compared to a class of probability distributions. We first show that proper learning may require exponentially many queries to the invalidity oracle. We then give an improper distribution learning algorithm that uses only polynomially many queries.

Cite this Paper

BibTeX


@InProceedings{pmlr-v75-hanneke18a,
  title = 	 {Actively Avoiding Nonsense in Generative Models},
  author =       {Hanneke, Steve and Kalai, Adam Tauman and Kamath, Gautam and Tzamos, Christos},
  booktitle = 	 {Proceedings of the 31st Conference On Learning Theory},
  pages = 	 {209--227},
  year = 	 {2018},
  editor = 	 {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = 	 {75},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v75/hanneke18a/hanneke18a.pdf},
  url = 	 {https://proceedings.mlr.press/v75/hanneke18a.html},
  abstract = 	 {A generative model may generate utter nonsense when it is fit to maximize the likelihood of observed data. This happens due to “model error,” i.e., when the true data generating distribution does not fit within the class of generative models being learned. To address this, we propose a model of active distribution learning using a binary invalidity oracle that identifies some examples as clearly invalid, together with random positive examples sampled from the true distribution. The goal is to maximize the likelihood of the positive examples subject to the constraint of (almost) never generating examples labeled invalid by the oracle. Guarantees are agnostic compared to a class of probability distributions. We first show that proper learning may require exponentially many queries to the invalidity oracle. We then give an improper distribution learning algorithm that uses only polynomially many queries. }
}

Endnote

%0 Conference Paper
%T Actively Avoiding Nonsense in Generative Models
%A Steve Hanneke
%A Adam Tauman Kalai
%A Gautam Kamath
%A Christos Tzamos
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet
%F pmlr-v75-hanneke18a
%I PMLR
%P 209--227
%U https://proceedings.mlr.press/v75/hanneke18a.html
%V 75
%X A generative model may generate utter nonsense when it is fit to maximize the likelihood of observed data. This happens due to “model error,” i.e., when the true data generating distribution does not fit within the class of generative models being learned. To address this, we propose a model of active distribution learning using a binary invalidity oracle that identifies some examples as clearly invalid, together with random positive examples sampled from the true distribution. The goal is to maximize the likelihood of the positive examples subject to the constraint of (almost) never generating examples labeled invalid by the oracle. Guarantees are agnostic compared to a class of probability distributions. We first show that proper learning may require exponentially many queries to the invalidity oracle. We then give an improper distribution learning algorithm that uses only polynomially many queries.

APA


Hanneke, S., Kalai, A.T., Kamath, G. & Tzamos, C. (2018). Actively Avoiding Nonsense in Generative Models. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:209-227. Available from https://proceedings.mlr.press/v75/hanneke18a.html.
