Decontamination of Mutually Contaminated Models

Gilles Blanchard; Clayton Scott

Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:1-9, 2014.

Abstract

A variety of machine learning problems are characterized by data sets that are drawn from multiple different convex combinations of a fixed set of base distributions. We call this a mutual contamination model. In such problems, it is often of interest to recover these base distributions, or otherwise discern their properties. This work focuses on the problem of classification with multiclass label noise, in a general setting where the noise proportions are unknown and the true class distributions are nonseparable and potentially quite complex. We develop a procedure for decontamination of the contaminated models from data, which then facilitates the design of a consistent discrimination rule. Our approach relies on a novel method for estimating the error when projecting one distribution onto a convex combination of others, where the projection is with respect to an information divergence known as the separation distance. Under sufficient conditions on the amount of noise and purity of the base distributions, this projection procedure successfully recovers the underlying class distributions. Connections to novelty detection, topic modeling, and other learning problems are also discussed.
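As a rough illustration of the mutual contamination model described above, the following sketch builds two contaminated distributions as convex combinations of two base distributions on a three-point alphabet, then removes one contaminated distribution from the other. It is a toy under strong assumptions, not the authors' estimator: the distributions are known exactly rather than estimated from data, there are only two classes, and every name and number here (`mix`, `max_proportion`, `remove_component`, the probabilities, the mixing weights) is hypothetical. For discrete distributions, the largest proportion of `h` contained in `f` is the minimum ratio `f(x)/h(x)`; one minus that ratio is, as we understand it, the quantity commonly called the separation distance.

```python
# Toy illustration only: exact, known discrete distributions; no sampling,
# no estimation, two classes. All names and numbers are hypothetical.

def mix(weights, dists):
    """Convex combination sum_j weights[j] * dists[j] of discrete distributions."""
    return [sum(w * d[x] for w, d in zip(weights, dists))
            for x in range(len(dists[0]))]

def max_proportion(f, h):
    """Largest kappa such that f = (1 - kappa) * g + kappa * h for some distribution g."""
    return min(fx / hx for fx, hx in zip(f, h) if hx > 0)

def remove_component(f, h, kappa):
    """Residual distribution g after subtracting a kappa-fraction of h from f (kappa < 1)."""
    return [(fx - kappa * hx) / (1 - kappa) for fx, hx in zip(f, h)]

# Base (true class) distributions; each puts mass where the other has none.
p1 = [0.7, 0.3, 0.0]
p2 = [0.0, 0.4, 0.6]

# Contaminated distributions: convex combinations with assumed noise proportions.
pt1 = mix([0.8, 0.2], [p1, p2])
pt2 = mix([0.3, 0.7], [p1, p2])

# Project pt1 away from pt2: find how much of pt2 it contains, then subtract it.
kappa = max_proportion(pt1, pt2)
recovered_p1 = remove_component(pt1, pt2, kappa)

print("proportion of pt2 inside pt1:", kappa)
print("separation-type distance:", 1 - kappa)
print("recovered base distribution:", recovered_p1)
```

In this toy the residual coincides with `p1` up to floating-point error, because each base distribution has a region where the other has zero mass. That is one simple instance of the purity conditions the abstract refers to; without it, the residual would generally still be a mixture.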

Cite this Paper

BibTeX


@InProceedings{pmlr-v33-blanchard14,
  title     = {{Decontamination of Mutually Contaminated Models}},
  author    = {Blanchard, Gilles and Scott, Clayton},
  booktitle = {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages     = {1--9},
  year      = {2014},
  editor    = {Kaski, Samuel and Corander, Jukka},
  volume    = {33},
  series    = {Proceedings of Machine Learning Research},
  address   = {Reykjavik, Iceland},
  month     = {22--25 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v33/blanchard14.pdf},
  url       = {https://proceedings.mlr.press/v33/blanchard14.html},
  abstract  = {A variety of machine learning problems are characterized by data sets that are drawn from multiple different convex combinations of a fixed set of base distributions. We call this a mutual contamination model. In such problems, it is often of interest to recover these base distributions, or otherwise discern their properties. This work focuses on the problem of classification with multiclass label noise, in a general setting where the noise proportions are unknown and the true class distributions are nonseparable and potentially quite complex. We develop a procedure for decontamination of the contaminated models from data, which then facilitates the design of a consistent discrimination rule. Our approach relies on a novel method for estimating the error when projecting one distribution onto a convex combination of others, where the projection is with respect to an information divergence known as the separation distance. Under sufficient conditions on the amount of noise and purity of the base distributions, this projection procedure successfully recovers the underlying class distributions. Connections to novelty detection, topic modeling, and other learning problems are also discussed.}
}

Endnote

%0 Conference Paper
%T Decontamination of Mutually Contaminated Models
%A Gilles Blanchard
%A Clayton Scott
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-blanchard14
%I PMLR
%P 1--9
%U https://proceedings.mlr.press/v33/blanchard14.html
%V 33
%X A variety of machine learning problems are characterized by data sets that are drawn from multiple different convex combinations of a fixed set of base distributions. We call this a mutual contamination model. In such problems, it is often of interest to recover these base distributions, or otherwise discern their properties. This work focuses on the problem of classification with multiclass label noise, in a general setting where the noise proportions are unknown and the true class distributions are nonseparable and potentially quite complex. We develop a procedure for decontamination of the contaminated models from data, which then facilitates the design of a consistent discrimination rule. Our approach relies on a novel method for estimating the error when projecting one distribution onto a convex combination of others, where the projection is with respect to an information divergence known as the separation distance. Under sufficient conditions on the amount of noise and purity of the base distributions, this projection procedure successfully recovers the underlying class distributions. Connections to novelty detection, topic modeling, and other learning problems are also discussed.

RIS


TY  - CPAPER
TI  - Decontamination of Mutually Contaminated Models
AU  - Gilles Blanchard
AU  - Clayton Scott
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander
ID  - pmlr-v33-blanchard14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 1
EP  - 9
L1  - http://proceedings.mlr.press/v33/blanchard14.pdf
UR  - https://proceedings.mlr.press/v33/blanchard14.html
AB  - A variety of machine learning problems are characterized by data sets that are drawn from multiple different convex combinations of a fixed set of base distributions. We call this a mutual contamination model. In such problems, it is often of interest to recover these base distributions, or otherwise discern their properties. This work focuses on the problem of classification with multiclass label noise, in a general setting where the noise proportions are unknown and the true class distributions are nonseparable and potentially quite complex. We develop a procedure for decontamination of the contaminated models from data, which then facilitates the design of a consistent discrimination rule. Our approach relies on a novel method for estimating the error when projecting one distribution onto a convex combination of others, where the projection is with respect to an information divergence known as the separation distance. Under sufficient conditions on the amount of noise and purity of the base distributions, this projection procedure successfully recovers the underlying class distributions. Connections to novelty detection, topic modeling, and other learning problems are also discussed.
ER  -

APA


Blanchard, G. & Scott, C. (2014). Decontamination of Mutually Contaminated Models. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:1-9. Available from https://proceedings.mlr.press/v33/blanchard14.html.
