Novelty detection: Unlabeled data definitely help

Clayton Scott; Gilles Blanchard

Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:464-471, 2009.

Abstract

In machine learning, one formulation of the novelty detection problem is to build a detector based on a training sample consisting of only nominal data. The standard (inductive) approach to this problem has been to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, our approach yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule.
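
The reduction described in the abstract can be illustrated with a small sketch. This is not the authors' procedure: it is a simplified stand-in that assumes synthetic 2-D Gaussian data, a plain logistic regression as the binary classifier, and an empirical quantile on held-out nominal data in place of a proper Neyman-Pearson classification step. All names (`fit_logistic`, `fit_detector`, `alpha`) are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient-descent logistic regression (stand-in for any binary classifier)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def scores(w, X):
    """Linear score; larger means 'looks more like the unlabeled sample'."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def fit_detector(nominal, unlabeled, alpha, rng):
    """Classify nominal (label 0) vs. unlabeled (label 1), then threshold the
    score so roughly a fraction alpha of held-out nominal points is flagged."""
    idx = rng.permutation(len(nominal))
    half = len(nominal) // 2
    train, calib = nominal[idx[:half]], nominal[idx[half:]]
    X = np.vstack([train, unlabeled])
    y = np.concatenate([np.zeros(len(train)), np.ones(len(unlabeled))])
    w = fit_logistic(X, y)
    # Assumption: an empirical (1 - alpha) quantile approximates the
    # false-positive constraint; it carries no formal guarantee here.
    threshold = np.quantile(scores(w, calib), 1.0 - alpha)
    return lambda Z: scores(w, Z) > threshold

rng = np.random.default_rng(0)
nominal = rng.normal(size=(2000, 2))                   # nominal-only sample
unlabeled = np.vstack([rng.normal(size=(800, 2)),      # contaminated sample:
                       rng.normal(loc=4.0, size=(200, 2))])  # 20% novelties
detect = fit_detector(nominal, unlabeled, alpha=0.05, rng=rng)

fpr = detect(rng.normal(size=(5000, 2))).mean()            # fresh nominal data
tpr = detect(rng.normal(loc=4.0, size=(5000, 2))).mean()   # fresh novelties
print(f"false positive rate: {fpr:.3f}, detection rate: {tpr:.3f}")
```

The detector never sees labeled novelties; it only learns to separate the nominal sample from the contaminated one, and the threshold is set from nominal data alone.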

Cite this Paper

BibTeX


@InProceedings{pmlr-v5-scott09a,
  title = {Novelty detection: Unlabeled data definitely help},
  author = {Scott, Clayton and Blanchard, Gilles},
  booktitle = {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics},
  pages = {464--471},
  year = {2009},
  editor = {van Dyk, David and Welling, Max},
  volume = {5},
  series = {Proceedings of Machine Learning Research},
  address = {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA},
  month = {16--18 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v5/scott09a/scott09a.pdf},
  url = {https://proceedings.mlr.press/v5/scott09a.html},
  abstract = {In machine learning, one formulation of the novelty detection problem is to build a detector based on a training sample consisting of only nominal data. The standard (inductive) approach to this problem has been to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, our approach yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule.}
}

Endnote

%0 Conference Paper
%T Novelty detection: Unlabeled data definitely help
%A Clayton Scott
%A Gilles Blanchard
%B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling
%F pmlr-v5-scott09a
%I PMLR
%P 464--471
%U https://proceedings.mlr.press/v5/scott09a.html
%V 5
%X In machine learning, one formulation of the novelty detection problem is to build a detector based on a training sample consisting of only nominal data. The standard (inductive) approach to this problem has been to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, our approach yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule.

RIS


TY  - CPAPER
TI  - Novelty detection: Unlabeled data definitely help
AU  - Clayton Scott
AU  - Gilles Blanchard
BT  - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
DA  - 2009/04/15
ED  - David van Dyk
ED  - Max Welling
ID  - pmlr-v5-scott09a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 5
SP  - 464
EP  - 471
L1  - http://proceedings.mlr.press/v5/scott09a/scott09a.pdf
UR  - https://proceedings.mlr.press/v5/scott09a.html
AB  - In machine learning, one formulation of the novelty detection problem is to build a detector based on a training sample consisting of only nominal data. The standard (inductive) approach to this problem has been to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, our approach yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule.
ER  -

APA


Scott, C. & Blanchard, G. (2009). Novelty detection: Unlabeled data definitely help. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 5:464-471. Available from https://proceedings.mlr.press/v5/scott09a.html.
