Open Category Detection with PAC Guarantees

Si Liu, Risheek Garrepalli, Thomas Dietterich, Alan Fern, Dan Hendrycks
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3169-3178, 2018.

Abstract

Open category detection is the problem of detecting "alien" test instances that belong to categories or classes that were not present in the training data. In many applications, reliably detecting such aliens is central to ensuring the safety and accuracy of test set predictions. Unfortunately, there are no algorithms that provide theoretical guarantees on their ability to detect aliens under general assumptions. Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates. Thus, there are significant theoretical and empirical gaps in our understanding of open category detection. In this paper, we take a step toward addressing this gap by studying a simple, but practically-relevant variant of open category detection. In our setting, we are provided with a "clean" training set that contains only the target categories of interest and an unlabeled "contaminated” training set that contains a fraction alpha of alien examples. Under the assumption that we know an upper bound on alpha we develop an algorithm with PAC-style guarantees on the alien detection rate, while aiming to minimize false alarms. Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-liu18e, title = {Open Category Detection with {PAC} Guarantees}, author = {Liu, Si and Garrepalli, Risheek and Dietterich, Thomas and Fern, Alan and Hendrycks, Dan}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {3169--3178}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/liu18e/liu18e.pdf}, url = {https://proceedings.mlr.press/v80/liu18e.html}, abstract = {Open category detection is the problem of detecting "alien" test instances that belong to categories or classes that were not present in the training data. In many applications, reliably detecting such aliens is central to ensuring the safety and accuracy of test set predictions. Unfortunately, there are no algorithms that provide theoretical guarantees on their ability to detect aliens under general assumptions. Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates. Thus, there are significant theoretical and empirical gaps in our understanding of open category detection. In this paper, we take a step toward addressing this gap by studying a simple, but practically-relevant variant of open category detection. In our setting, we are provided with a "clean" training set that contains only the target categories of interest and an unlabeled "contaminated” training set that contains a fraction alpha of alien examples. Under the assumption that we know an upper bound on alpha we develop an algorithm with PAC-style guarantees on the alien detection rate, while aiming to minimize false alarms. Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements.} }
Endnote
%0 Conference Paper %T Open Category Detection with PAC Guarantees %A Si Liu %A Risheek Garrepalli %A Thomas Dietterich %A Alan Fern %A Dan Hendrycks %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-liu18e %I PMLR %P 3169--3178 %U https://proceedings.mlr.press/v80/liu18e.html %V 80 %X Open category detection is the problem of detecting "alien" test instances that belong to categories or classes that were not present in the training data. In many applications, reliably detecting such aliens is central to ensuring the safety and accuracy of test set predictions. Unfortunately, there are no algorithms that provide theoretical guarantees on their ability to detect aliens under general assumptions. Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates. Thus, there are significant theoretical and empirical gaps in our understanding of open category detection. In this paper, we take a step toward addressing this gap by studying a simple, but practically-relevant variant of open category detection. In our setting, we are provided with a "clean" training set that contains only the target categories of interest and an unlabeled "contaminated” training set that contains a fraction alpha of alien examples. Under the assumption that we know an upper bound on alpha we develop an algorithm with PAC-style guarantees on the alien detection rate, while aiming to minimize false alarms. Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements.
APA
Liu, S., Garrepalli, R., Dietterich, T., Fern, A. & Hendrycks, D.. (2018). Open Category Detection with PAC Guarantees. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3169-3178 Available from https://proceedings.mlr.press/v80/liu18e.html.

Related Material