Discover-Then-Rank Unlabeled Support Vectors in the Dual Space for Multi-Class Active Learning

Dayou Yu, Weishi Shi, Qi Yu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:40321-40338, 2023.

Abstract

We propose to approach active learning (AL) from a novel perspective of discovering and then ranking potential support vectors by leveraging the key properties of the dual space of a sparse kernel max-margin predictor. We theoretically analyze the change of a hinge loss in the dual form and provide both the upper and lower bounds that are deeply connected to the key geometric properties induced by the dual space, which then help us identify various types of important data samples for AL. These bounds inform the design of a novel sampling strategy that leverages class-wise evidence as a key vehicle, formed through an affine combination of dual variables and kernel evaluation. We construct two distinct types of sampling functions, including discovery and ranking. The former focuses on samples with low total evidence from all classes, which signifies their potential to support exploration; the latter exploits the current decision boundary to identify the most conflicting regions for sampling, aiming to further refine the decision boundary. These two functions, which are complementary to each other, are automatically arranged into a two-phase active sampling process that starts with the discovery and then transitions to the ranking of data points to most effectively balance exploration and exploitation. Experiments on various real-world data demonstrate the state-of-the-art AL performance achieved by our model.
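The two sampling functions described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the RBF kernel, the use of summed absolute evidence as "total evidence", the top-two gap as a measure of conflict, and the fixed round count that switches phases (the paper states the transition is arranged automatically) are all assumptions made for illustration. Every function and parameter name below is hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def class_evidence(X_pool, X_sv, alpha, bias, gamma=1.0):
    """Class-wise evidence as an affine combination of dual variables and kernel values.

    alpha has shape (n_support, n_classes); bias has shape (n_classes,).
    Returns an array of shape (n_pool, n_classes).
    """
    return rbf_kernel(X_pool, X_sv, gamma) @ alpha + bias


def discovery_pick(evidence):
    """Exploration: pool index with the lowest total evidence over all classes.

    Assumption: "total evidence" is taken as the sum of absolute class evidences.
    """
    return int(np.argmin(np.abs(evidence).sum(axis=1)))


def ranking_pick(evidence):
    """Exploitation: pool index whose two strongest classes conflict the most.

    Assumption: conflict is measured by the smallest gap between the top two evidences.
    """
    top2 = np.sort(evidence, axis=1)[:, -2:]
    return int(np.argmin(top2[:, 1] - top2[:, 0]))


def select_query(evidence, round_idx, n_discovery_rounds):
    """Two-phase schedule: discovery first, then ranking.

    Assumption: the switch happens after a fixed number of rounds.
    """
    if round_idx < n_discovery_rounds:
        return discovery_pick(evidence)
    return ranking_pick(evidence)
```

With one support vector per class, a pool point far from both support vectors would be chosen in the discovery phase, while a point equidistant from them would be chosen in the ranking phase.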

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-yu23d,
  title = {Discover-Then-Rank Unlabeled Support Vectors in the Dual Space for Multi-Class Active Learning},
  author = {Yu, Dayou and Shi, Weishi and Yu, Qi},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {40321--40338},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/yu23d/yu23d.pdf},
  url = {https://proceedings.mlr.press/v202/yu23d.html},
  abstract = {We propose to approach active learning (AL) from a novel perspective of discovering and then ranking potential support vectors by leveraging the key properties of the dual space of a sparse kernel max-margin predictor. We theoretically analyze the change of a hinge loss in the dual form and provide both the upper and lower bounds that are deeply connected to the key geometric properties induced by the dual space, which then help us identify various types of important data samples for AL. These bounds inform the design of a novel sampling strategy that leverages class-wise evidence as a key vehicle, formed through an affine combination of dual variables and kernel evaluation. We construct two distinct types of sampling functions, including discovery and ranking. The former focuses on samples with low total evidence from all classes, which signifies their potential to support exploration; the latter exploits the current decision boundary to identify the most conflicting regions for sampling, aiming to further refine the decision boundary. These two functions, which are complementary to each other, are automatically arranged into a two-phase active sampling process that starts with the discovery and then transitions to the ranking of data points to most effectively balance exploration and exploitation. Experiments on various real-world data demonstrate the state-of-the-art AL performance achieved by our model.}
}
Endnote
%0 Conference Paper
%T Discover-Then-Rank Unlabeled Support Vectors in the Dual Space for Multi-Class Active Learning
%A Dayou Yu
%A Weishi Shi
%A Qi Yu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-yu23d
%I PMLR
%P 40321--40338
%U https://proceedings.mlr.press/v202/yu23d.html
%V 202
%X We propose to approach active learning (AL) from a novel perspective of discovering and then ranking potential support vectors by leveraging the key properties of the dual space of a sparse kernel max-margin predictor. We theoretically analyze the change of a hinge loss in the dual form and provide both the upper and lower bounds that are deeply connected to the key geometric properties induced by the dual space, which then help us identify various types of important data samples for AL. These bounds inform the design of a novel sampling strategy that leverages class-wise evidence as a key vehicle, formed through an affine combination of dual variables and kernel evaluation. We construct two distinct types of sampling functions, including discovery and ranking. The former focuses on samples with low total evidence from all classes, which signifies their potential to support exploration; the latter exploits the current decision boundary to identify the most conflicting regions for sampling, aiming to further refine the decision boundary. These two functions, which are complementary to each other, are automatically arranged into a two-phase active sampling process that starts with the discovery and then transitions to the ranking of data points to most effectively balance exploration and exploitation. Experiments on various real-world data demonstrate the state-of-the-art AL performance achieved by our model.
APA
Yu, D., Shi, W. & Yu, Q. (2023). Discover-Then-Rank Unlabeled Support Vectors in the Dual Space for Multi-Class Active Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:40321-40338. Available from https://proceedings.mlr.press/v202/yu23d.html.
