Stray Intrusive Outliers-Based Feature Selection on Intra-Class Asymmetric Instance Distribution or Multiple High-Density Clusters

Lixin Yuan, Yirui Wu, Wenxiao Zhang, Minglei Yuan, Jun Liu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:73683-73700, 2025.

Abstract

For data with intra-class Asymmetric instance Distribution or Multiple High-density Clusters (ADMHC), outliers are real and have specific patterns for data classification, where the class body is necessary and difficult to identify. Previous Feature Selection (FS) methods score features based on all training instances or rarely target intra-class ADMHC. In this paper, we propose a supervised FS method, Stray Intrusive Outliers-based FS (SIOFS), for data classification with intra-class ADMHC. By focusing on Stray Intrusive Outliers (SIOs), SIOFS modifies the skewness coefficient and fuses the threshold in the 3$\sigma$ principle to identify the class body, scoring features based on the intrusion degree of SIOs. In addition, the refined density-mean center is proposed to represent the general characteristics of the class body reasonably. Mathematical formulations, proofs, and logical exposition ensure the rationality and universality of the settings in the proposed SIOFS method. Extensive experiments on 16 diverse benchmark datasets demonstrate the superiority of SIOFS over 12 state-of-the-art FS methods in terms of classification accuracy, normalized mutual information, and confusion matrix. The SIOFS source code is available at https://github.com/XXXly/2025-ICML-SIOFS
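The abstract names two classical ingredients, the skewness coefficient and the 3$\sigma$ threshold, which SIOFS modifies and fuses to identify the class body. As a rough illustration only, the sketch below shows the unmodified textbook versions applied to one feature of one class. It is not the SIOFS method; the function names `skewness` and `three_sigma_split` and the toy values are assumptions made for this example.

```python
import statistics


def skewness(values):
    """Textbook population skewness: mean((x - mu)^3) / sigma^3."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return 0.0  # a constant feature has no asymmetry
    return sum((v - mu) ** 3 for v in values) / (len(values) * sigma ** 3)


def three_sigma_split(values, k=3.0):
    """Split values into (body, outliers) with the plain k-sigma rule.

    Hypothetical helper: SIOFS fuses a modified skewness coefficient into
    this threshold, which is NOT reproduced here.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    body = [v for v in values if abs(v - mu) <= k * sigma]
    outliers = [v for v in values if abs(v - mu) > k * sigma]
    return body, outliers


# Toy feature column for a single class: 19 clustered values and one stray value.
feature = [0.0] * 19 + [100.0]
body, outliers = three_sigma_split(feature)
print(len(body), outliers, skewness(feature) > 0)
```

On an asymmetric class such as this toy column, the mean and standard deviation are themselves pulled toward the stray value, which is one reason a fixed symmetric threshold can be a poor description of the class body.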

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-yuan25k,
  title = {Stray Intrusive Outliers-Based Feature Selection on Intra-Class Asymmetric Instance Distribution or Multiple High-Density Clusters},
  author = {Yuan, Lixin and Wu, Yirui and Zhang, Wenxiao and Yuan, Minglei and Liu, Jun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {73683--73700},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yuan25k/yuan25k.pdf},
  url = {https://proceedings.mlr.press/v267/yuan25k.html},
  abstract = {For data with intra-class Asymmetric instance Distribution or Multiple High-density Clusters (ADMHC), outliers are real and have specific patterns for data classification, where the class body is necessary and difficult to identify. Previous Feature Selection (FS) methods score features based on all training instances or rarely target intra-class ADMHC. In this paper, we propose a supervised FS method, Stray Intrusive Outliers-based FS (SIOFS), for data classification with intra-class ADMHC. By focusing on Stray Intrusive Outliers (SIOs), SIOFS modifies the skewness coefficient and fuses the threshold in the 3$\sigma$ principle to identify the class body, scoring features based on the intrusion degree of SIOs. In addition, the refined density-mean center is proposed to represent the general characteristics of the class body reasonably. Mathematical formulations, proofs, and logical exposition ensure the rationality and universality of the settings in the proposed SIOFS method. Extensive experiments on 16 diverse benchmark datasets demonstrate the superiority of SIOFS over 12 state-of-the-art FS methods in terms of classification accuracy, normalized mutual information, and confusion matrix. The SIOFS source code is available at https://github.com/XXXly/2025-ICML-SIOFS}
}
Endnote
%0 Conference Paper
%T Stray Intrusive Outliers-Based Feature Selection on Intra-Class Asymmetric Instance Distribution or Multiple High-Density Clusters
%A Lixin Yuan
%A Yirui Wu
%A Wenxiao Zhang
%A Minglei Yuan
%A Jun Liu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-yuan25k
%I PMLR
%P 73683--73700
%U https://proceedings.mlr.press/v267/yuan25k.html
%V 267
%X For data with intra-class Asymmetric instance Distribution or Multiple High-density Clusters (ADMHC), outliers are real and have specific patterns for data classification, where the class body is necessary and difficult to identify. Previous Feature Selection (FS) methods score features based on all training instances or rarely target intra-class ADMHC. In this paper, we propose a supervised FS method, Stray Intrusive Outliers-based FS (SIOFS), for data classification with intra-class ADMHC. By focusing on Stray Intrusive Outliers (SIOs), SIOFS modifies the skewness coefficient and fuses the threshold in the 3$\sigma$ principle to identify the class body, scoring features based on the intrusion degree of SIOs. In addition, the refined density-mean center is proposed to represent the general characteristics of the class body reasonably. Mathematical formulations, proofs, and logical exposition ensure the rationality and universality of the settings in the proposed SIOFS method. Extensive experiments on 16 diverse benchmark datasets demonstrate the superiority of SIOFS over 12 state-of-the-art FS methods in terms of classification accuracy, normalized mutual information, and confusion matrix. The SIOFS source code is available at https://github.com/XXXly/2025-ICML-SIOFS
APA
Yuan, L., Wu, Y., Zhang, W., Yuan, M. & Liu, J. (2025). Stray Intrusive Outliers-Based Feature Selection on Intra-Class Asymmetric Instance Distribution or Multiple High-Density Clusters. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:73683-73700. Available from https://proceedings.mlr.press/v267/yuan25k.html.

Related Material