MIK: Modified Isolation Kernel for Biological Sequence Visualization, Classification, and Clustering

Sarwan Ali, Prakash Chourasia, Haris Mansoor, Bipin Koirala, Murray Patterson
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:30-47, 2025.

Abstract

The t-Distributed Stochastic Neighbor Embedding (t-SNE) has emerged as a popular dimensionality reduction technique for visualizing high-dimensional data. It computes pairwise similarities between data points by default using an RBF kernel and random initialization (in low-dimensional space), which successfully captures the overall structure but may struggle to preserve the local structure efficiently. This research proposes a novel approach called the Modified Isolation Kernel (MIK) as an alternative to the Gaussian kernel, which is built upon the concept of the Isolation Kernel. MIK uses adaptive density estimation to capture local structures more accurately and integrates robustness measures. It also assigns higher similarity values to nearby points and lower values to distant points. Comparative research using the normal Gaussian kernel, the isolation kernel, and several initialization techniques, including random, PCA, and random walk initializations, are used to assess the proposed approach (MIK). Additionally, we compare the computational efficiency of all $3$ kernels with $3$ different initialization methods. Our experimental results demonstrate several advantages of the proposed kernel (MIK) and initialization method selection. It exhibits improved preservation of the local and global structure and enables better visualization of clusters and subclusters in the embedded space. These findings contribute to advancing dimensionality reduction techniques and provide researchers and practitioners with an effective tool for data exploration, visualization, and analysis in various domains.

Cite this Paper


BibTeX
@InProceedings{pmlr-v259-ali25a, title = {MIK: Modified Isolation Kernel for Biological Sequence Visualization, Classification, and Clustering}, author = {Ali, Sarwan and Chourasia, Prakash and Mansoor, Haris and Koirala, Bipin and Patterson, Murray}, booktitle = {Proceedings of the 4th Machine Learning for Health Symposium}, pages = {30--47}, year = {2025}, editor = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran}, volume = {259}, series = {Proceedings of Machine Learning Research}, month = {15--16 Dec}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v259/main/assets/ali25a/ali25a.pdf}, url = {https://proceedings.mlr.press/v259/ali25a.html}, abstract = {The t-Distributed Stochastic Neighbor Embedding (t-SNE) has emerged as a popular dimensionality reduction technique for visualizing high-dimensional data. It computes pairwise similarities between data points by default using an RBF kernel and random initialization (in low-dimensional space), which successfully captures the overall structure but may struggle to preserve the local structure efficiently. This research proposes a novel approach called the Modified Isolation Kernel (MIK) as an alternative to the Gaussian kernel, which is built upon the concept of the Isolation Kernel. MIK uses adaptive density estimation to capture local structures more accurately and integrates robustness measures. It also assigns higher similarity values to nearby points and lower values to distant points. Comparative research using the normal Gaussian kernel, the isolation kernel, and several initialization techniques, including random, PCA, and random walk initializations, are used to assess the proposed approach (MIK). Additionally, we compare the computational efficiency of all $3$ kernels with $3$ different initialization methods. Our experimental results demonstrate several advantages of the proposed kernel (MIK) and initialization method selection. It exhibits improved preservation of the local and global structure and enables better visualization of clusters and subclusters in the embedded space. These findings contribute to advancing dimensionality reduction techniques and provide researchers and practitioners with an effective tool for data exploration, visualization, and analysis in various domains.} }
Endnote
%0 Conference Paper %T MIK: Modified Isolation Kernel for Biological Sequence Visualization, Classification, and Clustering %A Sarwan Ali %A Prakash Chourasia %A Haris Mansoor %A Bipin Koirala %A Murray Patterson %B Proceedings of the 4th Machine Learning for Health Symposium %C Proceedings of Machine Learning Research %D 2025 %E Stefan Hegselmann %E Helen Zhou %E Elizabeth Healey %E Trenton Chang %E Caleb Ellington %E Vishwali Mhasawade %E Sana Tonekaboni %E Peniel Argaw %E Haoran Zhang %F pmlr-v259-ali25a %I PMLR %P 30--47 %U https://proceedings.mlr.press/v259/ali25a.html %V 259 %X The t-Distributed Stochastic Neighbor Embedding (t-SNE) has emerged as a popular dimensionality reduction technique for visualizing high-dimensional data. It computes pairwise similarities between data points by default using an RBF kernel and random initialization (in low-dimensional space), which successfully captures the overall structure but may struggle to preserve the local structure efficiently. This research proposes a novel approach called the Modified Isolation Kernel (MIK) as an alternative to the Gaussian kernel, which is built upon the concept of the Isolation Kernel. MIK uses adaptive density estimation to capture local structures more accurately and integrates robustness measures. It also assigns higher similarity values to nearby points and lower values to distant points. Comparative research using the normal Gaussian kernel, the isolation kernel, and several initialization techniques, including random, PCA, and random walk initializations, are used to assess the proposed approach (MIK). Additionally, we compare the computational efficiency of all $3$ kernels with $3$ different initialization methods. Our experimental results demonstrate several advantages of the proposed kernel (MIK) and initialization method selection. It exhibits improved preservation of the local and global structure and enables better visualization of clusters and subclusters in the embedded space. These findings contribute to advancing dimensionality reduction techniques and provide researchers and practitioners with an effective tool for data exploration, visualization, and analysis in various domains.
APA
Ali, S., Chourasia, P., Mansoor, H., Koirala, B. & Patterson, M.. (2025). MIK: Modified Isolation Kernel for Biological Sequence Visualization, Classification, and Clustering. Proceedings of the 4th Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 259:30-47 Available from https://proceedings.mlr.press/v259/ali25a.html.

Related Material