Conceptual Clustering with Numeric-and-Nominal Mixed Data - A New Similarity Based System

Cen Li, Gautam Biswas
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:327-346, 1997.

Abstract

This paper presents a new Similarity Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. A similarity measure, proposed by Goodall for biological taxonomy[13], that gives greater weight to uncommon feature-value matches in similarity computations and makes no assumptions of the underlying distributions of the feature-values, is adopted to define the similarity measure between pairs of objects. An agglomerative algorithm is employed to construct a concept tree, and a simple distinctness heuristic is used to extract a partition of the data. The performance of SBAC has been studied on artificially generated data sets. Results demonstrate the effectiveness of this algorithm in unsupervised discovery tasks. Comparisons with other schemes illustrate the superior performance of the algorithm.
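The idea of giving greater weight to uncommon feature-value matches can be illustrated with a minimal sketch. This is not the SBAC measure from the paper: it is a simplified Goodall-style score for nominal features only, in which a match on a value with relative frequency p contributes 1 - p^2 and a mismatch contributes 0. The function name `goodall_style_similarity` and the toy data are assumptions made for illustration.

```python
from collections import Counter


def goodall_style_similarity(rows):
    """Build sim(i, j) for rows of nominal values (hypothetical helper).

    Simplified Goodall-style weighting, assumed for illustration:
    a match on a value with relative frequency p scores 1 - p**2,
    a mismatch scores 0, and the per-feature scores are averaged.
    """
    n = len(rows)
    n_features = len(rows[0])
    # Relative frequency of each value, computed separately per feature.
    freqs = [
        {v: c / n for v, c in Counter(row[k] for row in rows).items()}
        for k in range(n_features)
    ]

    def sim(i, j):
        total = 0.0
        for k in range(n_features):
            a, b = rows[i][k], rows[j][k]
            if a == b:
                # Matching on a rare value scores closer to 1.
                total += 1.0 - freqs[k][a] ** 2
        return total / n_features

    return sim


# Toy data: "red"/"round" are common, "blue"/"square" are less common.
rows = [
    ("red", "round"),
    ("red", "round"),
    ("red", "round"),
    ("blue", "square"),
    ("blue", "square"),
]
sim = goodall_style_similarity(rows)
```

With this toy data, the pair of objects matching on the less common values (`sim(3, 4)`) scores higher than the pair matching on the common values (`sim(0, 1)`), and objects with no matching values score 0. A pairwise score of this kind could then feed an agglomerative procedure that repeatedly merges the most similar clusters.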

Cite this Paper


BibTeX
@InProceedings{pmlr-vR1-li97a,
  title = {Conceptual Clustering with Numeric-and-Nominal Mixed Data - A New Similarity Based System},
  author = {Li, Cen and Biswas, Gautam},
  booktitle = {Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics},
  pages = {327--346},
  year = {1997},
  editor = {Madigan, David and Smyth, Padhraic},
  volume = {R1},
  series = {Proceedings of Machine Learning Research},
  month = {04--07 Jan},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/r1/li97a/li97a.pdf},
  url = {https://proceedings.mlr.press/r1/li97a.html},
  abstract = {This paper presents a new Similarity Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. A similarity measure, proposed by Goodall for biological taxonomy[13], that gives greater weight to uncommon feature-value matches in similarity computations and makes no assumptions of the underlying distributions of the feature-values, is adopted to define the similarity measure between pairs of objects. An agglomerative algorithm is employed to construct a concept tree, and a simple distinctness heuristic is used to extract a partition of the data. The performance of SBAC has been studied on artificially generated data sets. Results demonstrate the effectiveness of this algorithm in unsupervised discovery tasks. Comparisons with other schemes illustrate the superior performance of the algorithm.},
  note = {Reissued by PMLR on 30 March 2021.}
}
Endnote
%0 Conference Paper
%T Conceptual Clustering with Numeric-and-Nominal Mixed Data - A New Similarity Based System
%A Cen Li
%A Gautam Biswas
%B Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 1997
%E David Madigan
%E Padhraic Smyth
%F pmlr-vR1-li97a
%I PMLR
%P 327--346
%U https://proceedings.mlr.press/r1/li97a.html
%V R1
%X This paper presents a new Similarity Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. A similarity measure, proposed by Goodall for biological taxonomy[13], that gives greater weight to uncommon feature-value matches in similarity computations and makes no assumptions of the underlying distributions of the feature-values, is adopted to define the similarity measure between pairs of objects. An agglomerative algorithm is employed to construct a concept tree, and a simple distinctness heuristic is used to extract a partition of the data. The performance of SBAC has been studied on artificially generated data sets. Results demonstrate the effectiveness of this algorithm in unsupervised discovery tasks. Comparisons with other schemes illustrate the superior performance of the algorithm.
%Z Reissued by PMLR on 30 March 2021.
APA
Li, C., & Biswas, G. (1997). Conceptual Clustering with Numeric-and-Nominal Mixed Data - A New Similarity Based System. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R1:327-346. Available from https://proceedings.mlr.press/r1/li97a.html. Reissued by PMLR on 30 March 2021.