Unsupervised learning with mixed numeric and nominal data

Authors:	Li, C.; Biswas, G.

Affiliation:	Department of Computer Science, Middle Tennessee State University, Murfreesboro, TN

Abstract:	Presents a similarity-based agglomerative clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. The similarity between pairs of objects is defined with a measure proposed by D.W. Goodall (1966) for biological taxonomy. This measure gives greater weight to matches on uncommon feature values and makes no assumptions about the underlying distributions of the feature values. An agglomerative algorithm is employed to construct a dendrogram, and a simple distinctness heuristic is used to extract a partition of the data. The performance of the SBAC algorithm has been studied on real and artificially generated data sets. The results demonstrate the effectiveness of this algorithm in unsupervised discovery tasks, and comparisons with other clustering schemes illustrate the superior performance of this approach.
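The core idea, namely that a match on a rare value is stronger evidence of similarity than a match on a common one, followed by agglomerative merging, can be illustrated with a small sketch. This is not the SBAC algorithm itself. It assumes nominal features only, scores a match on value v as one minus the estimated probability that a random pair agrees on a value at least as rare as v (a simplified Goodall-style score), combines features by a plain unweighted mean, and merges clusters by average linkage. The paper's treatment of numeric features, its way of combining per-feature evidence, and its distinctness heuristic are not reproduced. The function names `goodall_match_similarity`, `pairwise_similarity`, and `average_linkage` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def goodall_match_similarity(column):
    """Simplified Goodall-style score for matches on each value of one nominal feature.

    Assumes len(column) >= 2. A match on value v scores 1 minus the estimated
    probability that a random pair agrees on any value at least as rare as v,
    so rarer matches score higher.
    """
    n = len(column)
    counts = Counter(column)
    # Probability that a pair drawn without replacement agrees on value u.
    p = {u: c * (c - 1) / (n * (n - 1)) for u, c in counts.items()}
    return {v: 1.0 - sum(pu for pu in p.values() if pu <= p[v]) for v in counts}


def pairwise_similarity(rows):
    """Similarity matrix over objects described by nominal features.

    Assumption: per-feature scores are averaged with equal weights, and a
    mismatch contributes 0. SBAC combines per-feature evidence differently.
    """
    m = len(rows[0])
    tables = [goodall_match_similarity([r[k] for r in rows]) for k in range(m)]
    n = len(rows)
    S = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = sum(
            tables[k][rows[i][k]] if rows[i][k] == rows[j][k] else 0.0
            for k in range(m)
        ) / m
        S[i][j] = S[j][i] = s
    return S


def average_linkage(S):
    """Agglomerate until one cluster remains; return merges as (kept_id, absorbed_id, similarity).

    The merge list encodes a dendrogram. Ties keep the first pair found.
    """
    clusters = {i: [i] for i in range(len(S))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            members_a, members_b = clusters[a], clusters[b]
            s = sum(S[i][j] for i in members_a for j in members_b) / (
                len(members_a) * len(members_b)
            )
            if best is None or s > best[0]:
                best = (s, a, b)
        s, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((a, b, s))
    return merges


if __name__ == "__main__":
    rows = [
        ("red", "small"),
        ("red", "small"),
        ("blue", "large"),
        ("blue", "large"),
        ("green", "small"),
    ]
    for merge in average_linkage(pairwise_similarity(rows)):
        print(merge)
```

In this toy data, objects 2 and 3 merge first. They agree on "large", which is rarer than the "small" shared by objects 0 and 1, so their pair receives a higher similarity score. Cutting the resulting merge list into a partition is the step SBAC handles with its distinctness heuristic. That heuristic is not modeled here.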
