A New Nonparametric Pairwise Clustering Algorithm Based on Iterative Estimation of Distance Profiles期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

A New Nonparametric Pairwise Clustering Algorithm Based on Iterative Estimation of Distance Profiles

Authors:	Dubnov Shlomo El-Yaniv Ran Gdalyahu Yoram Schneidman Elad Tishby Naftali Yona Golan

Affiliation:	(1) Department of Communication Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel;(2) Department of Computer Science, Technion—Israel Institute of Technology, Haifa, 32000, Israel;(3) School of Computer Science and Engineering and Center for Neural Computation, Hebrew University, Jerusalem, 91904, Israel;(4) School of Computer Science and Engineering, Department of Neurobiology and Center for Neural Computation, Hebrew University, Jerusalem, 91904, Israel;(5) Department of Computer Science, Cornell University, Ithaca, NY 14853-7501, USA

Abstract:	We present a novel pairwise clustering method. Given a proximity matrix of pairwise relations (i.e. pairwise similarity or dissimilarity estimates) between data points, our algorithm extracts the two most prominent clusters in the data set. The algorithm, which is completely nonparametric, iteratively employs a two-step transformation on the proximity matrix. The first step of the transformation represents each point by its relation to all other data points, and the second step re-estimates the pairwise distances using a statistically motivated proximity measure on these representations. Using this transformation, the algorithm iteratively partitions the data points, until it finally converges to two clusters. Although the algorithm is simple and intuitive, it generates a complex dynamics of the proximity matrices. Based on this bipartition procedure we devise a hierarchical clustering algorithm, which employs the basic bipartition algorithm in a straightforward divisive manner. The hierarchical clustering algorithm copes with the model validation problem using a general cross-validation approach, which may be combined with various hierarchical clustering methods.We further present an experimental study of this algorithm. We examine some of the algorithm's properties and performance on some synthetic and standard data sets. The experiments demonstrate the robustness of the algorithm and indicate that it generates a good clustering partition even when the data is noisy or corrupted.

Keywords:	nonparametric methods pairwise distances hierarchical clustering Jensen-Shannon divergence cross-validation noise robustness
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏