Stability and model selection in k-means clustering期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Stability and model selection in k-means clustering

Authors:	Ohad Shamir Naftali Tishby

Affiliation:	1. School of Computer Science and Engineering, The Hebrew University, Jerusalem, 91904, Israel 2. School of Computer Science and Engineering, and Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem, 91904, Israel

Abstract:	Clustering stability methods are a family of widely used model selection techniques for data clustering. Their unifying theme is that an appropriate model should result in a clustering which is robust with respect to various kinds of perturbations. Despite their relative success, not much is known theoretically on why or when do they work, or even what kind of assumptions they make in choosing an ‘appropriate’ model. Moreover, recent theoretical work has shown that they might ‘break down’ for large enough samples. In this paper, we focus on the behavior of clustering stability using k-means clustering. Our main technical result is an exact characterization of the distribution to which suitably scaled measures of instability converge, based on a sample drawn from any distribution in ?ⁿ satisfying mild regularity conditions. From this, we can show that clustering stability does not ‘break down’ even for arbitrarily large samples, at least for the k-means framework. Moreover, it allows us to identify the factors which eventually determine the behavior of clustering stability. This leads to some basic observations about what kind of assumptions are made when using these methods. While often reasonable, these assumptions might also lead to unexpected consequences.

Keywords:
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏