Non-redundant data clustering |
| |
Authors: | David Gondek, Thomas Hofmann |
| |
Affiliation: | (1) Department of Computer Science, Brown University, Providence, RI, USA |
| |
Abstract: | Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice, this
discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel,
previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck
framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to
constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different
types of numeric and non-numeric attributes. We discuss extensions of the technique to the tasks of semi-supervised classification
and enumeration of successive non-redundant clusterings. We present experimental results for applications in text mining and
computer vision. |
| |
Keywords: | Non-redundant clustering; Exploratory data mining; Information bottleneck |
This document is indexed in SpringerLink and other databases. |