Effective semi-supervised document clustering via active learning with instance-level constraints |
| |
Authors: | Weizhong Zhao, Qing He, Huifang Ma, Zhongzhi Shi |
| |
Affiliation: | 1. The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China 2. Graduate University of Chinese Academy of Sciences, 100039, Beijing, China 3. College of Information Engineering, Xiangtan University, 411105, Xiangtan, China 4. College of Mathematics and Information, Northwest Normal University, 730070, Gansu Lanzhou, China |
| |
Abstract: | Semi-supervised document clustering, which uses a limited amount of supervised data to group unlabeled documents into clusters,
has received significant interest recently. Because obtaining supervised data may be expensive, it is important to acquire the most
informative knowledge to improve clustering performance. This paper presents a semi-supervised document clustering algorithm
and a new method for actively selecting informative instance-level constraints to improve clustering performance. The
semi-supervised document clustering algorithm is a Constrained DBSCAN (Cons-DBSCAN) algorithm, which incorporates instance-level
constraints to guide the clustering process in DBSCAN. An active learning approach is proposed to select informative document
pairs for which user feedback is obtained. Experimental results show that Cons-DBSCAN with the proposed active learning approach
can improve clustering performance significantly when given a relatively small number of constraints. |
| |
Keywords: | |
This document is indexed in SpringerLink and other databases. |
|