

Effective semi-supervised document clustering via active learning with instance-level constraints
Authors: Weizhong Zhao, Qing He, Huifang Ma, Zhongzhi Shi
Affiliation: 1. The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
2. Graduate University of Chinese Academy of Sciences, 100039, Beijing, China
3. College of Information Engineering, Xiangtan University, 411105, Xiangtan, China
4. College of Mathematics and Information, Northwest Normal University, 730070, Gansu Lanzhou, China
Abstract: Semi-supervised document clustering, which uses a limited amount of supervised data to group unlabeled documents into clusters, has received significant interest recently. Because obtaining supervised data may be expensive, it is important to acquire the most informative knowledge in order to improve clustering performance. This paper presents a semi-supervised document clustering algorithm and a new method for actively selecting informative instance-level constraints to improve clustering performance. The semi-supervised document clustering algorithm is a Constrained DBSCAN (Cons-DBSCAN) algorithm, which incorporates instance-level constraints to guide the clustering process in DBSCAN. An active learning approach is proposed to select informative document pairs for obtaining user feedback. Experimental results show that Cons-DBSCAN with the proposed active learning approach can improve clustering performance significantly when given a relatively small number of constraints.
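The abstract does not spell out how Cons-DBSCAN applies its constraints, so the following is only a minimal sketch of the general idea of instance-level constraints in a DBSCAN-style procedure, not the authors' algorithm. It assumes two kinds of pairwise constraints (must-link and cannot-link, both hypothetical parameter names here), blocks cluster expansion whenever a cannot-link pair would end up together, and merges clusters joined by a must-link pair afterwards. The active selection of document pairs is not covered.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def violates(i, members, cannot_link):
    """True if putting point i together with `members` breaks a cannot-link pair."""
    return any((i, m) in cannot_link or (m, i) in cannot_link for m in members)

def constrained_dbscan(points, eps, min_pts, must_link=(), cannot_link=()):
    """Illustrative constraint-aware DBSCAN variant (an assumption, not Cons-DBSCAN)."""
    cannot_link = set(cannot_link)
    labels = [None] * len(points)
    next_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # not a core point (may be claimed later)
            continue
        cid, next_id = next_id, next_id + 1
        labels[i] = cid
        members = {i}
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] not in (None, NOISE):
                continue               # already assigned to some cluster
            if violates(j, members, cannot_link):
                continue               # a cannot-link pair blocks this expansion
            labels[j] = cid
            members.add(j)
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is a core point: keep expanding
    # Must-link pairs: merge the two clusters unless a cannot-link pair forbids it.
    for a, b in must_link:
        la, lb = labels[a], labels[b]
        if la == lb or NOISE in (la, lb):
            continue
        group_a = [k for k, lab in enumerate(labels) if lab == la]
        group_b = [k for k, lab in enumerate(labels) if lab == lb]
        if any(violates(k, group_a, cannot_link) for k in group_b):
            continue
        for k in group_b:
            labels[k] = la
    return labels

# Toy 2-D points standing in for document vectors: two well-separated groups.
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
plain = constrained_dbscan(docs, eps=1.5, min_pts=2)
merged = constrained_dbscan(docs, eps=1.5, min_pts=2, must_link=[(0, 3)])
split = constrained_dbscan(docs, eps=1.5, min_pts=2, cannot_link=[(0, 1)])
```

In this toy run, the single must-link pair joins the two groups into one cluster, while the single cannot-link pair forces documents 0 and 1 into different clusters even though they are density-reachable. Real document vectors would typically use a cosine-based distance rather than the Euclidean `dist` assumed here.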
This article is indexed in SpringerLink and other databases.