
Semi-Supervised Clustering Algorithm Based on Tri-Training and Data Editing
Cite this article: DENG Chao, GUO Mao-Zu. Semi-supervised clustering algorithm based on Tri-Training and data editing[J]. Journal of Software, 2008, 19(3): 663-673.
Authors: DENG Chao  GUO Mao-Zu
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60702033, 60772076; the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2007AA012171; the Science Fund for Distinguished Young Scholars of Heilongjiang Province of China under Grant No. JC200611; the Natural Science Foundation of Heilongjiang Province of China (key program) under Grant No. ZJG0705; the Foundation of Harbin Institute of Technology of China under Grant No. HIT.2003.53
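Abstract: A semi-supervised clustering algorithm is proposed. Before the seeds set is used to initialize the cluster centers, the algorithm uses the iterative training process of the semi-supervised classification method Tri-training to label unlabeled data and adds them to the seeds set to enlarge it. At the same time, Depuration, a data editing technique based on the nearest neighbor rule, is incorporated into the Tri-training process to correct and purify the mislabeled noisy data produced while the seeds set is enlarged, so as to improve the quality of the seeds set. Experimental results show that the proposed DE-Tri-training semi-supervised clustering algorithm, based on Tri-training and data editing, effectively improves the initialization of cluster centers by the seeds set and improves clustering performance.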
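The role of the seeds set in initializing cluster centers can be illustrated with a minimal sketch of seed-initialized K-means. This is not the authors' implementation: the function name `seeded_kmeans`, the `seeds` mapping from point index to label, and the `n_iter` parameter are assumptions made for illustration, and the Tri-training enlargement step is omitted.

```python
import math


def _mean(members):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def _nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def seeded_kmeans(points, seeds, n_iter=20):
    """K-means whose initial centroids are the means of labeled seed points.

    points: list of coordinate tuples.
    seeds:  dict mapping a point index to its known cluster label (hypothetical format).
    """
    labels = sorted(set(seeds.values()))
    # Each initial centroid is the mean of the seeds that carry that label.
    centroids = [
        _mean([points[i] for i, lab in seeds.items() if lab == label])
        for label in labels
    ]
    for _ in range(n_iter):
        assign = [_nearest(p, centroids) for p in points]
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(_mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    assign = [_nearest(p, centroids) for p in points]
    return [labels[a] for a in assign], centroids
```

A larger and cleaner seeds set gives initial centroids closer to the true cluster means, which is the effect the abstract attributes to enlarging and editing the seeds.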

Keywords: semi-supervised clustering  semi-supervised classification  K-means  seeds set  Tri-Training  Depuration data editing
Received: 2006-06-21
Revised: 2007-03-07

Tri-Training and Data Editing Based Semi-Supervised Clustering Algorithm
DENG Chao and GUO Mao-Zu. Tri-Training and Data Editing Based Semi-Supervised Clustering Algorithm[J]. Journal of Software, 2008, 19(3): 663-673.
Authors: DENG Chao and GUO Mao-Zu
Abstract: In this paper, an algorithm named DE-Tri-training semi-supervised K-means is proposed, which obtains a larger seeds set with less noise. Specifically, before the seeds set is used to initialize the cluster centroids, the training process of the semi-supervised classification approach Tri-training is used to label unlabeled data, which are then added to the initial seeds set to enlarge it. Meanwhile, to improve the quality of the enlarged seeds set, Depuration, a data editing technique based on the nearest neighbor rule, is introduced into the Tri-training process to eliminate or correct the mislabeled noisy data in the enlarged seeds set. Experimental results show that the proposed semi-supervised clustering algorithm effectively improves the initialization of cluster centroids and enhances clustering performance.
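The nearest-neighbor editing step can be sketched as follows. This is a minimal illustration that assumes the commonly described form of Depuration: each sample takes the label held by at least `k_prime` of its `k` nearest neighbors, and is discarded if no label reaches that count. The function name, the default values of `k` and `k_prime`, and the choice to vote on the original (not yet corrected) labels are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def depuration(points, labels, k=3, k_prime=2):
    """Nearest-neighbor data editing: relabel or discard each sample.

    points:  list of coordinate tuples.
    labels:  list of (possibly noisy) labels, same length as points.
    k:       number of neighbors consulted (assumed default).
    k_prime: minimum number of agreeing neighbors needed to keep a sample
             (assumed default).
    """
    kept_points, kept_labels = [], []
    for i, p in enumerate(points):
        # The k nearest neighbors of p, excluding p itself.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        # Votes are counted on the original labels of the neighbors.
        top_label, votes = Counter(labels[j] for j in neighbors).most_common(1)[0]
        if votes >= k_prime:
            # Enough agreement: keep the sample with the majority label.
            kept_points.append(p)
            kept_labels.append(top_label)
        # Otherwise the sample is treated as noise and dropped.
    return kept_points, kept_labels
```

Raising `k_prime` toward `k` makes the edit stricter, discarding more samples in exchange for higher confidence in the labels that remain.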
Keywords: semi-supervised clustering  semi-supervised classification  K-means  seeds set  Tri-training  Depuration data editing
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.