
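A Constrained Partition Model and K-Means Algorithm
Cite this article: HE Zhen-Feng, XIONG Fan-Lun. A Constrained Partition Model and K-Means Algorithm [J]. Journal of Software, 2005, 16(5): 799-809.
Authors: HE Zhen-Feng, XIONG Fan-Lun
Affiliations: Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031
Funding: Supported by the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2002AA243031
Abstract: Combining instance-level constraints between data objects with the K-means algorithm yields good results. However, because the partition is determined by K centers and each class is determined by only one center, this way of representing the partition limits further improvement. Based on the two types of constraints between data objects, two types of association between a data object and a set and three types of association between sets are defined, and a constrained partition model is built on them. In the model, positive associations between sets allow several subset centers to represent the same class, which makes the representation of the partition more flexible and fine-grained. Based on this model, the corresponding algorithm CKS (constrained K-means with subsets) is given to generate the constrained partition. Experiments on three UCI datasets show that, in accuracy and robustness, CKS is significantly better than COP-K-means, another K-means-type algorithm that incorporates instance-level constraints, and has a considerable advantage over CCL, another representative algorithm; CKS also has some advantage in time cost.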

Keywords: clustering analysis; constrained clustering; semi-supervised learning; background knowledge; machine learning
Article ID: 1000-9825/2005/16(05)0799
Received: 1/9/2004
Revised: January 9, 2004

A Constrained Partition Model and K-Means Algorithm
HE Zhen-Feng and XIONG Fan-Lun. A Constrained Partition Model and K-Means Algorithm [J]. Journal of Software, 2005, 16(5): 799-809.
Authors: HE Zhen-Feng and XIONG Fan-Lun
Abstract: Incorporating instance-level constraints into the K-means algorithm can improve clustering accuracy. However, because the generated partition is represented by K centers, with each cluster represented by only one center, this representation limits further gains in accuracy. Building on the instance-level constraints, the paper defines two types of constraints between an instance and a class and three types of constraints between classes, and then presents and analyzes a constrained partition model. In this model, constraints between sub-clusters allow several centers to represent one cluster, which makes the representation of the partition more flexible and precise. An algorithm, CKS (constrained K-means with subsets), is presented to generate the constrained partition. Experiments on three UCI datasets (Glass, Iris, and Sonar) suggest that CKS is markedly superior to COP-K-means in accuracy and robustness, and that it also outperforms CCL. Compared with COP-K-means, the running time of CKS is not significantly affected by the number of constraints; compared with CCL, it does not increase markedly as the number of instances grows.
Keywords: clustering analysis; constrained clustering; semi-supervised learning; background knowledge; machine learning
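
The abstract does not give the details of CKS, so the following is only an illustrative sketch of the two ideas it names, not the authors' algorithm. It assumes a greedy, COP-K-means-style assignment step in which each point goes to the nearest center that breaks no must-link or cannot-link constraint, and it lets one cluster own several sub-centers. The function names, the `owner` mapping, and the toy data are all hypothetical.

```python
import math


def violates(i, cluster, assignment, must_link, cannot_link):
    """Return True if placing point i in `cluster` breaks a constraint.

    `assignment` maps already-placed point indices to cluster ids.
    """
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            # A must-link partner that is already placed elsewhere is a violation.
            if other in assignment and assignment[other] != cluster:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            # A cannot-link partner already in this cluster is a violation.
            if assignment.get(other) == cluster:
                return True
    return False


def assign_with_subcenters(points, subcenters, owner, must_link, cannot_link):
    """Assign each point to the cluster of its nearest feasible sub-center.

    owner[j] is the cluster id represented by subcenters[j], so a single
    cluster may be represented by several sub-centers (an assumption made
    here to mirror the abstract's "more centers represent one cluster").
    Returns {point index: cluster id}, or None if a point has no feasible cluster.
    """
    assignment = {}
    for i, p in enumerate(points):
        # Try sub-centers from nearest to farthest.
        order = sorted(range(len(subcenters)),
                       key=lambda j: math.dist(p, subcenters[j]))
        for j in order:
            if not violates(i, owner[j], assignment, must_link, cannot_link):
                assignment[i] = owner[j]
                break
        else:
            return None  # every cluster violates some constraint
    return assignment


# Toy data: cluster 0 is represented by two far-apart sub-centers.
points = [(0, 0), (0.2, 0), (5, 5), (5.2, 5), (10, 0)]
subcenters = [(0, 0), (10, 0), (5, 5)]
owner = [0, 0, 1]

unconstrained = assign_with_subcenters(points, subcenters, owner, [], [])
constrained = assign_with_subcenters(points, subcenters, owner, [], [(2, 3)])
```

In this toy run, points 0, 1, and 4 all land in cluster 0 even though they sit near different sub-centers, and the cannot-link pair (2, 3) pushes point 3 out of cluster 1. A full algorithm would also re-estimate the sub-centers and iterate; that part is omitted because the abstract does not describe it.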
This article is indexed in CNKI, VIP, and other databases.