
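A clustering algorithm for categorical variables based on connected components
Cite this article:ZHOU Hong-fang, ZHOU Yang, ZHANG Xiao-peng, TAN Shu-chen. A clustering algorithm for categorical variables based on connected components[J]. Control and Decision, 2015, 30(1): 39-45.
Authors:ZHOU Hong-fang, ZHOU Yang, ZHANG Xiao-peng, TAN Shu-chen
Affiliation:1. School of Computer Science and Engineering, Xi'an University of Technology, Xi'an 710048;
2. Shaanxi Applied Physics and Chemistry Research Institute, Xi'an 710061.
Funding:National Natural Science Foundation of China (61402363, 61272284); Shaanxi Province Industrial Key Research Project (2014K05-49); Shaanxi Province Natural Science Basic Research Program (2014JQ8361); Science and Technology Program of Beilin District, Xi'an (GX1405); Xi'an Science Program (CXY1339(5)); University Characteristic Research Program (116-211302)
Abstract:To address shortcomings in existing similarity definitions for categorical variables, a new similarity definition is proposed. Using this definition, the data set is abstracted as an undirected graph, and clustering is recast as finding the connected components of that graph, which yields a clustering algorithm for categorical variables based on connected components. To analyze the clustering quality of the algorithm quantitatively, a new evaluation index for clustering results is proposed for data sets whose class membership is known. Experimental results show that the proposed algorithm achieves relatively high clustering precision and clustering efficiency.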

Keywords:clustering  categorical variables  similarity  connected components  clustering precision
Received:2013-10-29
Revised:2014-03-20

A clustering algorithm for categorical variables based on connected components
ZHOU Hong-fang, ZHOU Yang, ZHANG Xiao-peng, TAN Shu-chen. A clustering algorithm for categorical variables based on connected components[J]. Control and Decision, 2015, 30(1): 39-45.
Authors:ZHOU Hong-fang, ZHOU Yang, ZHANG Xiao-peng, TAN Shu-chen
Affiliation:1. School of Computer Science and Engineering,Xi’an University of Technology,Xi’an 710048,China;
2. Shaanxi Applied Physics and Chemistry Research Institute,Xi’an 710061,China.
Abstract:To address shortcomings in existing similarity definitions for categorical variables, a new similarity definition is proposed. First, the data set is represented as an undirected graph built from the new definition, and the clustering process is converted into the problem of finding the connected components of that graph. On this basis, a novel clustering algorithm for categorical variables based on connected components is proposed. To analyze the clustering results quantitatively, a new evaluation index is proposed for data sets with known class labels. Experimental results show that the proposed algorithm achieves higher clustering precision and faster execution than several existing algorithms.
Keywords:clustering  categorical variables  similarity  connected components  clustering precision
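
The graph formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the paper's similarity definition, so a plain simple-matching measure (the fraction of attributes on which two records agree) is used here as a stand-in, and the function names and the `threshold` parameter are assumptions made for illustration. Two records are joined by an edge when their similarity reaches the threshold, and each connected component of the resulting undirected graph is reported as one cluster.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def simple_matching(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Stand-in similarity (an assumption, not the paper's definition):
    fraction of attribute positions where the two records hold the same value."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("records must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_components(
    records: Sequence[Sequence[Hashable]], threshold: float
) -> List[List[int]]:
    """Cluster records as connected components of a similarity graph.

    An undirected edge joins records i and j when their similarity is at
    least `threshold` (a hypothetical tuning parameter). Components are
    tracked with a union-find structure, so the graph is never stored.
    """
    n = len(records)
    parent = list(range(n))

    def find(i: int) -> int:
        # Walk to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; merge components of sufficiently similar records.
    for i in range(n):
        for j in range(i + 1, n):
            if simple_matching(records[i], records[j]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Group record indices by the root of their component.
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    records = [
        ("red", "round", "small"),
        ("red", "round", "large"),
        ("blue", "square", "large"),
        ("blue", "square", "small"),
        ("green", "oval", "medium"),
    ]
    print(cluster_by_components(records, threshold=0.6))
    # [[0, 1], [2, 3], [4]]
```

In this sketch the number of clusters is not chosen in advance; it follows from the threshold. Lowering the threshold adds edges and merges components, while raising it splits them. The pairwise comparison is quadratic in the number of records, which is a property of this sketch and says nothing about the cost of the published algorithm.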
This article is indexed in CNKI, Wanfang Data, and other databases.