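Improved Distance Formula of the K-modes Algorithm for Mixed Categorical Data (针对混合型分类数据改进的K-modes算法距离公式)
Cite this article: YUAN Fang (袁方), YANG Youlong (杨有龙). 针对混合型分类数据改进的K-modes算法距离公式[J]. 计算机工程与应用 (Computer Engineering and Applications), 2020, 56(6): 186-193. DOI: 10.3778/j.issn.1002-8331.1901-0423
Authors: YUAN Fang (袁方), YANG Youlong (杨有龙)
Affiliation: School of Mathematics and Statistics, Xidian University (西安电子科技大学), Xi'an 710126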
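Abstract (translated from the Chinese): The traditional K-modes algorithm is widely used for clustering on categorical attributes, but it does not distinguish ordinal categorical attributes from nominal (unordered) ones. Building on this distinction, the paper proposes a new distance formula and optimizes the algorithm flow. The distance values of nominal attributes are used to determine a reasonable range for the distance between adjacent values of an ordinal attribute. The order relation implied by an ordinal attribute is then used to construct its distance formula. When the distance between a sample and a centroid is computed, the proportion of each attribute value within the cluster enters the overall distance formula as a key parameter. As a result, the new formula captures distances on ordinal attributes well and balances the differences between the distance formulas for the two attribute types. Experiments on real UCI data sets show that the proposed algorithm and distance formula give significantly better results than the original K-modes algorithm and its earlier improved variants.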

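Keywords (translated from the Chinese): K-modes algorithm; ordinal categorical attribute; mixed-type data; distance formula for mixed-type data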

Improved Distance Formula of K-modes Clustering Algorithm for Mixed Categorical Attribute Data
YUAN Fang, YANG Youlong. Improved Distance Formula of K-modes Clustering Algorithm for Mixed Categorical Attribute Data[J]. Computer Engineering and Applications, 2020, 56(6): 186-193. DOI: 10.3778/j.issn.1002-8331.1901-0423
Authors: YUAN Fang, YANG Youlong
Affiliation: School of Mathematics and Statistics, Xidian University, Xi'an 710126, China
Abstract: The traditional K-modes algorithm is widely used for clustering categorical attributes, but it does not distinguish ordinal categorical attributes from nominal (unordered) categorical attributes. On the basis of distinguishing the two attribute types, a new distance formula is proposed and the algorithm flow is optimized. A reasonable range for the distance between two adjacent values of an ordinal categorical attribute is determined from the distance values of the nominal categorical attributes. Based on the order relation of the ordinal categorical attributes, a distance formula for ordinal categorical attributes is constructed. The proportion of each attribute value in the cluster is introduced as a parameter of the distance between a data point and the centroid. The new distance formula describes the distance on ordinal attributes well and balances the difference between the distance formulas of the two categorical attribute types. The experimental results show that the improved algorithm and distance formula proposed in this paper are more effective than the original K-modes algorithm and its improved variants on real UCI data sets.
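The abstract names the ingredients of the distance but does not give the formula itself. As a rough illustration only, the Python sketch below combines those ingredients under stated assumptions: nominal attributes use one minus the in-cluster proportion of the matching value, and ordinal attributes use the proportion-weighted rank gap scaled by the number of levels minus one, so adjacent levels are 1/(m-1) apart and the largest ordinal gap equals a full nominal mismatch. The function names, the scaling, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def nominal_distance(value, cluster_values):
    """One minus the share of cluster members that hold the same value."""
    counts = Counter(cluster_values)
    return 1.0 - counts[value] / len(cluster_values)


def ordinal_distance(value, cluster_values, levels):
    """Proportion-weighted rank gap to the cluster, scaled into [0, 1].

    `levels` lists the attribute values from lowest to highest.
    Assumed scaling: adjacent levels are 1 / (len(levels) - 1) apart.
    """
    rank = {level: i for i, level in enumerate(levels)}
    span = max(len(levels) - 1, 1)  # avoid division by zero for one level
    counts = Counter(cluster_values)
    n = len(cluster_values)
    return sum(
        (count / n) * abs(rank[value] - rank[other]) / span
        for other, count in counts.items()
    )


def mixed_distance(point, cluster, ordinal_levels):
    """Sum per-attribute distances between a point and a cluster.

    `ordinal_levels` maps a column index to its ordered list of levels;
    every other column is treated as nominal.
    """
    total = 0.0
    for j, value in enumerate(point):
        column = [row[j] for row in cluster]
        if j in ordinal_levels:
            total += ordinal_distance(value, column, ordinal_levels[j])
        else:
            total += nominal_distance(value, column)
    return total


# Toy cluster: column 0 is nominal (colour), column 1 is ordinal (grade).
cluster = [("red", "low"), ("red", "low"), ("blue", "high"), ("red", "medium")]
levels = {1: ["low", "medium", "high"]}

near = mixed_distance(("red", "medium"), cluster, levels)   # 0.25 + 0.375
far = mixed_distance(("blue", "high"), cluster, levels)     # 0.75 + 0.625
```

Under these assumptions, "low" is half as far from "medium" as it is from "high", whereas the simple matching dissimilarity used by the traditional K-modes algorithm scores both pairs as an equal mismatch.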
Keywords: K-modes algorithm; ordinal attribute; mixed-type data; distance formula for mixed-type data
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).