
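Multi-scale Clustering Mining Method Based on Coupled Metric Similarity
Cite this article: TIAN Zhenzhen, ZHAO Shuliang, LI Wenbin, ZHANG Lulu, CHEN Runzi. Multi-scale Clustering Mining Method Based on Coupled Metric Similarity[J]. Journal of Data Acquisition & Processing, 2020, 35(3): 549-562
Authors: TIAN Zhenzhen  ZHAO Shuliang  LI Wenbin  ZHANG Lulu  CHEN Runzi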
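Affiliations: College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Data Security, Hebei Normal University, Shijiazhuang 050024; Hebei Provincial Key Laboratory of Network and Information Security, Hebei Normal University, Shijiazhuang 050024; School of Information Engineering, Hebei GEO University, Shijiazhuang 050031; School of Mathematical Sciences, Hebei Normal University, Shijiazhuang 050024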
Funding: Supported by the Major Program of the National Social Science Foundation of China (13&ZD091, 18ZdA200).
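Abstract (translated from the Chinese): To better study multi-scale categorical data sets that are not independent and identically distributed (non-IID), this paper proposes a multi-scale clustering mining algorithm for non-IID categorical-attribute data sets, built on an unsupervised coupled metric similarity method. First, the benchmark-scale data set is clustered using the coupled metric. Second, two scale conversion algorithms are proposed: upscaling based on single chain and downscaling based on the Lanczos kernel. Finally, the algorithms are validated experimentally on public data sets and real data sets from H Province. The comparison algorithms combine coupled metric similarity (CMS), inverse occurrence frequency (IOF), Hamming distance (HM), and other measures with spectral clustering. Relative to these comparison algorithms, the upscaling algorithm raises NMI by 13.1% on average, lowers MSE by 0.827 on average, and raises F-score by 12.8% on average; the downscaling algorithm raises NMI by 19.2% on average, lowers MSE by 0.028 on average, and raises F-score by 15.5% on average. The experimental results show that the proposed algorithm is effective and feasible.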

Keywords: multi-scale  clustering  categorical data  scale conversion  metric learning
Received: 2019-12-01
Revised: 2019-12-29
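
The comparison baselines named in the abstract pair a categorical similarity measure with spectral clustering. As a rough illustration only, the sketch below builds pairwise similarity matrices for two of those measures, Hamming (overlap) similarity and IOF, on a made-up toy data set. The IOF formula used here is the commonly cited one (1 for matching values, otherwise 1 / (1 + log f(a) * log f(b)), averaged over attributes); that is an assumption, since the abstract does not state it. CMS itself is not reproduced because its definition is not given in this record. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def hamming_similarity(x, y):
    """Overlap similarity: fraction of attributes on which x and y agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def value_frequencies(data):
    """Count how often each value occurs in each attribute (column)."""
    return [Counter(column) for column in zip(*data)]


def iof_similarity(x, y, freqs):
    """IOF similarity (assumed common definition), averaged over attributes."""
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            total += 1.0
        else:
            # Mismatches on frequent values are penalized more heavily.
            total += 1.0 / (1.0 + math.log(freqs[k][a]) * math.log(freqs[k][b]))
    return total / len(x)


# Hypothetical toy data set: four objects, three categorical attributes.
data = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "small", "square"),
]

freqs = value_frequencies(data)
sim_hm = [[hamming_similarity(x, y) for y in data] for x in data]
sim_iof = [[iof_similarity(x, y, freqs) for y in data] for x in data]
```

A matrix such as `sim_iof` could then be handed to any spectral clustering routine that accepts a precomputed affinity matrix, for example scikit-learn's `SpectralClustering(affinity="precomputed")`.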

Multi-scale Clustering Mining Method Based on Coupled Metric Similarity
TIAN Zhenzhen, ZHAO Shuliang, LI Wenbin, ZHANG Lulu, CHEN Runzi. Multi-scale Clustering Mining Method Based on Coupled Metric Similarity[J]. Journal of Data Acquisition & Processing, 2020, 35(3): 549-562
Authors: TIAN Zhenzhen  ZHAO Shuliang  LI Wenbin  ZHANG Lulu  CHEN Runzi
Abstract: To better study multi-scale categorical data sets that are not independent and identically distributed (non-IID), a multi-scale clustering mining algorithm for non-IID categorical-attribute data sets is proposed, based on the unsupervised coupled metric similarity method. First, the benchmark-scale data set is clustered using coupled metric similarity. Second, two scale conversion algorithms are proposed: upscaling based on single chain and downscaling based on the Lanczos kernel. Finally, experiments are performed on public data sets and real data sets from H Province. Coupled metric similarity (CMS), inverse occurrence frequency (IOF), Hamming distance (HM), and other similarity measures, each combined with spectral clustering, serve as comparison algorithms. Compared with them, the upscaling algorithm increases NMI by 13.1% on average, reduces MSE by 0.827 on average, and increases F-score by 12.8% on average; the downscaling algorithm increases NMI by 19.2% on average, reduces MSE by 0.028 on average, and increases F-score by 15.5% on average. Experimental results and theoretical analysis show that the proposed algorithm is effective and feasible.
Keywords: multi-scale  clustering  categorical data  scale conversion  coupled metric similarity
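
The abstract names the Lanczos kernel as the basis of the downscaling step but does not say how it is applied. For orientation, here is a minimal sketch of the standard kernel definition, L(x) = sinc(x) * sinc(x / a) for |x| < a and 0 otherwise, together with normalized interpolation weights. The window size `a = 3`, the function names, and the one-dimensional weighting scheme are assumptions for illustration, not the paper's procedure.

```python
import math


def lanczos_kernel(x, a=3):
    """Standard Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    # sinc(x) * sinc(x / a) with sinc(t) = sin(pi t) / (pi t)
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_weights(offset, a=3):
    """Normalized weights of the 2a neighboring samples for a point that
    lies `offset` (0 <= offset < 1) past the sample at index 0."""
    taps = range(-a + 1, a + 1)
    raw = [lanczos_kernel(offset - t, a) for t in taps]
    total = sum(raw)
    return [w / total for w in raw]


# Example: weights for a point one quarter of the way between two samples.
weights = lanczos_weights(0.25)
```

Normalizing the weights keeps a constant signal constant after interpolation, which is the usual reason for dividing by their sum.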
This article is indexed in Wanfang Data and other databases.