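Sampling-Improved Weighted Kernel Spectral Clustering Algorithm for Big Data
Citation: SHEN Rui, WU Rui. Sampling-improved weighted kernel spectral clustering algorithm for big data[J]. Machinery Design & Manufacture, 2021(1): 171-174.
Authors: SHEN Rui  WU Rui
Affiliations: Shanxi Traffic Vocational and Technical College, Jinzhong, Shanxi 030600; School of Software, Xi’an Jiaotong University, Xi’an, Shaanxi 710061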
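Abstract: Classical spectral clustering recasts data clustering as a graph partitioning problem. Building on the equivalence between its Normalized Cut objective and the conventional weighted kernel k-means objective, this paper designs a spectral clustering algorithm for large-scale data sets based on a sampling-improved weighted kernel k-means algorithm. The algorithm uses iterative weighted kernel k-means optimization to avoid the heavy resource cost of eigendecomposing the Laplacian matrix. It obtains an approximate singular value decomposition through random projection and uses the approximate singular vectors to determine each data point's weight and sampling probability, which yields a fast and reasonable sample. By sampling the data and constraining the cluster centers to the subspace spanned by the sampled points, it avoids using the full kernel matrix and thereby reduces the time and space complexity of the classical algorithm. Experimental results show that the improved algorithm greatly increases clustering efficiency while keeping accuracy close to that of the classical algorithm, which confirms its effectiveness.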
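The equivalence between Normalized Cut and weighted kernel k-means that the abstract relies on can be illustrated with a small sketch. It assumes the standard construction from the kernel k-means literature (point weights equal to node degrees, kernel K = sigma * D^-1 + D^-1 A D^-1 for an affinity matrix A with positive degrees); the abstract does not state which form the authors use. The function names `ncut_kernel` and `weighted_kernel_kmeans` are hypothetical, and this version works on the full kernel matrix, so it shows only the baseline iteration, not the sampled variant the paper proposes.

```python
import numpy as np

def ncut_kernel(A, sigma=0.0):
    """Kernel and weights under which weighted kernel k-means matches Normalized Cut.

    Assumes every node has positive degree. sigma is a diagonal shift that
    can be raised to make K positive semidefinite.
    """
    d = A.sum(axis=1)                        # node degrees, used as point weights
    K = A / np.outer(d, d)                   # D^-1 A D^-1
    K[np.diag_indices_from(K)] += sigma / d  # + sigma * D^-1
    return K, d

def weighted_kernel_kmeans(K, w, labels, n_clusters, max_iter=100):
    """Lloyd-style weighted kernel k-means on a full kernel matrix K."""
    w = np.asarray(w, dtype=float)
    labels = np.asarray(labels).copy()
    n = K.shape[0]
    diag = np.diag(K)
    for _ in range(max_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            wc = w[mask]
            s = wc.sum()
            if s == 0:
                continue                     # empty cluster keeps distance +inf
            # ||phi(x_i) - m_c||^2 expanded with kernel entries only
            cross = K[:, mask] @ wc / s
            within = wc @ K[np.ix_(mask, mask)] @ wc / s**2
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                            # assignments are stable
        labels = new_labels
    return labels
```

Each iteration touches every entry of K, which costs O(n^2) time and memory; this is the cost the sampling step described in the abstract is meant to reduce.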

Keywords: big data spectral clustering  weighted kernel k-means algorithm  data sampling  matrix eigendecomposition  kernel matrix

Large-Scale Spectral Clustering Based on Sampling Improved Weighted Kernel
SHEN Rui, WU Rui. Large-Scale Spectral Clustering Based on Sampling Improved Weighted Kernel[J]. Machinery Design & Manufacture, 2021(1): 171-174.
Authors: SHEN Rui  WU Rui
Affiliation: Shanxi Traffic Vocational and Technical College, Jinzhong, Shanxi 030600, China; Xi’an Jiaotong University, Xi’an, Shaanxi 710061, China
Abstract: The classical spectral clustering algorithm transforms data clustering into a graph partitioning problem. Based on an analysis of the equivalence between its Normalized Cut objective function and the weighted kernel k-means objective, a large-scale spectral clustering algorithm based on a sampling-improved weighted kernel k-means algorithm is designed. Iterative weighted kernel k-means optimization avoids the heavy resource consumption of eigendecomposing the Laplacian matrix, and use of the full kernel matrix is avoided by sampling the data and constraining the cluster centers to the subspace spanned by the sampled points, thereby reducing the time and space complexity of the classical algorithm. Theoretical analysis and experimental results show that the improved algorithm greatly improves clustering efficiency while maintaining clustering accuracy similar to that of the classical algorithm.
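The data sampling step can be sketched as follows, under loud assumptions: the approximate singular vectors come from a Gaussian random projection followed by a QR factorization, and the sampling probability of each point is its normalized leverage score (squared row norm of the approximate left singular vectors). The abstract gives neither the matrix being decomposed nor the probability formula, so `M`, `randomized_svd`, `sampling_probabilities`, and `draw_sample` are hypothetical stand-ins rather than the authors' definitions.

```python
import numpy as np

def randomized_svd(M, rank, oversample=10, seed=0):
    """Approximate rank-`rank` SVD of M via a Gaussian random projection."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((M.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(M @ omega)   # orthonormal basis for the sampled range of M
    B = Q.T @ M                      # small (rank + oversample) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                  # lift left singular vectors back to n dimensions
    return U[:, :rank], s[:rank], Vt[:rank]

def sampling_probabilities(U):
    """Normalized leverage scores: squared row norms of U, scaled to sum to 1."""
    scores = np.sum(U**2, axis=1)
    return scores / scores.sum()

def draw_sample(probs, m, seed=0):
    """Draw m distinct point indices according to probs."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(probs), size=m, replace=False, p=probs)
```

The sampled indices would then pick the columns of the kernel matrix that the constrained cluster centers are built from, so only an n-by-m slice of the kernel needs to be formed.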
Keywords: Spectral Clustering  Weighted Kernel k-means  Data Sampling  Matrix Eigendecomposition  Kernel Matrix
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.