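Co-training algorithm combining improved density peak clustering and shared subspace
Cite this article: LYU Jia, XIAN Yan. Co-training algorithm combining improved density peak clustering and shared subspace[J]. Journal of Computer Applications, 2021, 41(3): 686-693.
Authors: LYU Jia, XIAN Yan
Affiliations: 1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331; 2. Chongqing Center of Engineering Technology Research on Digital Agriculture Service, Chongqing Normal University, Chongqing 401331
Funding: Chongqing Graduate Research and Innovation Project; Major Program of the National Natural Science Foundation of China; Chongqing University Innovation Research Group Project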
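Abstract: Co-training accumulates classification errors when the unlabeled samples added during iteration carry too little useful information and when multiple classifiers label the same sample inconsistently. To address this, a co-training algorithm combining improved density peak clustering and a shared subspace is proposed. The algorithm first obtains two base classifiers from complementary attribute sets. It then performs improved density peak clustering based on the siphon balance rule and, starting from the cluster centers, progressively selects unlabeled samples with a high mutual neighbor degree and passes them to the two base classifiers for classification. Finally, the shared subspace obtained by multi-view non-negative matrix factorization is used to determine the final class of samples whose labels are inconsistent. Improved density peak clustering and the mutual neighbor degree select unlabeled samples that better represent the spatial structure of the data, and the shared subspace revises inconsistently labeled samples, which resolves the low classification accuracy caused by sample misclassification. Several groups of comparative experiments on 9 UCI datasets demonstrate the effectiveness of the algorithm: compared with the baseline algorithms, it achieves the highest classification accuracy on 7 datasets and the second highest on the other 2.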

Keywords: co-training; density peak clustering; siphon balance rule; shared subspace; mutual neighbor degree
Received: 2020-07-24
Revised: 2020-10-06

Co-training algorithm combining improved density peak clustering and shared subspace
LYU Jia, XIAN Yan. Co-training algorithm combining improved density peak clustering and shared subspace[J]. Journal of Computer Applications, 2021, 41(3): 686-693.
Authors:LYU Jia  XIAN Yan
Affiliation:1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China;2. Chongqing Center of Engineering Technology Research on Digital Agriculture Service, Chongqing Normal University, Chongqing 401331, China
Abstract: During the iterations of a co-training algorithm, the unlabeled samples that are added may carry too little useful information, and the labels assigned to the same sample by multiple classifiers may be inconsistent; both effects lead to the accumulation of classification errors. To solve these problems, a co-training algorithm combining improved density peak clustering and shared subspace was proposed. Firstly, two base classifiers were obtained from complementary attribute sets. Secondly, improved density peak clustering was performed based on the siphon balance rule; starting from the cluster centers, unlabeled samples with high mutual neighbor degrees were selected progressively and then labeled by the two base classifiers. Finally, the final categories of the samples with inconsistent labels were determined using the shared subspace obtained by the multi-view non-negative matrix factorization algorithm. In the proposed algorithm, the improved density peak clustering and the mutual neighbor degree select unlabeled samples that better represent the spatial structure of the data, and the shared subspace revises samples that received different labels, which solves the problem of low classification accuracy caused by sample misclassification. The algorithm was validated through multiple groups of comparative experiments on 9 UCI datasets. The experimental results show that, compared with the baseline algorithms, the proposed algorithm achieves the highest classification accuracy on 7 datasets and the second highest on the other 2 datasets.
Keywords:co-training  density peak clustering  siphon balance rule  shared subspace  mutual neighbor degree  
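
The clustering step described in the abstract builds on density peak clustering. The sketch below shows only the standard, unimproved form of that technique; it does not implement the siphon balance rule, the mutual neighbor degree, or the shared subspace step, whose details are not given in the abstract. The function name `density_peak_clustering`, the Gaussian kernel for local density, and the parameters `n_clusters` and `dc` are assumptions made for illustration.

```python
import numpy as np


def density_peak_clustering(X, n_clusters, dc):
    """Standard density peak clustering (illustrative sketch, not the paper's variant).

    X          : (n, d) array of samples
    n_clusters : number of cluster centers to pick (assumed known here)
    dc         : cutoff distance used as the Gaussian kernel width (assumed)
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Local density with a Gaussian kernel; subtract the self term exp(0) = 1.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0

    # Sample indices sorted by descending density.
    order = np.argsort(-rho, kind="stable")

    # delta[i]: distance to the nearest sample of higher density.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the densest sample
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Centers have both high density and high delta; the densest sample
    # is always kept as a center so every label chain ends at a center.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    # Remaining samples inherit the label of their nearest denser neighbor,
    # processed from dense to sparse so that neighbor is already labeled.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, rho, delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
    labels, centers, rho, delta = density_peak_clustering(X, n_clusters=2, dc=0.5)
    print("center indices:", centers)
    print("cluster sizes:", np.bincount(labels))
```

In a co-training setting, the returned `centers` and `rho` could serve as starting points for picking unlabeled samples that are structurally representative, which is the role the abstract assigns to the clustering step.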
This article is indexed in Wanfang Data and other databases.