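Unsupervised feature selection algorithm based on self-paced learning
Cite this article: GONG Yonghong, ZHENG Wei, WU Lin, TAN Malong, YU Hao. Unsupervised feature selection algorithm based on self-paced learning[J]. Journal of Computer Applications, 2018, 38(10): 2856-2861.
Authors: GONG Yonghong  ZHENG Wei  WU Lin  TAN Malong  YU Hao
Affiliation: 1. Library, Guilin University of Aerospace Technology, Guilin, Guangxi 541004; 2. Guangxi Key Laboratory of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004
Funding: National Natural Science Foundation of China (61573270); Guangxi Natural Science Foundation (2015GXNSFCB139011); Innovation Project of Guangxi Graduate Education (YCSW2018093).
Abstract: Existing feature selection algorithms treat every sample equally and ignore the differences among samples, so the learned model cannot avoid the influence of noisy samples. To address this problem, an unsupervised feature selection algorithm that incorporates self-paced learning theory (UFS-SPL) is proposed. The algorithm first automatically selects a subset of important samples and trains a robust initial feature selection model on it, then gradually and automatically introduces less important samples to improve the generalization ability of the model, and finally obtains a feature selection model that avoids noise interference while being both robust and generalizable. On real data sets, compared with Convex Semi-supervised multi-label Feature Selection (CSFS), Regularized Self-Representation (RSR) and the Coupled Dictionary Learning method for unsupervised Feature Selection (CDLFS), UFS-SPL improves clustering accuracy, mutual information and purity by 12.06%, 10.54% and 10.5% on average, respectively. The experimental results show that UFS-SPL can effectively reduce the influence of irrelevant information in data sets.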

Keywords: unsupervised learning; feature selection; self-paced learning; self-representation; sparse learning
Received: 2018-03-17
Revised: 2018-04-24

Unsupervised feature selection algorithm based on self-paced learning
GONG Yonghong, ZHENG Wei, WU Lin, TAN Malong, YU Hao. Unsupervised feature selection algorithm based on self-paced learning[J]. Journal of Computer Applications, 2018, 38(10): 2856-2861.
Authors:GONG Yonghong  ZHENG Wei  WU Lin  TAN Malong  YU Hao
Affiliation:1. Library, Guilin University of Aerospace Technology, Guilin Guangxi 541004, China;2. Guangxi Key Laboratory of Multi-source Information Mining and Security, Guangxi Normal University, Guilin Guangxi 541004, China
Abstract: Conventional feature selection algorithms treat all samples equally and ignore the differences among them, so the learned model cannot effectively avoid the influence of noisy samples. To address this problem, an Unsupervised Feature Selection algorithm based on Self-Paced Learning (UFS-SPL) was proposed. First, a subset of important samples was selected automatically to train a robust initial feature selection model. Then, less important samples were added gradually to improve the generalization ability of the model, until a robust and generalized feature selection model was obtained or all samples were selected. On real data sets, compared with Convex Semi-supervised multi-label Feature Selection (CSFS), Regularized Self-Representation (RSR) and Coupled Dictionary Learning method for unsupervised Feature Selection (CDLFS), UFS-SPL increased the clustering accuracy, normalized mutual information and purity by 12.06%, 10.54% and 10.5% on average, respectively. The experimental results show that UFS-SPL can effectively reduce the effect of irrelevant information in the original data sets.
Keywords: unsupervised learning; feature selection; self-paced learning; self-representation; sparse learning
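
The abstract describes the procedure only at a high level, so the following Python sketch is an illustration under stated assumptions and not the authors' implementation. It assumes a self-representation loss with an l2,1-norm penalty (in the spirit of RSR), binary sample weights, and a sample-count schedule in place of an explicit pace parameter; it also bootstraps the per-sample losses from a model fitted on all samples. The names `rsr_weighted` and `ufs_spl_sketch` and the parameters `alpha`, `start_frac` and `growth` are hypothetical.

```python
import numpy as np

def rsr_weighted(X, v, alpha, n_iter=30, eps=1e-8):
    """Minimize sum_i v_i * ||x_i - x_i W||^2 + alpha * ||W||_{2,1}
    by iteratively reweighted least squares (assumed objective)."""
    d = X.shape[1]
    G = (X * v[:, None]).T @ X          # X^T diag(v) X
    D = np.eye(d)                       # reweighting matrix for the l2,1 term
    for _ in range(n_iter):
        W = np.linalg.solve(G + alpha * D, G)
        row_norms = np.sqrt((W ** 2).sum(axis=1))
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    return W

def ufs_spl_sketch(X, alpha=1.0, start_frac=0.5, growth=1.3, n_rounds=10):
    """Self-paced loop: train on low-loss samples first, then admit more."""
    n = X.shape[0]
    v = np.ones(n)
    # Assumption: bootstrap per-sample losses from a model fitted on all samples.
    W = rsr_weighted(X, v, alpha)
    k = start_frac * n
    for _ in range(n_rounds):
        loss = ((X - X @ W) ** 2).sum(axis=1)    # per-sample reconstruction loss
        m = min(n, int(np.ceil(k)))              # number of samples to admit
        threshold = np.sort(loss)[m - 1]
        v = (loss <= threshold).astype(float)    # hard 0/1 sample weights
        W = rsr_weighted(X, v, alpha)            # refit on admitted samples only
        if m == n:                               # all samples selected: stop
            break
        k *= growth                              # admit more samples next round
    scores = np.sqrt((W ** 2).sum(axis=1))       # row norms rank the features
    return scores, v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(100, 8))
    feature_scores, weights = ufs_spl_sketch(X_demo)
    print(np.argsort(-feature_scores))           # features, highest score first
```

In this sketch a sample with weight 0 contributes nothing to the fit, which is how high-loss (possibly noisy) samples are kept out of the early rounds; features would then be ranked by the row norms of `W`.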