基于多因子粒子群的高维数据特征选择算法
引用本文:林炜星,王宇嘉,陈万芬,梁海娜.基于多因子粒子群的高维数据特征选择算法[J].计算机工程与应用,2021,57(22):199-207.
作者姓名:林炜星  王宇嘉  陈万芬  梁海娜
作者单位:上海工程技术大学 电子电气工程学院,上海 201620
摘    要:特征选择是机器学习和数据挖掘领域中一项重要的数据预处理技术,它旨在最大化分类任务的精度和最小化最优子集特征个数。运用粒子群算法在高维数据集中寻找最优子集面临着陷入局部最优和计算代价昂贵的问题,导致分类精度下降。针对此问题,提出了基于多因子粒子群算法的高维数据特征选择算法。引入了进化多任务的算法框架,提出了一种两任务模型生成的策略,通过任务间的知识迁移加强种群交流,提高种群多样性以改善易陷入局部最优的缺陷;设计了基于稀疏表示的初始化策略,在算法初始阶段设计具有稀疏表示的初始解,降低了种群在趋向最优解集时的计算开销。在6个公开医学高维数据集上的实验结果表明,所提算法能够有效实现分类任务且得到较好的精度。

关 键 词:高维数据  特征选择  进化多任务  粒子群算法(PSO)  

High-Dimensional Data Feature Selection Algorithm Based on Multifactor Particle Swarm Optimization
LIN Weixing, WANG Yujia, CHEN Wanfen, LIANG Haina. High-Dimensional Data Feature Selection Algorithm Based on Multifactor Particle Swarm Optimization[J]. Computer Engineering and Applications, 2021, 57(22): 199-207.
Authors:LIN Weixing  WANG Yujia  CHEN Wanfen  LIANG Haina
Affiliation:School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
Abstract:Feature selection is an important data preprocessing technique in machine learning and data mining. It aims to maximize classification accuracy while minimizing the number of features in the selected subset. When particle swarm optimization is used to search for the optimal feature subset in a high-dimensional dataset, it tends to fall into local optima and incurs a high computational cost, which lowers classification accuracy. To address this problem, a feature selection algorithm for high-dimensional data based on multifactor particle swarm optimization is proposed. First, an evolutionary multitasking framework is introduced together with a strategy for generating a two-task model; knowledge transfer between the two tasks strengthens communication within the population and increases population diversity, which mitigates the tendency to fall into local optima. Second, an initialization strategy based on sparse representation is designed: initial solutions with a sparse representation are generated in the initial stage of the algorithm, which reduces the computational cost of the population as it converges toward the optimal solution set. Experimental results on 6 public high-dimensional medical datasets show that the proposed algorithm performs the classification task effectively and achieves good accuracy.
Keywords:high-dimensional data  feature selection  evolutionary multitasking  particle swarm optimization (PSO)
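
The abstract names a sparse initialization strategy and the twin goals of high accuracy and a small feature subset, but gives no formulas. The Python sketch below illustrates one plausible reading and is not the authors' implementation. Everything in it is an assumption made for illustration: the 0.5 selection threshold, the `keep_ratio` parameter, the weighted-sum fitness with weight `alpha`, and the function names `sparse_init`, `selected_features`, and `fitness`. The classifier is abstracted as a caller-supplied `error_rate` function.

```python
import random

THRESHOLD = 0.5  # assumed: a feature counts as selected when its position exceeds this


def sparse_init(n_particles, n_features, keep_ratio=0.05, seed=None):
    """Build a swarm in which each particle selects only a small fraction of features.

    keep_ratio is a hypothetical parameter; the abstract does not specify how
    sparse the initial solutions are.
    """
    rng = random.Random(seed)
    k = max(1, round(keep_ratio * n_features))
    swarm = []
    for _ in range(n_particles):
        active = set(rng.sample(range(n_features), k))
        # Active entries start above the threshold, all others below it.
        position = [
            rng.uniform(0.6, 1.0) if j in active else rng.uniform(0.0, 0.4)
            for j in range(n_features)
        ]
        swarm.append(position)
    return swarm


def selected_features(position):
    """Decode a continuous position vector into a list of selected feature indices."""
    return [j for j, x in enumerate(position) if x > THRESHOLD]


def fitness(position, error_rate, alpha=0.9):
    """Weighted sum of classification error and relative subset size (lower is better).

    error_rate is a caller-supplied function mapping a list of feature indices
    to an error in [0, 1]; alpha is an assumed trade-off weight.
    """
    subset = selected_features(position)
    if not subset:
        return 1.0  # an empty subset cannot classify; assign the worst score
    size_ratio = len(subset) / len(position)
    return alpha * error_rate(subset) + (1.0 - alpha) * size_ratio
```

Starting every particle with few active features keeps early fitness evaluations cheap, since the classifier is trained on a small subset; this is the kind of saving the abstract attributes to sparse initial solutions.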
This article is indexed in Wanfang Data (万方数据) and other databases.