
Combined Feature Selection Algorithm Based on Mutual Information
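Cite this article: LI Ye-Zi, ZHOU Yi-Lu, WANG Zhen-You. Combined feature selection algorithm based on mutual information [J]. Computer Systems & Applications, 2017, 26(8): 173-179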
Authors: LI Ye-Zi, ZHOU Yi-Lu, WANG Zhen-You
Affiliation: School of Applied Mathematics, Guangdong University of Technology, Guangzhou 510006 (all three authors)
Funding: National Natural Science Foundation of China (11401115)
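Abstract: Reducing the dimensionality of candidate features plays an important role in machine learning tasks such as classification and clustering. Most existing methods select features based either on the dependence of the target Y on individual features, or on the relevance, complementarity, and redundancy among features with respect to Y. However, almost none of these methods consider combined features. For example, attributes A and B may each contain only a very small amount of information about Y, or even be completely independent of Y, while their combination A&B provides a large amount of information about Y, or even completely determines Y. Motivated by this, we propose a feature selection algorithm that can mine both combined features and single features from a feature set. First, non-significant features are combined, and new candidate features are generated according to the conditional probability distribution table. Then, single and combined features are selected using a criterion based on maximum relevance and minimum redundancy. Finally, experiments on synthetic and real datasets show that the algorithm mines the combined-feature information in the datasets effectively and, to some extent, improves the accuracy of the corresponding machine learning algorithms.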
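The A/B case described in the abstract is the classic XOR pattern. The minimal sketch below illustrates it with the empirical (plug-in) estimate of mutual information in bits, assuming discrete features. The helper names `mutual_information` and `combine` are hypothetical, and encoding the combined feature A&B as the pair (a, b) is an assumption made for illustration, not necessarily the paper's construction.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # Sum over observed (a, b) of p(a,b) * log2(p(a,b) / (p(a) p(b))), written with counts.
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def combine(x1, x2):
    """Hypothetical combined feature A&B: each sample's value is the pair (a, b)."""
    return list(zip(x1, x2))


# Balanced toy data: every (A, B) pair appears 25 times, and Y = A XOR B.
pairs = list(product([0, 1], repeat=2)) * 25
A = [a for a, _ in pairs]
B = [b for _, b in pairs]
Y = [a ^ b for a, b in pairs]

print(mutual_information(A, Y))               # 0.0
print(mutual_information(B, Y))               # 0.0
print(mutual_information(combine(A, B), Y))   # 1.0
```

On this balanced data neither A nor B alone carries any information about Y, yet the combined feature determines Y completely: its mutual information with Y is 1 bit, which equals H(Y) here.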

Keywords: combined features; feature selection; maximum relevance; minimum redundancy
Received: 2016-12-05

Combined Feature Selection Algorithm Based on Mutual Information
LI Ye-Zi, ZHOU Yi-Lu and WANG Zhen-You. Combined Feature Selection Algorithm Based on Mutual Information [J]. Computer Systems & Applications, 2017, 26(8): 173-179
Authors: LI Ye-Zi, ZHOU Yi-Lu, WANG Zhen-You
Affiliation: Department of Applied Mathematics, Guangdong University of Technology, Guangzhou 510520, China (all three authors)
Abstract: Reducing the set of candidate features is very important in machine learning tasks such as classification and clustering. Most existing methods rely on the dependence between a single feature and the target Y, or on the associations among features with respect to Y. However, these methods do not take combined features into account: attributes A and B may each contain very little information about Y, or even be completely independent of Y, while A & B together provide a great deal of information about Y, or even completely determine it. Based on this observation, we propose an algorithm that finds both single and combined features in the feature set. First, non-significant features are combined, and new candidate features are generated according to the conditional probability distribution table. Then, single and combined features are selected by a criterion based on maximum relevance and minimum redundancy. Finally, experiments on synthetic and real datasets show that the algorithm mines the combined-feature information in the datasets well and improves the accuracy of the corresponding machine learning algorithms to a certain extent.
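The two-stage procedure in the abstract (combine non-significant features, then select by maximum relevance and minimum redundancy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. A feature is treated as non-significant when its relevance I(f;Y) falls below a hypothetical threshold `tau`. Each pair of non-significant features becomes one candidate whose values are the joint pairs (a, b), which is only one possible reading of "generated according to the conditional probability distribution table". Selection uses the difference form of the mRMR criterion: I(f;Y) minus the mean of I(f;s) over the already selected features s. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def build_candidates(features, y, tau):
    """Keep all single features and add one combined feature for every pair of
    non-significant features (relevance below the hypothetical threshold tau).
    A combined feature's value for a sample is the pair (a, b)."""
    candidates = dict(features)
    weak = [name for name, col in features.items() if mutual_information(col, y) < tau]
    for f1, f2 in combinations(weak, 2):
        candidates[f"{f1}&{f2}"] = list(zip(features[f1], features[f2]))
    return candidates


def mrmr_select(candidates, y, k):
    """Greedy mRMR, difference form: score(f) = I(f;Y) - mean over selected s of I(f;s)."""
    relevance = {name: mutual_information(col, y) for name, col in candidates.items()}
    selected, remaining = [], sorted(candidates)  # sorted for deterministic tie-breaking

    def score(name):
        if not selected:  # first pick: maximum relevance only
            return relevance[name]
        redundancy = sum(mutual_information(candidates[name], candidates[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # max() returns the first of tied names
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: A, B, C uniform and independent; Y = A XOR B, so C is pure noise.
rows = list(product([0, 1], repeat=3)) * 10
features = {name: [r[i] for r in rows] for i, name in enumerate("ABC")}
y = [a ^ b for a, b, _ in rows]

cands = build_candidates(features, y, tau=0.05)
print(sorted(cands))               # ['A', 'A&B', 'A&C', 'B', 'B&C', 'C']
print(mrmr_select(cands, y, k=1))  # ['A&B']
```

On this toy data A, B and C are each individually irrelevant to Y, so all three are paired, and the combined feature A&B is picked first because it alone determines Y. Restricting the pairing to weak features keeps the number of combined candidates quadratic only in the number of non-significant features rather than in the full feature count.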
Keywords: combined feature; feature selection; max-relevance; min-redundancy