
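Feature selection based on maximum normalized fuzzy joint mutual information
Cite this article:DONG Zemin, SHI Qiang. Feature selection based on maximum normalized fuzzy joint mutual information[J]. Computer Engineering and Applications, 2017, 53(22): 105-110. DOI: 10.3778/j.issn.1002-8331.1605-0293
Authors:DONG Zemin  SHI Qiang
Affiliations:1. Experiment and Training Center, City College, Wuhan University of Science and Technology, Wuhan 430083; 2. School of Software, Huazhong University of Science and Technology, Wuhan 430000
Abstract:Feature selection picks, from the full feature set, a subset of features that are strongly relevant to the class labels and minimally redundant with one another. This improves the computational efficiency of the classifier and its generalization ability, and thereby the classification accuracy. In practice, mutual-information-based criteria for evaluating feature relevance and redundancy suffer from the following problems: (1) the probabilities of variables are hard to estimate, which in turn makes the information entropy of features hard to compute; (2) mutual information is biased toward features that take many distinct values; (3) redundancy measures that sum the redundancy between a candidate feature and each feature in the selected subset tend to break down when the feature dimensionality is high. To address these problems, a feature evaluation criterion based on maximum normalized fuzzy joint mutual information is proposed: the information entropy, conditional entropy, and joint entropy of variables are computed from fuzzy equivalence relations; the cumulative-sum measure is replaced by maximum joint mutual information; and feature importance is evaluated with normalized joint mutual information. A feature selection algorithm using forward greedy search is built on this criterion. Multiple groups of experiments on standard UCI machine learning data sets show that the algorithm effectively selects feature subsets that are useful for classification and clearly improves classification accuracy.

Keywords:fuzzy equivalence relation  joint mutual information  max-min criterion  feature selection  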

Feature selection using normalized fuzzy joint mutual information maximum
DONG Zemin,SHI Qiang. Feature selection using normalized fuzzy joint mutual information maximum[J]. Computer Engineering and Applications, 2017, 53(22): 105-110. DOI: 10.3778/j.issn.1002-8331.1605-0293
Authors:DONG Zemin  SHI Qiang
Affiliation:1. Research and Training Center of City College, Wuhan University of Science and Technology, Wuhan 430083, China; 2. School of Software Engineering, Huazhong University of Science & Technology, Wuhan 430000, China
Abstract:Feature selection chooses, from the full feature set, a subset whose features are strongly relevant to the class and minimally redundant with one another. This improves the classifier's computational efficiency and generalization, and therefore its classification accuracy. However, relevance and redundancy criteria based on mutual information have the following problems in practice: (1) the probability of a variable, and hence the information entropy of a feature, is difficult to calculate; (2) mutual information tends to favor features that take more values; (3) measuring the redundancy between a candidate feature and the selected subset by cumulative summation often fails on high-dimensional data sets. To solve these problems, this paper proposes a feature evaluation criterion based on Normalized Fuzzy Joint Mutual Information Maximum (NFJMIM). First, the entropy, conditional entropy, and joint entropy of variables are calculated from a fuzzy equivalence relation. Second, feature importance is evaluated based on NFJMIM. Finally, a forward greedy search built on this criterion is used to find the feature subset. Several experiments on data sets from the UCI machine learning repository show that the proposed algorithm selects feature subsets that are effective for classification and significantly improves classification accuracy.
Keywords:fuzzy equivalence relations  joint mutual information  max-min criterion  feature selection  
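
The abstract names three ingredients (entropies computed from fuzzy relations, a maximum joint mutual information criterion in place of a cumulative sum, and forward greedy search) without giving formulas. The following is a minimal sketch of how they might fit together, not the authors' implementation. Everything specific in it is an assumption: the function names, the triangular similarity kernel and its `width` (a similarity relation that is not guaranteed to be transitive, so not a true fuzzy equivalence relation), the cardinality-based fuzzy entropy, the max-min selection rule, and the normalization of joint mutual information by the joint entropy.

```python
import numpy as np

def feature_relation(x, width=0.25):
    """Fuzzy similarity matrix for one numeric feature (assumed triangular kernel)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    # Rescale to [0, 1]; a constant feature makes every sample fully similar.
    x = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    dist = np.abs(x[:, None] - x[None, :])
    return np.clip(1.0 - dist / width, 0.0, 1.0)

def label_relation(y):
    """Crisp relation on class labels: 1 if two samples share a class, else 0."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def fuzzy_entropy(*relations):
    """Joint fuzzy entropy; relations are intersected with an elementwise minimum."""
    joint = np.minimum.reduce(relations)
    n = joint.shape[0]
    # Row sums are the fuzzy cardinalities of each sample's class (diagonal is 1).
    return -np.mean(np.log2(joint.sum(axis=1) / n))

def normalized_joint_mi(cand, sel, cls):
    """Assumed normalization: I(cand, sel; C) / H(cand, sel, C)."""
    h_all = fuzzy_entropy(cand, sel, cls)
    mi = fuzzy_entropy(cand, sel) + fuzzy_entropy(cls) - h_all
    return mi / h_all if h_all > 0 else 0.0

def select_features(X, y, k):
    """Forward greedy search with a max-min rule over the selected subset."""
    X = np.asarray(X, dtype=float)
    cls = label_relation(y)
    rels = [feature_relation(X[:, j]) for j in range(X.shape[1])]
    h_cls = fuzzy_entropy(cls)
    # Seed with the single feature sharing the most information with the class.
    relevance = [fuzzy_entropy(r) + h_cls - fuzzy_entropy(r, cls) for r in rels]
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, len(rels)):
        rest = [j for j in range(len(rels)) if j not in selected]
        # Score each candidate by its worst pairing with an already selected
        # feature, then keep the candidate whose worst case is best.
        worst = {j: min(normalized_joint_mi(rels[j], rels[s], cls) for s in selected)
                 for j in rest}
        selected.append(max(worst, key=worst.get))
    return selected
```

Calling `select_features(X, y, k)` with an `(n_samples, n_features)` array returns the chosen column indices in selection order. Taking the minimum over the selected subset, rather than summing, is what keeps the score from being dominated by the size of the subset as more features are added; that is the motivation the abstract gives for dropping the cumulative-sum measure.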