Multi-label Feature Selection Algorithm Based on Improved Label Correlation
Citation: CHEN Fu-cai, LI Si-hao, ZHANG Jian-peng, HUANG Rui-yang. Multi-label Feature Selection Algorithm Based on Improved Label Correlation[J]. Computer Science, 2018, 45(6): 228-234
Authors: CHEN Fu-cai, LI Si-hao, ZHANG Jian-peng, HUANG Rui-yang
Affiliation (all authors): National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China
Funding: National Key R&D Program of China (2016YFB0800101); Innovative Research Groups Program of the National Natural Science Foundation of China (61521003)
Received: 2017-04-25
Revised: 2017-07-29
Abstract: Multi-label feature selection is one of the main ways to counter the curse of dimensionality. It reduces the feature dimension, improves learning efficiency, and optimizes classification performance. However, existing feature selection algorithms largely ignore the correlations between labels, and the ranges of the information measures they rely on are biased across different data sets. To address these problems, this paper proposes a multi-label feature selection algorithm based on improved label correlation. The algorithm first uses symmetric uncertainty to normalize the information measures. It then takes the normalized mutual information as the correlation measure and uses it to define a label importance weight, which weights the label-related terms in the dependency and redundancy. Finally, a feature score function is proposed to evaluate feature importance, and the highest-scoring features are selected in turn to form the optimal feature subset. Experimental results show that, compared with other algorithms, the proposed algorithm extracts a more accurate low-dimensional feature subset. This subset not only improves the performance of multi-label learning algorithms for entity information mining but also improves the efficiency of multi-label learning algorithms based on discrete features.
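For reference, symmetric uncertainty (SU), the normalization named in the abstract, has a standard definition. It rescales mutual information by the two marginal entropies, so it is bounded in [0, 1] and can be compared across variables and data sets in a way that raw mutual information cannot. Because it is a ratio of information quantities, the choice of logarithm base cancels out. The abstract does not give the paper's own formulas for the label weights or the feature score, so only the SU definition is shown here.

```latex
\begin{aligned}
H(X) &= -\sum_{x} p(x)\,\log_2 p(x), \\
I(X;Y) &= H(X) + H(Y) - H(X,Y), \\
SU(X,Y) &= \frac{2\,I(X;Y)}{H(X) + H(Y)} \in [0, 1].
\end{aligned}
```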
Keywords: multi-label feature selection; label correlation; dependency; redundancy; feature score
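The abstract describes the pipeline (SU normalization, label importance weights, weighted dependency and redundancy, and greedy selection by score) but not its exact formulas. The following minimal sketch assumes discrete features and NumPy, and shows one way these steps could fit together. The helpers `label_weights` and `select_features` are hypothetical. The label weight used here is a label's mean SU with the other labels, and the score is a plain mRMR-style "weighted relevance minus mean redundancy". Both are illustrative stand-ins rather than the paper's functions. In particular, the paper also weights label-related terms inside the redundancy, and this sketch omits that.

```python
import numpy as np


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = np.stack(columns, axis=1)  # one row per sample
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:  # both variables constant: treat as uncorrelated
        return 0.0
    mi = max(hx + hy - entropy(x, y), 0.0)  # clamp tiny negative rounding error
    return 2.0 * mi / (hx + hy)


def label_weights(Y):
    """Hypothetical label-importance weights: each label's mean SU with the
    other labels, rescaled to sum to 1 (uniform if no label correlation)."""
    q = Y.shape[1]
    if q == 1:
        return np.ones(1)
    su = np.array([[symmetric_uncertainty(Y[:, a], Y[:, b]) for b in range(q)]
                   for a in range(q)])
    raw = (su.sum(axis=1) - np.diag(su)) / (q - 1)  # exclude self-correlation
    return np.full(q, 1.0 / q) if raw.sum() == 0 else raw / raw.sum()


def select_features(X, Y, k):
    """Greedy forward selection with a hypothetical mRMR-style score:
    label-weighted relevance minus mean redundancy with chosen features."""
    n_features, q = X.shape[1], Y.shape[1]
    w = label_weights(Y)
    # Dependency term: SU between the feature and each label, weighted by w.
    relevance = [sum(w[j] * symmetric_uncertainty(X[:, f], Y[:, j]) for j in range(q))
                 for f in range(n_features)]
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < k:
        scores = {}
        for f in remaining:
            # Redundancy term: mean SU with already-selected features (0 at the start).
            redundancy = (np.mean([symmetric_uncertainty(X[:, f], X[:, s]) for s in selected])
                          if selected else 0.0)
            scores[f] = relevance[f] - redundancy
        best = max(remaining, key=scores.__getitem__)  # highest score; ties -> lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    y1 = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    y2 = np.array([0, 0, 1, 1, 0, 0, 1, 1])
    noise = np.array([0, 1, 0, 1, 0, 1, 0, 1])
    X = np.stack([y1, y1, y2, noise], axis=1)  # feature 1 duplicates feature 0
    Y = np.stack([y1, y2], axis=1)
    print(select_features(X, Y, k=2))  # -> [0, 2]
```

In the toy example, features 0, 1 and 2 tie on relevance. After feature 0 is chosen, its duplicate (feature 1) is penalized by the redundancy term, so the feature carrying the second label is selected instead. This is the behavior a dependency-minus-redundancy score is meant to produce.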