
Multi-label Feature Selection Algorithm Based on Improved Label Correlation (基于标签关系改进的多标签特征选择算法)

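Cite this article:	CHEN Fu-cai, LI Si-hao, ZHANG Jian-peng, HUANG Rui-yang. Multi-label Feature Selection Algorithm Based on Improved Label Correlation[J]. Computer Science (计算机科学), 2018, 45(6): 228-234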

Authors:	CHEN Fu-cai (陈福才), LI Si-hao (李思豪), ZHANG Jian-peng (张建朋), HUANG Rui-yang (黄瑞阳)

Affiliation:	National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China (all four authors)

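Funding:	Supported by the National Key Research and Development Program of China (2016YFB0800101) and the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (61521003).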

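Abstract:	Multi-label feature selection is one of the main methods for coping with the curse of dimensionality: it reduces the feature dimension while improving learning efficiency and classification performance. Existing feature selection algorithms do not consider the relationships among labels, and the range over which they measure information quantities is biased. To address both problems, a multi-label feature selection algorithm based on improved label correlation is proposed. Symmetrical uncertainty is first introduced to normalize the information quantities. The normalized mutual information is then used as the correlation measure, label importance weights are defined from it, and the label-related terms in the dependency and the redundancy are weighted accordingly. A feature scoring function is then proposed as the measure of feature importance, and the highest-scoring features are selected one at a time to form the best feature subset. Experimental results show that, compared with other algorithms, the proposed algorithm extracts a more accurate low-dimensional feature subset, which both improves the performance of multi-label learning algorithms for entity information mining and raises the efficiency of multi-label learning algorithms based on discrete features.
Keywords:	Multi-label feature selection; Label correlation; Dependency; Redundancy; Feature score
Received:	2017-04-25
Revised:	2017-07-29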
Multi-label Feature Selection Algorithm Based on Improved Label Correlation

CHEN Fu-cai, LI Si-hao, ZHANG Jian-peng and HUANG Rui-yang. Multi-label Feature Selection Algorithm Based on Improved Label Correlation[J]. Computer Science, 2018, 45(6): 228-234

Authors:	CHEN Fu-cai, LI Si-hao, ZHANG Jian-peng, HUANG Rui-yang

Affiliation:	National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China (all four authors)

Abstract:	Multi-label feature selection is one of the main methods for overcoming the curse of dimensionality. It reduces the feature dimension while improving learning efficiency and classification performance. However, many existing feature selection algorithms do not take label correlation into account, and the range of the information entropies is biased across data sets. To address these problems, this paper proposes a multi-label feature selection algorithm based on improved label correlation. The algorithm first uses symmetrical uncertainty to normalize the information entropy, then takes the normalized mutual information as the correlation measure to define label importance weights, with which the label-related terms in the dependency and the redundancy are weighted. A score function is then put forward to evaluate feature importance, and the highest-scoring features are selected in turn to form the best feature subset. Experiments show that, once a more concise and accurate low-dimensional feature subset has been selected, the performance of multi-label learning improves, as does the efficiency of multi-label learning algorithms based on discrete features.

Keywords:	Multi-label feature selection; Label correlation; Dependency; Redundancy; Feature score
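
The pipeline outlined in the abstract (normalize with symmetrical uncertainty, weight labels by their correlation with other labels, score features, pick greedily) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract gives no formulas, so the weighting rule in `label_weights`, the relevance-minus-redundancy score in `select_features`, and every function name here are hypothetical stand-ins chosen for clarity. Features and labels are assumed to be discrete columns of equal length.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both columns are constant, so there is nothing to share
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def label_weights(labels):
    """ASSUMPTION: weight each label by its mean SU with the other labels,
    normalized to sum to 1 (uniform weights if no labels are correlated)."""
    q = len(labels)
    raw = []
    for j in range(q):
        others = [symmetrical_uncertainty(labels[j], labels[k])
                  for k in range(q) if k != j]
        raw.append(sum(others) / len(others) if others else 1.0)
    total = sum(raw)
    return [1.0 / q] * q if total == 0 else [r / total for r in raw]


def select_features(features, labels, k):
    """Greedily pick k feature indices.

    ASSUMPTION: score = label-weighted SU with the labels (dependency)
    minus mean SU with the already selected features (redundancy).
    """
    weights = label_weights(labels)
    dependency = [sum(w * symmetrical_uncertainty(f, lab)
                      for w, lab in zip(weights, labels))
                  for f in features]
    selected, remaining = [], list(range(len(features)))

    def score(i):
        if not selected:
            return dependency[i]
        redundancy = sum(symmetrical_uncertainty(features[i], features[s])
                         for s in selected) / len(selected)
        return dependency[i] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Called as `select_features(feature_columns, label_columns, k)`, the function returns the indices of the chosen columns in selection order. Subtracting the redundancy term means an exact copy of an already selected feature scores poorly on later rounds, which is the behaviour the abstract's dependency/redundancy framing suggests; the paper's actual score function may combine the terms differently.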
