Improved MIMLBoost Algorithm Based on Class Importance (基于类别重要度的MIMLBoost改进算法)

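Cite this article:	HAO Ning, XIA Shixiong, NIU Qiang, ZHAO Zhijun. Improved MIMLBoost algorithm based on class importance (基于类别重要度的MIMLBoost改进算法) [J]. Journal of Computer Applications (计算机应用), 2015, 35(11): 3122-3125.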

Authors:	HAO Ning (郝宁), XIA Shixiong (夏士雄), NIU Qiang (牛强), ZHAO Zhijun (赵志军)

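Affiliations:	1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; 2. Dinghai District Transportation Construction Affairs Center, Zhoushan, Zhejiang 316000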

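Funding:	Prospective Joint Research Project of the Jiangsu Province Industry-University-Research Joint Innovation Fund (BY2014028-09); Open Fund of the Key Laboratory of Digital Ocean Science and Technology, State Oceanic Administration (KLDO201304); Scientific Research Program of the Zhejiang Provincial Department of Transportation (2014T25).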

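Abstract:	To address the class imbalance caused by the degradation process in the multi-instance multi-label learning algorithm MIMLBoost, an improved degradation method based on class label evaluation is proposed, which applies the idea of artificial undersampling and introduces class importance. The method clusters the bags in the instance space and quantifies the labels in the label space onto the resulting clusters. Taking each cluster as a unit, it then uses the TF-IDF algorithm to evaluate and filter the class labels by importance, removes the labels of low importance, and pairs the bags in each cluster with the remaining class labels. This reduces the number of majority-class samples and completes the transformation of multi-instance multi-label samples into multi-instance single-label samples. Experiments on natural data show that the improved algorithm outperforms the original overall, most clearly on the three evaluation measures Hamming loss, coverage, and ranking loss, indicating that the proposed algorithm effectively reduces the classification error rate and improves precision and classification efficiency.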
Keywords:	multi-instance multi-label; MIMLBoost algorithm; TF-IDF algorithm; clustering; class imbalance
Received:	2015-06-17
Revised:	2015-07-09
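
The degradation procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `label_importance` and `degrade` are hypothetical, the cluster assignment of each bag is assumed to be already available from some clustering step, and the smoothed TF-IDF formula is an assumption, since the abstract does not state the exact formulation. The sketch follows the English abstract in dropping the single least important label per cluster.

```python
import math
from collections import Counter, defaultdict

def label_importance(label_sets, clusters, label_space):
    """TF-IDF-style importance of every label within every cluster.

    Assumed formulation (not taken from the paper):
      tf(y, c) = occurrences of y in cluster c / all label occurrences in c
      idf(y)   = log(1 + number of clusters / number of clusters containing y)
    """
    counts = defaultdict(Counter)              # cluster id -> label -> count
    for labels, c in zip(label_sets, clusters):
        counts[c].update(labels)
    n_clusters = len(counts)
    # Number of clusters in which each label occurs at least once.
    df = Counter(y for c in counts for y in counts[c])
    importance = {}
    for c, cnt in counts.items():
        total = sum(cnt.values()) or 1         # guard against unlabeled clusters
        importance[c] = {
            y: (cnt[y] / total) * math.log(1 + n_clusters / df[y]) if df[y] else 0.0
            for y in label_space
        }
    return importance

def degrade(bags, label_sets, clusters, label_space):
    """Turn MIML examples into single-label multi-instance samples.

    For each cluster the least important label is dropped, so the bags of
    that cluster are never paired with it. Ties go to the label that comes
    first in label_space.
    """
    importance = label_importance(label_sets, clusters, label_space)
    dropped = {c: min(label_space, key=lambda y: imp[y])
               for c, imp in importance.items()}
    samples = []
    for bag, labels, c in zip(bags, label_sets, clusters):
        for y in label_space:
            if y == dropped[c]:
                continue
            samples.append((bag, y, 1 if y in labels else -1))
    return samples, dropped

# Toy data: four bags of 2-D instances in two clusters, three labels.
bags = [[(0.1, 0.2)], [(0.2, 0.1)], [(0.9, 0.8)], [(0.8, 0.9)]]
label_sets = [{"a", "b"}, {"a"}, {"c"}, {"c", "b"}]
clusters = [0, 0, 1, 1]
samples, dropped = degrade(bags, label_sets, clusters, ["a", "b", "c"])
```

On this toy data the plain one-sample-per-label degradation would yield 12 samples, 6 of them negative. The sketch drops label "c" for cluster 0 and label "a" for cluster 1, leaving 8 samples with 2 negatives.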
Improved MIMLBoost algorithm based on importance evaluation of labels

HAO Ning, XIA Shixiong, NIU Qiang, ZHAO Zhijun. Improved MIMLBoost algorithm based on importance evaluation of labels[J]. Journal of Computer Applications, 2015, 35(11): 3122-3125.

Authors:	HAO Ning, XIA Shixiong, NIU Qiang, ZHAO Zhijun

Affiliation:	1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; 2. Ministry of Transport of Dinghai District, Zhoushan, Zhejiang 316000, China

Abstract:	To address the class imbalance caused by the original degradation method in the MIMLBoost algorithm, this paper introduces class importance into the algorithm and proposes an improved degradation method based on class label evaluation. First, a clustering algorithm groups all bags into clusters; each cluster is treated as a concept of the multi-instance bags, and every class label is quantified on each cluster. Then, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is used to compute the importance of each label in each cluster. Finally, for each cluster, the label with the lowest importance is removed, because such a label tends to generate many negative samples when MIML (Multi-Instance Multi-Label) samples are transformed into multi-instance samples. Experimental results show that the new degradation method is effective and that the improved algorithm outperforms the original, especially in terms of Hamming loss, coverage, and ranking loss. This confirms that the new algorithm effectively reduces the classification error rate and improves precision.

Keywords:	Multi-Instance Multi-Label (MIML); MIMLBoost algorithm; Term Frequency-Inverse Document Frequency (TF-IDF) algorithm; clustering; class imbalance
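
The three evaluation measures named in the abstract have standard definitions in the multi-label literature, and a small Python sketch may make them concrete. The function names and the toy scores below are illustrative assumptions, not values from the paper. Labels are represented by integer indices, and every example is assumed to have at least one true label. Lower is better for all three measures.

```python
def hamming_loss(pred_sets, true_sets, n_labels):
    """Fraction of (example, label) pairs on which prediction and truth disagree."""
    wrong = sum(len(p ^ t) for p, t in zip(pred_sets, true_sets))
    return wrong / (len(true_sets) * n_labels)

def coverage(scores, true_sets):
    """Average depth in the ranked label list needed to cover all true labels.

    Ranks are 0-based, so a true label ranked first contributes 0.
    """
    total = 0
    for s, t in zip(scores, true_sets):
        order = sorted(range(len(s)), key=lambda j: -s[j])   # best label first
        rank = {j: r for r, j in enumerate(order)}
        total += max(rank[j] for j in t)
    return total / len(true_sets)

def ranking_loss(scores, true_sets):
    """Average fraction of (true, false) label pairs that are ordered wrongly."""
    total = 0.0
    for s, t in zip(scores, true_sets):
        neg = [j for j in range(len(s)) if j not in t]
        if not t or not neg:
            continue                                         # no pairs to compare
        bad = sum(1 for i in t for j in neg if s[i] <= s[j])
        total += bad / (len(t) * len(neg))
    return total / len(true_sets)

# Toy example: two test examples, three labels (indices 0, 1, 2).
scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]   # hypothetical classifier outputs
true_sets = [{0, 2}, {2}]
# Predicted label set: every label whose score exceeds 0.5.
pred_sets = [{j for j, v in enumerate(s) if v > 0.5} for s in scores]

hl = hamming_loss(pred_sets, true_sets, n_labels=3)
cov = coverage(scores, true_sets)
rl = ranking_loss(scores, true_sets)
```

In this toy case the second example is misclassified (label 1 is predicted instead of label 2), which gives a Hamming loss of 1/3, a coverage of 1.0, and a ranking loss of 0.25.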
