
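Improved MIMLBoost algorithm based on class importance
Cite this article:HAO Ning, XIA Shixiong, NIU Qiang, ZHAO Zhijun. Improved MIMLBoost algorithm based on class importance[J]. Journal of Computer Applications, 2015, 35(11): 3122-3125.
Authors:HAO Ning  XIA Shixiong  NIU Qiang  ZHAO Zhijun
Affiliation:1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; 2. Dinghai District Transportation Construction Affairs Center, Zhoushan, Zhejiang 316000
Funding:Prospective Joint Research Project of the Jiangsu Province Industry-University-Research Joint Innovation Fund (BY2014028-09); Open Fund of the Key Laboratory of Digital Ocean Science and Technology, State Oceanic Administration (KLDO201304); Research Program of the Zhejiang Provincial Department of Transportation (2014T25).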
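Abstract:To address the class imbalance caused by the degradation process in the multi-instance multi-label learning algorithm MIMLBoost, this paper applies the idea of artificial under-sampling, introduces class importance, and proposes an improved degradation method based on class-label evaluation. The method clusters the bags in the instance space and quantifies the labels in the label space onto the resulting clusters. Then, cluster by cluster, it uses the TF-IDF algorithm to evaluate the importance of each class label, filters out the labels of low importance, and pairs the bags in the cluster with the remaining class labels. This reduces the number of majority-class samples and completes the transformation of multi-instance multi-label samples into multi-instance single-label samples. Experiments on a natural dataset show that the improved algorithm outperforms the original algorithm overall, most clearly on three evaluation metrics: Hamming loss, coverage, and ranking loss. This indicates that the proposed algorithm can effectively reduce the classification error rate and improve precision and classification efficiency.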
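
The abstract does not spell out the original degradation step, so the following is only a minimal sketch of how it is commonly described for MIMLBoost: every bag is paired with every candidate label, and the pair is marked +1 if the label belongs to the bag's label set and -1 otherwise. The function name `degrade`, the bag identifiers, and the label names are hypothetical and chosen for illustration only. The point of the sketch is to show why negatives dominate when each bag carries only a few labels.

```python
from collections import Counter

def degrade(bags, label_sets, all_labels):
    """Expand each (bag, label set) pair into one single-label sample per candidate label."""
    samples = []
    for bag, labels in zip(bags, label_sets):
        for y in all_labels:
            # +1 if the bag really carries label y, -1 otherwise
            target = 1 if y in labels else -1
            samples.append((bag, y, target))
    return samples

# Hypothetical toy data: three bags, five candidate labels
bags = ["bag0", "bag1", "bag2"]
label_sets = [{"sea"}, {"sea", "sunset"}, {"trees"}]
all_labels = ["desert", "mountains", "sea", "sunset", "trees"]

samples = degrade(bags, label_sets, all_labels)
target_counts = Counter(target for _, _, target in samples)
print(target_counts)
```

In this toy setting the 3 bags expand into 15 single-label samples, of which only 4 are positive, which is the kind of imbalance the abstract sets out to reduce.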

Keywords:multi-instance multi-label  MIMLBoost algorithm  TF-IDF algorithm  clustering  class imbalance
Received:2015-06-17
Revised:2015-07-09

Improved MIMLBoost algorithm based on importance evaluation of labels
HAO Ning, XIA Shixiong, NIU Qiang, ZHAO Zhijun. Improved MIMLBoost algorithm based on importance evaluation of labels[J]. Journal of Computer Applications, 2015, 35(11): 3122-3125.
Authors:HAO Ning  XIA Shixiong  NIU Qiang  ZHAO Zhijun
Affiliation:1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; 2. Ministry of Transport of Dinghai District, Zhoushan, Zhejiang 316000, China
Abstract:To address the class imbalance caused by the original degradation method in the MIMLBoost algorithm, this paper introduces class importance into the algorithm and proposes an improved degradation method based on class-label evaluation. First, a clustering algorithm groups all bags into clusters. Each cluster is treated as a concept in the multi-instance bags, and every class label is quantified on each cluster. Then, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is used to compute the importance of each label in each cluster. Finally, for each cluster, the label with the lowest importance is removed, because such a label readily produces many negative samples when the MIML (Multi-Instance Multi-Label) samples are transformed into multi-instance samples. Experimental results show that the new degradation method is effective and that the improved algorithm outperforms the original algorithm, especially in terms of Hamming loss, coverage, and ranking loss. This indicates that the new algorithm can effectively reduce the classification error rate and improve precision.
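
The label-scoring step can be sketched as follows, under several assumptions that the abstract does not confirm. Each cluster is treated as a "document" and each class label as a "term"; the cluster assignment of every bag is taken as given (the clustering itself is not shown); the IDF uses add-one smoothing, which is one common variant and not necessarily the authors' formula; and all labels tied at the lowest score in a cluster are dropped. The names `label_importance`, `prune_lowest`, and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def label_importance(cluster_ids, label_sets, all_labels):
    """Score every label in every cluster with a TF-IDF style weight."""
    counts = defaultdict(Counter)  # cluster id -> label -> number of bags carrying it
    for cid, labels in zip(cluster_ids, label_sets):
        counts[cid].update(labels)
    n_clusters = len(counts)
    scores = {}
    for cid, label_counts in counts.items():
        total = sum(label_counts.values())
        scores[cid] = {}
        for y in all_labels:
            # Term frequency: share of the cluster's label occurrences that are y
            tf = label_counts[y] / total if total else 0.0
            # Document frequency: number of clusters in which y occurs at all
            df = sum(1 for c in counts.values() if c[y] > 0)
            # Assumed smoothed IDF, not taken from the paper
            idf = math.log((1 + n_clusters) / (1 + df)) + 1.0
            scores[cid][y] = tf * idf
    return scores

def prune_lowest(scores):
    """Per cluster, keep every label scoring above the cluster's minimum."""
    kept = {}
    for cid, by_label in scores.items():
        lowest = min(by_label.values())
        kept[cid] = sorted(y for y, s in by_label.items() if s > lowest)
    return kept

# Hypothetical toy data: four bags already assigned to two clusters
cluster_ids = [0, 0, 1, 1]
label_sets = [{"sea"}, {"sea", "sunset"}, {"trees"}, {"trees", "sunset"}]
all_labels = ["sea", "sunset", "trees"]

scores = label_importance(cluster_ids, label_sets, all_labels)
kept = prune_lowest(scores)
print(kept)
```

In this toy case, the label dropped from each cluster is the one that no bag in that cluster carries, so pairing bags only with the kept labels yields 8 single-label samples instead of 12, and every discarded sample is a negative one.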
Keywords:Multi-Instance Multi-Label (MIML)  MIMLBoost algorithm  Term Frequency-Inverse Document Frequency (TF-IDF) algorithm  clustering  class imbalance  