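Support Vector Machine Active Learning Strategy Based on Vector Cosine
Cite this article:GUO Husheng, WANG Wenjian, BAI Longfei. Support Vector Machine Active Learning Strategy Based on Vector Cosine[J]. Journal of Frontiers of Computer Science and Technology, 2014, 0(7): 868-876
Authors:GUO Husheng  WANG Wenjian  BAI Longfei
Affiliations:[1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
Funding:The National Natural Science Foundation of China under Grant Nos. 61273291, 60975035; the Research Project Supported by Scholarship Council of Shanxi Province under Grant No. 2012-008; the Graduate Innovation Project of Shanxi Province under Grant No. 20133001.
Abstract:The Euclidean distance used in traditional active-learning-based support vector machine (SVM) methods cannot effectively measure the degree of correlation between high-dimensional samples, which reduces the generalization ability of the learner. To address this problem, an SVM active learning strategy based on vector cosine, called the COS_SVMactive method, is proposed. The method introduces vector cosine into the active learning process to measure the redundancy of sample information in the training set, so that the most valuable samples, those carrying important classification information, are selected and handed to experts for manual labeling. During the iterative labeling process, the balance of the training set is adjusted step by step so that the learner achieves better generalization performance. Experimental results show that, compared with the traditional SVM active learning method based on random sampling (RS_SVMactive) and the SVM active learning method based on distance (DIS_SVMactive), the COS_SVMactive method not only improves classification accuracy but also reduces the expert labeling cost.

Keywords:support vector machine  active learning  vector cosine  redundancy  balance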

Support Vector Machine Active Learning Strategy Based on Vector Cosine
GUO Husheng, WANG Wenjian, BAI Longfei. Support Vector Machine Active Learning Strategy Based on Vector Cosine[J]. Journal of Frontiers of Computer Science and Technology, 2014, 0(7): 868-876
Authors:GUO Husheng  WANG Wenjian  BAI Longfei
Affiliation:1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China 2. Key Laboratory of Computational Intelligence and Chinese Information Processing, Shanxi University, Taiyuan 030006, China
Abstract:In traditional support vector machine (SVM) active learning, Euclidean distance cannot effectively measure the correlation between high-dimensional samples, which lowers the generalization ability of the learner. To address this problem, this paper proposes an SVM active learning strategy based on vector cosine, called COS_SVMactive. During active learning, the method uses vector cosine to measure the information redundancy of the training samples and selects the most valuable samples for labeling by experts. In each labeling iteration, the balance of the labeled data is gradually adjusted to achieve good generalization performance. The experimental results demonstrate that, compared with SVM active learning based on random sampling (RS_SVMactive) and SVM active learning based on distance (DIS_SVMactive), the proposed COS_SVMactive method not only improves classification accuracy but also reduces the manual labeling cost.
Keywords:support vector machine  active learning  vector cosine  redundancy  balance
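
The abstract describes the selection step only at a high level, so the following is a minimal sketch of how cosine-based redundancy filtering might look, not the authors' algorithm. It assumes that candidates are ranked by their absolute SVM decision value (smallest margin first) and that a candidate is skipped when its cosine with any already-selected or labeled sample exceeds a threshold. The names `select_queries`, `margins`, and `max_cos`, and the threshold value 0.95, are hypothetical. The balance adjustment mentioned in the abstract is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_queries(candidates, margins, labeled, batch_size, max_cos=0.95):
    """Return up to batch_size indices of candidates to send to the expert.

    candidates: feature vectors from the unlabeled pool
    margins:    |f(x)| of each candidate under the current SVM (assumed input)
    labeled:    feature vectors already in the training set
    max_cos:    assumed redundancy threshold, not taken from the paper
    """
    chosen = []
    reference = list(labeled)
    # Visit candidates from most to least uncertain (smallest margin first).
    for idx in sorted(range(len(candidates)), key=lambda i: margins[i]):
        x = candidates[idx]
        # Redundancy = largest cosine with anything already labeled or chosen.
        redundancy = max((cosine(x, r) for r in reference), default=0.0)
        if redundancy < max_cos:
            chosen.append(idx)
            reference.append(x)
        if len(chosen) == batch_size:
            break
    return chosen


if __name__ == "__main__":
    pool = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [-1.0, 0.2]]
    margins = [0.1, 0.2, 0.3, 0.9]
    labeled = [[1.0, 0.05]]
    print(select_queries(pool, margins, labeled, batch_size=2))  # prints [2, 3]
```

In this toy pool the first two candidates have the smallest margins but point in almost the same direction as the labeled sample, so they are treated as redundant even though the second one lies about one unit away in Euclidean distance. That is the kind of case where a direction-based measure and a distance-based measure disagree.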
This article is indexed in VIP (维普) and other databases.