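Generalization Performance Analysis of Polynomial-Kernel Support Vector Machine Text Classifiers
Cite this article: SUN Jian-Tao, GUO Chong-Hui, LU Yu-Chang, SHI Chun-Yi. Generalization performance analysis of polynomial-kernel support vector machine text classifiers [J]. Journal of Computer Research and Development, 2004, 41(8): 1321-1326.
Authors: SUN Jian-Tao, GUO Chong-Hui, LU Yu-Chang, SHI Chun-Yi
Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing 100084; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084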
Funding: Major Program of the National Natural Science Foundation of China (79990580); National "973" Key Basic Research Development Program (G1998030414)
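Abstract: VC-dimension theory and the structural risk minimization principle are central to statistical learning theory. The support vector machine (SVM) algorithm, which is based on this theory, has attracted attention for its good generalization performance and has been studied for text categorization. Previous work on polynomial kernels held that the generalization ability of SVM is unaffected by the polynomial order, that SVM can handle very high-dimensional classification problems, and that text categorization therefore needs no feature selection. This study finds that SVM text classifiers overfit as the order of the polynomial kernel increases, that the effect becomes more pronounced as the number of features grows, and that feature selection is necessary. The problem is analyzed under structural risk minimization theory by estimating the VC dimension of the function set, and the conclusions agree with the experimental results.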

Keywords: support vector machine; text categorization; structural risk minimization

Estimating the Generalization Performance of Polynomial SVM Classifier for Text Categorization
SUN Jian-Tao, GUO Chong-Hui, LU Yu-Chang, and SHI Chun-Yi. Estimating the Generalization Performance of Polynomial SVM Classifier for Text Categorization [J]. Journal of Computer Research and Development, 2004, 41(8): 1321-1326.
Authors: SUN Jian-Tao, GUO Chong-Hui, LU Yu-Chang, and SHI Chun-Yi
Abstract: VC theory and the structural risk minimization principle are key concepts of statistical learning theory. Developed from this theory, SVM is widely investigated and used for text categorization because of its high generalization performance. Previous work claimed that the performance of polynomial SVM is independent of the polynomial order and that it is suitable for high-dimensional text categorization problems without feature selection. This research indicates that over-fitting occurs as the polynomial order increases. SVM's generalization performance decreases drastically if too many features are used, so feature selection is necessary. Based on the structural risk minimization principle, this behavior is analyzed by estimating the VC dimension of the function classes, and the empirical results support the theoretical conclusions.
Keywords: support vector machine; text categorization; structural risk minimization
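
The abstract's argument turns on how the capacity of the function class grows with the polynomial order and the number of features. As a rough illustration only (the abstract does not give the authors' actual VC-dimension estimate), the sketch below counts the monomials of total degree at most d in n variables, C(n + d, d). This is the dimension of the feature space induced by an inhomogeneous polynomial kernel, and it bounds the VC dimension of unrestricted polynomial separators of that degree. The function name `poly_feature_dim` and the sample values of n and d are assumptions chosen for illustration.

```python
from math import comb


def poly_feature_dim(n_features: int, degree: int) -> int:
    """Count monomials of total degree <= degree in n_features variables.

    This equals C(n_features + degree, degree), including the constant term.
    """
    return comb(n_features + degree, degree)


if __name__ == "__main__":
    # Hypothetical vocabulary sizes and kernel orders, chosen for illustration.
    for n in (100, 1000, 10000):
        for d in (1, 2, 3):
            print(f"features={n:>5}  order={d}  monomials={poly_feature_dim(n, d)}")
```

The count grows roughly like n^d / d!, so raising the order or keeping more features both inflate the capacity term. Margin-based bounds for SVM can be much tighter than this raw count, so it should be read as a worst-case picture rather than the paper's result.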
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.