
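Undesirable Text Information Recognition Based on SVM
Cite this article:LV Hong-Yan, DU Juan. Undesirable Text Information Recognition Based on SVM[J]. Computer Systems & Applications, 2015, 24(6):183-187.
Authors:LV Hong-Yan, DU Juan
Affiliation:Institute of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China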
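Abstract:In practical undesirable-text recognition, most texts overlap with or are even mixed into one another, and this nonlinearly inseparable problem makes recognition difficult. Through a nonlinear transformation, an SVM can turn the problem in the original space into a linear problem in some high-dimensional space, and choosing a suitable kernel function is the key to the SVM. A single kernel cannot handle both independent undesirable words and word combinations, so recognition accuracy is low and recall cannot be maintained either. For the specific application of undesirable-text recognition, a new combined kernel function is proposed that joins the linear kernel and the polynomial kernel in accordance with the Mercer theorem. The combined kernel keeps the advantages of both kernels and can recognize independent undesirable words as well as word combinations. The linear kernel, the homogeneous polynomial kernel, and the combined kernel were evaluated in simulation experiments, and the results show that the combined kernel achieves fairly good accuracy and recall.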

Keywords:SVM  combined kernel function  undesirable text  information recognition  recall
Received:2014/10/12
Revised:2014/11/28

Undesirable Text Recognition Based on SVM
LV Hong-Yan and DU Juan. Undesirable Text Recognition Based on SVM[J]. Computer Systems & Applications, 2015, 24(6):183-187.
Authors:LV Hong-Yan and DU Juan
Affiliation:Institute of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China
Abstract:In practical undesirable-text identification, most texts overlap or are even intermixed with one another, and this nonlinear, non-separable problem makes identification difficult. Through a nonlinear transformation, an SVM can turn a nonlinear problem in the original space into a linear problem in a high-dimensional space, and the key to the SVM is choosing an appropriate kernel function. A single kernel function cannot recognize both independent undesirable words and word combinations at the same time, so the recognition accuracy is not high and the recall is not ideal. For the specific application of undesirable-text identification, a new combination kernel function is constructed from the linear kernel and the homogeneous polynomial kernel according to the Mercer theorem. This combination kernel function has the advantages of both the linear kernel and the polynomial kernel, and it can identify independent undesirable words as well as word combinations. The linear kernel, the homogeneous polynomial kernel, and the combination kernel function were then evaluated in a sample experiment. The experimental results showed that the recognition accuracy and the recall of the combination kernel function were better than those of the other kernel functions.
Keywords:SVM  combination kernel function  undesirable text  information identification  recall
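
The abstract does not state the exact form of the combination kernel, so the following is only a minimal sketch of one common construction: a convex combination of the linear kernel and a homogeneous polynomial kernel. A nonnegative weighted sum of Mercer kernels is itself a Mercer kernel, which is the property the abstract appeals to. The names `combined_kernel`, `weight`, and `degree`, their default values, and the toy term-count vectors are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """Inner product <x, z>; responds to individual terms shared by two texts."""
    return sum(a * b for a, b in zip(x, z))


def homogeneous_poly_kernel(x: Vector, z: Vector, degree: int = 2) -> float:
    """Homogeneous polynomial kernel <x, z>**degree.

    For degree 2 the implicit feature space contains products of term
    pairs, so the kernel responds to co-occurring terms.
    """
    return linear_kernel(x, z) ** degree


def combined_kernel(x: Vector, z: Vector, weight: float = 0.5, degree: int = 2) -> float:
    """Hypothetical convex combination of the two kernels above.

    `weight` in [0, 1] is an assumed mixing parameter: 1.0 gives the pure
    linear kernel, 0.0 the pure homogeneous polynomial kernel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * linear_kernel(x, z)
            + (1.0 - weight) * homogeneous_poly_kernel(x, z, degree))


def gram_matrix(samples: Sequence[Vector],
                kernel: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Pairwise kernel values for a list of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Toy term-count vectors over a three-word vocabulary (illustrative only).
docs = [[1, 0, 1], [1, 1, 1], [0, 1, 0]]
for row in gram_matrix(docs, combined_kernel):
    print(row)
```

A Gram matrix built this way could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. The mixing weight would normally be chosen by validation rather than fixed at 0.5.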
This article is indexed in Wanfang Data and other databases.