
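Research on New Word Identification Based on Support Vector Machine and Constraint Conditions
Cite this article: XU Yuan-fang, LI Cheng-cheng. Research on New Word Identification Based on Support Vector Machine and Constraint Conditions[J]. Microcomputer Development (微机发展), 2014(1): 98-101.
Authors: XU Yuan-fang  LI Cheng-cheng
Affiliation: College of Network Technology, Inner Mongolia Normal University, Hohhot 010022, Inner Mongolia, China
Funding: National Natural Science Foundation of China (2002AA117010-07); Education Research Fund of Inner Mongolia Autonomous Region (GCRC09001); Inner Mongolia Normal University Fund Project (2012ZRYB007)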
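Abstract: One of the key techniques in Chinese word segmentation is the correct segmentation of new words, and this paper proposes a new method for identifying them. Exploiting the good classification performance of the support vector machine (SVM), positive and negative samples are first extracted from a training corpus that has been segmented and POS-tagged with the help of a segmentation dictionary. These samples are then vectorized together with various features of the words themselves, computed from the training corpus, and SVM training yields the support vectors for new-word classification. A test corpus containing simulated new words is segmented and POS-tagged, and candidate new words are selected using the proposed constraint conditions and slack variables. Each candidate is vectorized together with its word features and fed to the trained SVM classifier, and the computed result is compared with a threshold: a candidate is judged to be a new word when the result is below the threshold, and a non-new word when the result is above it. The most suitable SVM kernel function is selected by comparing experimental results.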

Keywords: new word identification  support vector machine  constraint conditions  kernel function

Research on New Word Identification Based on Support Vector Machine and Constraint Condition
XU Yuan-fang, LI Cheng-cheng. Research on New Word Identification Based on Support Vector Machine and Constraint Condition[J]. Microcomputer Development, 2014(1): 98-101.
Authors:XU Yuan-fang  LI Cheng-cheng
Affiliation:(College of Network Technology, Inner Mongolia Normal University, Hohhot 010022, China)
Abstract:One of the key technologies of Chinese word segmentation is how to segment new words correctly, and this paper presents a new method for identifying them. Relying on the good classification ability of the SVM, positive and negative samples are first extracted from a training corpus that has been segmented and POS-tagged according to a dictionary. They are then vectorized together with various word features computed from the training corpus, and the support vectors for new-word classification are obtained by training the support vector machine. The test corpus, which contains simulated new words, is segmented and POS-tagged, and candidate new words are selected using the proposed constraint conditions and slack variables. Each candidate is combined with its own word features, vectorized, and passed as input to the trained SVM classifier, and the result is compared with a threshold: when the result is less than the threshold the candidate is judged to be a new word, and when it is greater than the threshold it is judged to be a non-new word. The most suitable SVM kernel function is selected by comparing the experimental results.
Keywords:new word identification  SVM  constraint conditions  kernel function
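
The decision step described in the abstract (score a vectorized candidate with the trained SVM, then compare the score with a threshold) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature vectors, support vectors, dual coefficients, bias, `gamma` and the threshold value are all hypothetical toy values, and the function names are made up for this example. Only the convention that a score below the threshold means "new word" is taken from the abstract. Two kernels are included because the abstract says the kernel function is chosen by comparing experimental results.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Linear kernel: the plain dot product of the two vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u: Vector, v: Vector, gamma: float = 0.5) -> float:
    """RBF kernel exp(-gamma * ||u - v||^2); gamma is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def decision_value(x: Vector, support_vectors: Sequence[Vector],
                   dual_coefs: Sequence[float], bias: float,
                   kernel: Kernel) -> float:
    """SVM decision function f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b."""
    weighted = sum(c * kernel(sv, x)
                   for sv, c in zip(support_vectors, dual_coefs))
    return weighted + bias


def is_new_word(x: Vector, support_vectors: Sequence[Vector],
                dual_coefs: Sequence[float], bias: float,
                kernel: Kernel, threshold: float = 0.0) -> bool:
    """Follow the abstract's convention: score below threshold => new word."""
    return decision_value(x, support_vectors, dual_coefs, bias, kernel) < threshold


# Hypothetical "trained" model: two support vectors in a 2-D feature space.
SUPPORT_VECTORS = [(1.0, 0.0), (0.0, 1.0)]
DUAL_COEFS = [-1.0, 1.0]  # each entry stands for alpha_i * y_i (assumed)
BIAS = 0.0

if __name__ == "__main__":
    candidate = (0.9, 0.1)  # hypothetical vectorized candidate word
    for name, kernel in (("linear", linear_kernel), ("rbf", rbf_kernel)):
        score = decision_value(candidate, SUPPORT_VECTORS, DUAL_COEFS, BIAS, kernel)
        label = is_new_word(candidate, SUPPORT_VECTORS, DUAL_COEFS, BIAS, kernel)
        print(name, round(score, 4), label)
```

In practice the support vectors, coefficients and bias would come from an SVM trainer, and the candidate vector from the corpus-derived word features; the sketch only isolates the scoring and thresholding logic so that different kernels can be swapped in and compared.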
This article is indexed by VIP (维普) and other databases.