
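Research on Chinese Word Segmentation by Word-Frequency Statistics Based on SVM
Cite as: ZHU Xiaojuan, CHEN Tefang. Research on Chinese word segmentation by word-frequency statistics based on SVM [J]. 微计算机信息 (Microcomputer Information), 2007, 23(30): 205-207.
Authors: ZHU Xiaojuan, CHEN Tefang
Affiliation: School of Information Science and Engineering, Central South University, Changsha 410075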
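Abstract: This paper describes in detail how SVM (support vector machine) is applied to Chinese word segmentation based on word-frequency statistics. The system takes an input string of consecutive characters, segments it, and outputs the resulting sequence of Chinese words, generally two-character words. It also builds a dictionary that stores, without duplicates, every word obtained in each processing run together with that word's frequency of occurrence. Mutual information is used as the statistical measure. With the SVM algorithm, segmentation accuracy is considerably higher than with traditional methods, and the results show a degree of stability.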
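The abstract names mutual information as the statistic but gives no formula or decision rule. The following is a minimal sketch that assumes the common pointwise mutual information between adjacent characters, PMI(x, y) = log2( p(xy) / (p(x) p(y)) ), estimated from counts in the input string itself. It keeps a two-character word whenever its PMI reaches a threshold. The function names `bigram_pmi`, `segment` and `update_dictionary`, the greedy left-to-right rule and the default threshold of 1.0 are all hypothetical illustrations, not the authors' method. The frequency dictionary is a `Counter`, so each word is stored once alongside its count.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating PMI-based segmentation; not the authors' code.


def bigram_pmi(text):
    """Pointwise mutual information (base 2) of every adjacent character pair in text."""
    char_counts = Counter(text)
    pair_counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    n_chars = len(text)
    n_pairs = max(len(text) - 1, 1)  # avoid division by zero on very short input
    pmi = {}
    for pair, count in pair_counts.items():
        p_xy = count / n_pairs
        p_x = char_counts[pair[0]] / n_chars
        p_y = char_counts[pair[1]] / n_chars
        pmi[pair] = math.log2(p_xy / (p_x * p_y))
    return pmi


def segment(text, threshold=1.0):
    """Greedy left-to-right segmentation: emit a two-character word when its PMI
    reaches the (assumed) threshold, otherwise emit a single character."""
    pmi = bigram_pmi(text)
    words = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pmi[pair] >= threshold:
            words.append(pair)
            i += 2
        else:
            words.append(text[i])
            i += 1
    return words


def update_dictionary(dictionary, words):
    """Count two-character words in a Counter; each distinct word is one key."""
    for word in words:
        if len(word) == 2:
            dictionary[word] += 1
    return dictionary


if __name__ == "__main__":
    dictionary = Counter()
    words = segment("中文分词中文分词")
    update_dictionary(dictionary, words)
    print(words)  # ['中文', '分词', '中文', '分词']
    print(dictionary)  # Counter({'中文': 2, '分词': 2})
```

The greedy rule is the simplest choice. A fuller statistical segmenter would usually compare the PMI of overlapping pairs, for example splitting at local minima, instead of applying a fixed cut-off.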

Keywords: Chinese word segmentation; word-frequency statistics; mutual information; support vector machine
Article ID: 1008-0570(2007)10-3-0205-03
Revision date: 2007-07-03

Study on Chinese word segmentation based on statistic and SVM
ZHU XIAOJUAN, CHEN TEFANG. Study on Chinese word segmentation based on statistic and SVM [J]. Control & Automation, 2007, 23(30): 205-207.
Authors: ZHU Xiaojuan, CHEN Tefang
Abstract: The paper introduces the application of SVM to Chinese word segmentation based on word-frequency statistics. The system segments an input string of consecutive characters and outputs the segmented word string, which usually consists of two-character words, and it also produces a dictionary. The dictionary stores each word and the frequency with which it appears across the processing runs. The segmentation system uses mutual information as its statistical measure. With SVM, segmentation accuracy is better than with the traditional method, and the system shows good stability.
Keywords: Chinese word segmentation; word-frequency statistics; mutual information; SVM
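The abstract does not say which features, kernel or training procedure the SVM uses. The following is a minimal sketch under stated assumptions. It uses a linear SVM to decide, for each candidate adjacent-character pair, whether to keep it as a word (+1) or split it (-1), based on hypothetical features such as the pair's PMI and log frequency. The model is trained with Pegasos-style stochastic subgradient descent on the regularized hinge loss, a technique chosen here for brevity. The names `train_linear_svm` and `predict`, the regularization value `lam` and the toy feature vectors are invented for illustration and are not results from the paper.

```python
# Hypothetical linear-SVM sketch; the paper's actual features and training are not given.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(samples, labels, lam=0.01, epochs=500):
    """Linear SVM trained with Pegasos-style subgradient steps on the regularized
    hinge loss. A constant 1.0 feature is appended so the bias is learned as a weight."""
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y * dot(w, x) < 1  # margin checked before the update
            w = [(1.0 - eta * lam) * wi for wi in w]  # step on the L2 regularizer
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]  # hinge-loss subgradient
    return w


def predict(w, features):
    """Return +1 (keep the pair as a word) or -1 (split it)."""
    return 1 if dot(w, list(features) + [1.0]) >= 0 else -1


if __name__ == "__main__":
    # Made-up (PMI, log frequency) vectors for candidate bigrams, for illustration only.
    samples = [(3.0, 2.0), (2.5, 1.5), (2.8, 2.2), (0.2, 0.1), (0.5, 0.3), (-0.3, 0.2)]
    labels = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(samples, labels)
    print([predict(w, x) for x in samples])
```

A linear model keeps the sketch dependency-free. A kernel SVM from a library would be the more usual choice if the real feature set is not linearly separable.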
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).