
Study on Chinese Word Segmentation Based on SVM and Word-Frequency Statistics

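Cite this article:	朱小娟 (ZHU Xiaojuan), 陈特放 (CHEN Tefang). Study on Chinese word segmentation based on SVM and word-frequency statistics [J]. 微计算机信息 (Control & Automation), 2007, 23(30): 205-207.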

Authors:	朱小娟 (ZHU Xiaojuan), 陈特放 (CHEN Tefang)

Affiliation:	School of Information Science and Engineering, Central South University, Changsha 410075

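Abstract:	This paper describes in detail the application of SVM (support vector machine) to Chinese word segmentation based on word-frequency statistics. The system segments a continuous input character string and outputs the segmented Chinese word string, generally as two-character words, and also produces a dictionary. The dictionary stores, without duplication, the words obtained in each processing run together with the frequency with which each word occurs. Mutual information is used for the statistics. With the SVM algorithm, segmentation accuracy improves considerably over the traditional approach, and the system shows a degree of stability.
Keywords:	Chinese word segmentation; word-frequency statistics; mutual information; support vector machine
Article ID:	1008-0570(2007)10-3-0205-03
Revised:	2007-07-03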
Study on Chinese word segmentation based on statistic and SVM

ZHU XIAOJUAN,CHEN TEFANG.Study on Chinese word segmentation based on statistic and SVM[J].Control & Automation,2007,23(30):205-207.

Authors:	ZHU XIAOJUAN, CHEN TEFANG

Abstract:	This paper introduces the application of SVM to Chinese word segmentation based on word-frequency statistics. The system segments a continuous input character string and outputs the segmented word string, usually as two-character words, and also produces a dictionary. The dictionary stores each word and the frequency with which it appears across processing runs. The segmentation system uses mutual information for its statistics. With SVM, segmentation accuracy is better than that of the traditional method, and the results are stable.

Keywords:	Chinese word segmentation; word-frequency statistics; mutual information; SVM
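
The abstract names mutual information as the statistic used to find two-character words and mentions a dictionary of words with their frequencies, but it gives no formula or parameters. The following is a minimal sketch of that idea, assuming the common pointwise form MI(x, y) = log2( P(xy) / (P(x) P(y)) ) over adjacent characters. The function names, the `threshold` and `min_count` values, and the lack of any smoothing are illustrative assumptions rather than the authors' settings, and the SVM classification stage is not shown because the abstract does not describe its features.

```python
import math
from collections import Counter


def bigram_mutual_information(text):
    """Return {bigram: (count, mi)} for every adjacent character pair.

    Assumes `text` holds only Chinese characters (punctuation and
    whitespace already removed), which is a simplification.
    """
    char_counts = Counter(text)
    bigram_counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    n_chars = sum(char_counts.values())
    n_bigrams = sum(bigram_counts.values())

    table = {}
    for bigram, count in bigram_counts.items():
        p_xy = count / n_bigrams
        p_x = char_counts[bigram[0]] / n_chars
        p_y = char_counts[bigram[1]] / n_chars
        # Pointwise mutual information of the two adjacent characters.
        table[bigram] = (count, math.log2(p_xy / (p_x * p_y)))
    return table


def build_dictionary(text, threshold=1.0, min_count=2):
    """Keep bigrams that are frequent and strongly associated.

    `threshold` and `min_count` are hypothetical cut-offs chosen for
    illustration; each word is stored once with its frequency.
    """
    table = bigram_mutual_information(text)
    return {
        bigram: count
        for bigram, (count, mi) in table.items()
        if count >= min_count and mi >= threshold
    }


if __name__ == "__main__":
    sample = "中文分词中文分词中文"
    for word, freq in sorted(build_dictionary(sample).items()):
        print(word, freq)
```

In a fuller system, scores such as these could serve as input features for a classifier like an SVM that decides whether a candidate pair is a word; that coupling is an assumption here, since the abstract does not say how the two stages are connected.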
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
