Research of Chinese Word Segmentation Based on the Web (Web中文文本分词技术研究)

Cite this article:	MA Yu-chun (马玉春), SONG Han-tao (宋瀚涛). Web中文文本分词技术研究 [J]. 计算机应用 (Journal of Computer Applications), 2004, 24(4): 134-135, 155.

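Authors:	MA Yu-chun (马玉春), SONG Han-tao (宋瀚涛)

Affiliation:	Department of Computer Science, Beijing Institute of Technology, Beijing 100081, China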

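Abstract:	Automatic Chinese word segmentation is the foundation of Chinese Web information processing. Building on the maximum matching method (MM), this paper takes context fully into account (MMC) and performs dictionary matching in memory using binary search, which effectively improves both the accuracy and the speed of segmentation. An evaluation report is given, together with applications to generating keywords and automatic summaries for Web documents.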
Keywords:	Chinese word segmentation; matching; context; information entropy
Article ID:	1001-9081(2004)04-0134-02
Research of Chinese Word Segmentation Based on the Web

MA Yu-chun, SONG Han-tao. Research of Chinese Word Segmentation Based on the Web[J]. Journal of Computer Applications, 2004, 24(4): 134-135, 155.

Authors:	MA Yu-chun, SONG Han-tao

Abstract:	Automatic Chinese word segmentation is the basis of Chinese information processing. This paper builds on the Maximum Matching method (MM) and takes context into account (MMC). Candidate words are matched against a dictionary held in memory using binary search, which improves both segmentation precision and running time. Finally, a test report and an example application based on MMC are presented.

Keywords:	Chinese word segmentation; match; context; entropy
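
The abstract names two building blocks: maximum matching (MM) and binary search over an in-memory dictionary. The following is a minimal sketch of how those two pieces might fit together, not the authors' implementation. It covers only plain forward MM; the context-aware extension (MMC) is not specified in the abstract and is not attempted here. The function names, the toy dictionary, and the `max_len` limit are all assumptions made for illustration.

```python
from bisect import bisect_left


def in_dictionary(sorted_words, word):
    """Binary search for `word` in a sorted list of dictionary entries."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


def forward_max_match(text, sorted_words, max_len=4):
    """Greedy forward maximum matching (plain MM, no context handling).

    At each position, try the longest candidate first and shrink it until
    the dictionary contains it; a single character is always accepted so
    that unknown characters cannot stall the scan.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        longest = min(max_len, len(text) - pos)
        for size in range(longest, 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or in_dictionary(sorted_words, candidate):
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Hypothetical toy dictionary, sorted once so that bisect can be used.
DICTIONARY = sorted(["中文", "分词", "技术", "研究", "中文分词", "信息", "处理"])

if __name__ == "__main__":
    print(forward_max_match("中文分词技术研究", DICTIONARY))
```

Sorting the dictionary once up front makes each lookup logarithmic in the dictionary size, which is presumably the motivation for the binary search mentioned in the abstract. Characters with no dictionary entry, such as Latin letters, fall through to single-character tokens in this sketch.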
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.