Research on Chinese Word Segmentation Based on Data-Driven Approach


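Cite this article:	LI Zhi-bing, LI Long-shu. Research on Chinese Word Segmentation Based on Data-Driven Approach [J]. 现代计算机 (Modern Computer), 2007(12): 8-10, 19.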

Authors:	李知兵 (LI Zhi-bing), 李龙澍 (LI Long-shu)

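Affiliations:	[1] School of Computer Science, Anhui University, Hefei 230039 [2] Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, Hefei 230039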

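Abstract:	Automatic Chinese word segmentation is a hard problem in Chinese information processing. This paper introduces a data-driven Chinese word segmentation method and describes a segmentation system built on it. In closed tests on the Peking University annotated People's Daily (人民日报) corpus, the system achieved good results. The system comprises a new-word recognizer, a basic segmentation algorithm, and procedures for single-character word formation, affix-based word formation, and consistency checking.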
Keywords:	Chinese word segmentation; data-driven; new word recognition; combinational ambiguity
Received:	2007-09-27
Revised:	2007-11-21
Research on Chinese Word Segmentation Based on Data-Driven Approach

LI Zhi-bing, LI Long-shu. Research on Chinese Word Segmentation Based on Data-Driven Approach [J]. Modern Computer, 2007(12): 8-10, 19.

Authors:	LI Zhi-bing, LI Long-shu

Abstract:	Automatic Chinese word segmentation is one of the most difficult problems in Chinese information processing. This paper introduces a data-driven approach to Chinese word segmentation and develops a segmentation system based on it. Closed tests on the PKU-ICL-PD-Corpus show good performance. The system consists of a new-word recognizer, a base segmentation algorithm, and procedures for combining single characters into words, forming words with affixes, and checking segmentation consistency.

Keywords:	Chinese Word Segmentation; Data-Driven; New Word Recognition; Combinational Ambiguity
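
The abstract names a base segmentation algorithm driven by corpus data but does not say which algorithm the system uses. As a rough illustration only, the sketch below builds a lexicon from a pre-segmented training corpus and segments new text with forward maximum matching, a common dictionary-based baseline that is swapped in here as an assumption and is not the authors' method. The names `build_lexicon` and `forward_max_match`, the `max_len` parameter, and the toy training sentences are all hypothetical.

```python
from typing import Iterable, List, Set


def build_lexicon(segmented_sentences: Iterable[List[str]]) -> Set[str]:
    """Collect every word seen in a pre-segmented training corpus."""
    lexicon: Set[str] = set()
    for sentence in segmented_sentences:
        lexicon.update(sentence)
    return lexicon


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: take the longest lexicon word at each position."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy training data standing in for an annotated corpus (hypothetical, not from the paper).
training = [
    ["中文", "分词", "是", "难题"],
    ["数据", "驱动", "的", "方法"],
]
lexicon = build_lexicon(training)
result = forward_max_match("数据驱动的中文分词方法", lexicon)
print(result)  # prints ['数据', '驱动', '的', '中文', '分词', '方法']
```

Text not covered by the lexicon falls back to single characters, which is roughly the gap that components such as a new-word recognizer or single-character word formation would need to address in a fuller system.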
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
