     

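Context-based Approach to Combinational Ambiguity Resolution in Chinese Word Segmentation (original title: 基于语境信息的汉语组合型歧义消歧方法)
Cite this article: FENG Su-qin, CHEN Hui-ming. Context-based Approach to Combinational Ambiguity Resolution in Chinese Word Segmentation [J]. Journal of Chinese Information Processing, 2007, 21(6): 13-16.
Authors: FENG Su-qin, CHEN Hui-ming
Affiliation: Department of Computer Science and Technology, Xinzhou Teachers' College, Xinzhou, Shanxi 034000, China
Funding: Xinzhou Teachers' College Fund, Shanxi Province
Abstract: Combinational ambiguity strings have long been a difficult problem in automatic Chinese word segmentation, because resolving them depends on the surrounding context. This paper collects and tallies the context preceding and following combinational ambiguity strings, builds a context model using the log-likelihood ratio, and designs a weighting formula that accounts for the effect of context window size, position, and frequency on disambiguation. On this basis, two methods are tested: (1) disambiguating by the maximum log-likelihood ratio found in the context; (2) summing the log-likelihood ratios in the context separately for the combined and the separated readings and choosing the reading with the larger sum. In experiments on 14 high-frequency combinational ambiguities, the first method reaches an average accuracy of 84.93% and the second 95.60%. The experiments show that using the sum of the contextual information is effective for resolving combinational segmentation ambiguity.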

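Keywords: computer application; Chinese information processing; natural language processing; Chinese word segmentation; combinational segmentation ambiguity; log-likelihood ratio; contextual information
Article ID: 1003-0077(2007)06-0013-04
Received: 2005-11-15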
Revised: 2007-07-13

Context-based Approach to Combinational Ambiguity Resolution in Chinese Word Segmentation
FENG Su-qin,CHEN Hui-ming.Context-based Approach to Combinational Ambiguity Resolution in Chinese Word Segmentation[J].Journal of Chinese Information Processing,2007,21(6):13-16.
Authors:FENG Su-qin  CHEN Hui-ming
Affiliation:Dept. of Computer Science and Technology of Xinzhou Teachers’ College, Xinzhou, Shanxi 034000, China
Abstract: Combinational ambiguity is a challenging issue in Chinese word segmentation because its resolution depends on contextual information. This paper collects statistics on the context surrounding combinationally ambiguous strings and establishes a context model using the log-likelihood ratio. A weighting formula is designed that takes into account the window size, position, and frequency of the contextual information. On this basis, two disambiguation methods are investigated. The first uses the maximum log-likelihood ratio in the context; the second sums the log-likelihood ratios in the context separately for the combined and the separated readings and chooses the reading with the larger sum. Tested on 14 high-frequency ambiguous strings, the first method reaches an average accuracy of 84.93% and the second 95.60%. The experimental results show that using the sum of the contextual information is effective for disambiguation.
Keywords:computer application  Chinese information processing  natural language processing  Chinese word segmentation  combinational ambiguity  log likelihood ratio  contextual information
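
The two decision rules described in the abstract can be illustrated with a small sketch. The abstract does not give the exact log-likelihood-ratio or weighting formulas, so everything below is an assumption made for illustration: the smoothed log ratio in `llr`, the `1/|offset|` weight in `position_weight`, the function names, the example string 才能, and the toy counts are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def llr(word, comb_counts, sep_counts, alpha=1.0):
    """Smoothed log ratio of P(word | combined) to P(word | separated).

    Positive values favour the combined reading, negative values the
    separated one. Add-alpha smoothing is an assumption to avoid log(0).
    """
    vocab = len(set(comb_counts) | set(sep_counts)) or 1
    p_comb = (comb_counts[word] + alpha) / (sum(comb_counts.values()) + alpha * vocab)
    p_sep = (sep_counts[word] + alpha) / (sum(sep_counts.values()) + alpha * vocab)
    return math.log(p_comb / p_sep)


def position_weight(offset):
    """Hypothetical weight: context words closer to the string count more."""
    return 1.0 / abs(offset)


def weighted_scores(context, comb_counts, sep_counts):
    """context is a list of (offset, word) pairs with offset != 0."""
    return [position_weight(o) * llr(w, comb_counts, sep_counts) for o, w in context]


def decide_by_max(context, comb_counts, sep_counts):
    """Rule 1: the single strongest piece of evidence decides."""
    scores = weighted_scores(context, comb_counts, sep_counts)
    best = max(scores, key=abs, default=0.0)
    return "combined" if best >= 0 else "separated"


def decide_by_sum(context, comb_counts, sep_counts):
    """Rule 2: total the evidence for each reading and pick the larger."""
    scores = weighted_scores(context, comb_counts, sep_counts)
    comb_total = sum(s for s in scores if s > 0)
    sep_total = -sum(s for s in scores if s < 0)
    return "combined" if comb_total >= sep_total else "separated"


# Toy context-word counts for the string 才能, seen as one word
# (combined) or as two words (separated). Invented for illustration.
comb_counts = Counter({"发挥": 8, "有": 5, "的": 3})
sep_counts = Counter({"只有": 9, "这样": 6, "的": 3})

# One strong cue for the combined reading, two weaker cues against it.
context = [(-1, "发挥"), (1, "这样"), (2, "只有")]
print(decide_by_max(context, comb_counts, sep_counts))
print(decide_by_sum(context, comb_counts, sep_counts))
```

On this toy context the two rules disagree: the maximum rule follows the single strongest cue, while the sum rule lets several weaker cues outweigh it. This is one plausible reading of why the abstract reports a higher accuracy for the sum-based method, though the sketch itself does not reproduce the paper's experiments.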
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.