汉语分词中组合歧义字段的研究
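Cite this article: 秦颖, 王小捷, 张素香. 汉语分词中组合歧义字段的研究[J]. 中文信息学报, 2007, 21(1): 1-8
Authors: 秦颖  王小捷  张素香
Affiliations: 1. School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876;
2. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei 071003
Funding: Project supported by the Department of Language Information Administration of the Ministry of Education (教育部语言文字信息管理司)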
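Abstract: Combinational ambiguity is a hard problem in automatic Chinese word segmentation, and the difficulty lies in two places: discovering combinational ambiguity strings and resolving the ambiguity. This paper studies how the part of speech of a combinational ambiguity string changes between its split and unsplit readings, and proposes a new method for collecting such strings automatically. Experimental results show that the method can discover combinational ambiguity strings effectively: more than 400 were detected in the People's Daily corpus for January 1998, far more than conventional methods detect. A maximum entropy model was then used to disambiguate 60 combinational ambiguity strings, and the effect of six kinds of features and their combinations on disambiguation performance was examined. The average disambiguation accuracy reached 88.05%.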

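Keywords: computer application  Chinese information processing  Chinese word segmentation  combinational ambiguity  maximum entropy  features
Article ID: 1003-0077(2007)01-0003-06
Received: 2006-01-18
Revised: 2006-09-04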

Research on Combinational Ambiguity in Chinese Word Segmentation
QIN Ying,WANG Xiao-jie,ZHANG Su-xiang. Research on Combinational Ambiguity in Chinese Word Segmentation[J]. Journal of Chinese Information Processing, 2007, 21(1): 1-8
Authors:QIN Ying  WANG Xiao-jie  ZHANG Su-xiang
Affiliation:1. School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. Department of Electronic and Communication Engineering of North China Electric Power University, Baoding, Hebei 071003, China
Abstract: One of the challenges in Chinese word segmentation is combinational ambiguity, which poses two main obstacles: detecting combinational ambiguities and resolving them. This paper investigates the structures of combinational ambiguities and proposes a new approach for automatically detecting this type of ambiguity. The experimental results show that the approach is effective: in the tagged People's Daily corpus for January 1998, which contains about 1 million words, we detected more than 400 combinational ambiguities, far more than common approaches detect. Disambiguation of 60 combinational ambiguities is then carried out using a maximum entropy model. The effect of six kinds of features, as well as their combinations, on disambiguation performance is further studied. The average disambiguation accuracy reaches 88.05%.
Keywords:computer application  Chinese information processing  Chinese word segmentation  combinational ambiguity  maximum entropy  feature selection
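
Note: the abstract does not spell out the detection procedure, so the sketch below illustrates only the underlying notion of a combinational ambiguity: a string that appears in a segmented corpus both as one word and as a sequence of adjacent shorter words. It is not the authors' part-of-speech-based method. The function name `find_combinational_candidates`, the `max_parts` parameter, and the toy corpus are all assumptions made for illustration.

```python
def find_combinational_candidates(sentences, max_parts=2):
    """Return strings that occur both as a single word and as a run of
    2..max_parts adjacent words somewhere in the segmented corpus.

    sentences: list of sentences, each a list of already-segmented words.
    This is a hypothetical helper, not the procedure from the paper.
    """
    # Every token that appears as a whole word anywhere in the corpus.
    vocabulary = {word for sent in sentences for word in sent}
    candidates = set()
    for sent in sentences:
        for start in range(len(sent)):
            for n in range(2, max_parts + 1):
                if start + n > len(sent):
                    break
                # Join n adjacent words and check whether the joined
                # string is also used as one word elsewhere.
                joined = "".join(sent[start:start + n])
                if joined in vocabulary:
                    candidates.add(joined)
    return candidates


# Toy corpus (invented for illustration): "将来" appears as one word in
# the first sentence and split as "将" + "来" in the second.
corpus = [
    ["他", "将来", "想", "当", "医生"],
    ["他", "将", "来", "北京"],
    ["我", "想", "去", "北京"],
]
print(find_combinational_candidates(corpus))  # {'将来'}
```

A candidate found this way still needs disambiguation in context, which is the step the abstract assigns to the maximum entropy model.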