
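A hybrid-model-based strategy for resolving overlapping ambiguities
Citation: Li Tianxia, Dai Xinyu, Chen Jiajun. A hybrid-model-based strategy for resolving overlapping ambiguities[J]. Computer Engineering and Applications, 2008, 44(21): 5-8.
Authors: Li Tianxia, Dai Xinyu, Chen Jiajun
Affiliations: 1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
2. Department of Computer Science and Technology, Nanjing University, Nanjing 210093
Funding: National Natural Science Foundation of China; National Social Science Fund of China; National High Technology Research and Development Program of China (863 Program)
Abstract: Overlapping ambiguity is a difficult problem in Chinese word segmentation. This paper proposes a disambiguation model for overlapping ambiguities that combines rules and statistics. First, a rule base for resolving overlapping ambiguities is acquired from an annotated corpus through error-driven learning, and an N-gram statistical language model is built with a statistical toolkit. Next, overlapping ambiguity strings are detected using forward/backward maximum matching together with the disambiguation rule base. Finally, the detected ambiguities are resolved using the rule base and a scoring function. Because it uses both matching directions and the rule base, this hybrid approach detects more overlapping ambiguity strings, and it combines the strengths of rule-based and statistical methods in handling them. Experiments show that the method improves the accuracy of overlapping ambiguity resolution, offering a new approach to the problem.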
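The detection step described above compares forward and backward maximum matching (FMM/BMM). The Python sketch below shows only that comparison, under stated assumptions: a hypothetical toy lexicon, a maximum word length of 4, and the rule that a span is flagged wherever the two segmentations place word boundaries differently. It does not reproduce the paper's rule-based detection, whose details are not given here.

```python
from typing import List, Set, Tuple


def fmm(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: repeatedly take the longest lexicon word at the left edge."""
    words, i = [], 0
    while i < len(text):
        # Try the longest window first; a single character always matches as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def bmm(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Backward maximum matching: repeatedly take the longest lexicon word at the right edge."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def cut_points(words: List[str]) -> Set[int]:
    """Character offsets at which a segmentation ends a word."""
    cuts, pos = set(), 0
    for w in words:
        pos += len(w)
        cuts.add(pos)
    return cuts


def detect_ambiguous_spans(text: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) character spans on which FMM and BMM segment differently."""
    f = cut_points(fmm(text, lexicon, max_len))
    b = cut_points(bmm(text, lexicon, max_len))
    # Boundaries shared by both segmentations split the text into independent regions.
    shared = sorted((f & b) | {0})
    spans = []
    for start, end in zip(shared, shared[1:]):
        # A boundary strictly inside a region belongs to only one segmentation,
        # so the two segmentations disagree on that region.
        if any(start < c < end for c in f | b):
            spans.append((start, end))
    return spans


if __name__ == "__main__":
    toy_lexicon = {"结合", "合成", "成分", "分子"}  # hypothetical toy lexicon
    print(fmm("结合成分子", toy_lexicon, 4))  # ['结合', '成分', '子']
    print(bmm("结合成分子", toy_lexicon, 4))  # ['结', '合成', '分子']
    print(detect_ambiguous_spans("他说结合成分子", toy_lexicon))  # [(2, 7)]
```

On the classic string 结合成分子, FMM yields 结合/成分/子 and BMM yields 结/合成/分子, so the whole span is flagged. When FMM and BMM happen to agree on an ambiguous string, this comparison alone does not flag it.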

Keywords: overlapping ambiguity; disambiguation rules; statistical language model; score function; full segmentation
Received: 2008-04-30
Revised: 2008-06-02

Hybrid model for overlapping ambiguities resolution
LI Tian-xia, DAI Xin-yu, CHEN Jia-jun. Hybrid model for overlapping ambiguities resolution[J]. Computer Engineering and Applications, 2008, 44(21): 5-8.
Authors: LI Tian-xia, DAI Xin-yu, CHEN Jia-jun
Affiliation: National Laboratory of Novel Software Technology, Nanjing University, Nanjing 210093, China; Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
Abstract: Overlapping ambiguity is one of the key problems in Chinese word segmentation. This paper presents a new hybrid strategy that integrates rule-based and statistical methods to resolve overlapping ambiguities. First, a rule set is constructed automatically through error-driven learning; it is used to detect and resolve some of the ambiguities. Second, a score function based on an N-gram language model is constructed. Finally, a rule-based module and a statistical module are combined to resolve all ambiguities detected by forward and backward maximum matching (FMM and BMM) and by the rule set. Experiments show that this hybrid method is better suited to ambiguity detection and combines the advantages of rule-based and statistical methods for resolving overlapping ambiguities in Chinese word segmentation.
Keywords: overlapping ambiguity; disambiguation rules; statistical language model; score function; full segmentation
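The resolution step combines a rule set with an N-gram score function, but the abstract does not give the form of either. The following Python sketch therefore rests on stated assumptions: the rule set is a hypothetical dictionary mapping an ambiguous string to its segmentation (standing in for rules learned by error-driven learning), the N-gram model is an add-k smoothed bigram model trained on a hypothetical toy corpus, and candidates come from full segmentation of the ambiguous span using a toy lexicon plus single characters. A matching rule takes priority; otherwise the highest-scoring candidate wins.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def all_segmentations(span: str, lexicon: Set[str], max_len: int = 4) -> List[List[str]]:
    """Full segmentation: every split of `span` into lexicon words or single characters."""
    if not span:
        return [[]]
    results = []
    for size in range(1, min(max_len, len(span)) + 1):
        head = span[:size]
        if size == 1 or head in lexicon:
            for rest in all_segmentations(span[size:], lexicon, max_len):
                results.append([head] + rest)
    return results


class BigramScorer:
    """Add-k smoothed bigram model: score = sum of log P(w_i | w_{i-1}), with <s> and </s> markers."""

    def __init__(self, sentences: Sequence[Sequence[str]], k: float = 1.0):
        self.k = k
        self.history = Counter()  # c(w_{i-1})
        self.pairs = Counter()    # c(w_{i-1}, w_i)
        vocab = set()
        for sent in sentences:
            tokens = ["<s>", *sent, "</s>"]
            self.history.update(tokens[:-1])
            self.pairs.update(zip(tokens, tokens[1:]))
            vocab.update(tokens[1:])
        # One extra slot reserves probability mass for unseen words.
        self.vocab_size = len(vocab) + 1

    def score(self, words: Sequence[str]) -> float:
        tokens = ["<s>", *words, "</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.pairs[(prev, cur)] + self.k
            den = self.history[prev] + self.k * self.vocab_size
            total += math.log(num / den)
        return total


def resolve(span: str, lexicon: Set[str], rules: Dict[str, List[str]],
            scorer: BigramScorer) -> List[str]:
    """Hypothetical hybrid resolver: a matching rule wins, else the best-scoring full segmentation."""
    if span in rules:
        return list(rules[span])
    return max(all_segmentations(span, lexicon), key=scorer.score)


if __name__ == "__main__":
    toy_lexicon = {"结合", "合成", "成分", "分子"}  # hypothetical toy lexicon
    toy_corpus = [["结合", "成", "分子"]]           # hypothetical one-sentence training corpus
    scorer = BigramScorer(toy_corpus)
    print(len(all_segmentations("结合成分子", toy_lexicon)))  # 8
    print(resolve("结合成分子", toy_lexicon, {}, scorer))    # ['结合', '成', '分子']
```

This sketch scores each candidate in isolation with sentence markers. A real system would train on a large segmented corpus and would more plausibly score the span together with its surrounding words.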
This article is indexed in Wanfang Data and other databases.