
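Hybrid Named Entity Recognition Combining Basic Grammar Rules and a Maximum Entropy Model
Cite this article: Lu Ming, Kang Yujie, Yu Nenghai. Hybrid Named Entity Recognition Combining Basic Grammar Rules and a Maximum Entropy Model [J]. Mini-micro Systems (小型微型计算机系统), 2012, 33(3): 537-541
Authors: Lu Ming (陆铭), Kang Yujie (康雨洁), Yu Nenghai (俞能海)
Affiliation: MOE-Microsoft Key Laboratory of Multimedia Computing and Communication, University of Science and Technology of China, Hefei 230027
Funding: National "863" High-Tech Research and Development Program of China (2008AA01Z117); Key Program of the National Natural Science Foundation of China (60933013); National Science and Technology Major Project (2010ZX03004-003); Specialized Research Fund for the Doctoral Program of Higher Education (20070358040)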
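Abstract:
Most existing named entity recognition algorithms combine statistics and rules, but some of them ignore global information and others do not adequately address the time complexity of the statistical model. This paper proposes a hybrid named entity recognition method that cascades basic grammar rules with a maximum entropy model. The basic grammar rule model runs first, which costs less time than complex grammar rules. Partial matching is then applied to its output to compensate for the lower recall that basic grammar rules cause. The intermediate result is passed as input to the maximum entropy model, which produces the final recognition result. Experiments show that, on the MUC-7 named entity recognition evaluation, the system reaches a precision of 94%, a recall of 91%, and an F-measure of 92.48%, a substantial improvement over existing systems.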
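
As a quick consistency check on the reported scores, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the balanced F1 score (harmonic mean, beta = 1), which the abstract does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1 gives the balanced F1 score; this choice is an assumption,
    since the abstract only says "F measure".
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was found
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for MUC-7 in the abstract.
reported_f = f_measure(0.94, 0.91)
print(f"{100 * reported_f:.2f}%")
```

Under the F1 assumption, the result agrees with the 92.48% figure given in the abstract when rounded to two decimals.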

Keywords: named entity recognition; basic grammar rules; maximum entropy model; partial matching

Basic Grammar Rule and Maximum Entropy Based Hybrid Model for Named Entity Recognition
LU Ming, KANG Yu-jie, YU Neng-hai. Basic Grammar Rule and Maximum Entropy Based Hybrid Model for Named Entity Recognition [J]. Mini-micro Systems, 2012, 33(3): 537-541
Authors: LU Ming, KANG Yu-jie, YU Neng-hai
Affiliation: MOE-MS Key Lab of MCC, University of Science and Technology of China, Hefei 230027, China
Abstract:
Recent years have witnessed the explosion of the World Wide Web. The massive volume of unstructured data on the Web calls for information extraction technology to relieve information overload. As a key technology of information extraction, named entity recognition has attracted considerable research interest in recent years. Existing named entity recognition algorithms can be categorized into statistical methods, rule-based methods, and combinations of the two. However, these methods either fail to consider global information or lack efficiency because of the statistical model. This paper proposes a hybrid named entity recognition system that combines a basic grammar rule model with a maximum entropy model. The system first recognizes named entities with the basic grammar rule model, which takes less time than a complex grammar rule model, and then uses partial matching to improve the recall of the basic grammar rule model. After partial matching, the result generated by the grammar rules is refined with the maximum entropy model to obtain the final recognition result. Experiments on the MUC-7 dataset show that the method achieves 94% precision, 91% recall, and a 92.48% F-measure, a large improvement over existing systems.
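
The three-stage cascade described above (basic rules, then partial matching, then a maximum entropy refinement) can be sketched roughly as follows. Everything in this sketch is an assumption made for illustration: the two regular-expression rules, the reading of "partial matching" as propagating a label to later standalone occurrences of a word from an already recognized entity, and the `keep_all` placeholder that stands in for a trained maximum entropy classifier. None of it is taken from the paper itself.

```python
import re
from typing import Callable, List, Tuple

Mention = Tuple[int, int, str]  # (start offset, end offset, label)
Classifier = Callable[[str, List[Mention]], List[Mention]]

# Hypothetical "basic" rules: a title introduces a person,
# a corporate suffix closes an organization name.
RULES = [
    ("PERSON", re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")),
    ("ORG", re.compile(r"\b((?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd)\.)")),
]


def rule_stage(text: str) -> List[Mention]:
    """Stage 1: cheap, high-precision pattern rules."""
    found = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            found.append((m.start(1), m.end(1), label))
    return sorted(found)


def partial_match(text: str, mentions: List[Mention]) -> List[Mention]:
    """Stage 2 (assumed meaning): reuse the label of a recognized entity
    for other occurrences of any of its capitalized words."""
    covered = [(s, e) for s, e, _ in mentions]
    extra = []
    for start, end, label in mentions:
        for word in text[start:end].split():
            if not (word.isalpha() and word[0].isupper()):
                continue  # skip suffixes such as "Corp."
            for m in re.finditer(r"\b%s\b" % re.escape(word), text):
                if any(cs <= m.start() < ce for cs, ce in covered):
                    continue  # already inside a recognized mention
                extra.append((m.start(), m.end(), label))
                covered.append((m.start(), m.end()))
    return sorted(mentions + extra)


def keep_all(text: str, candidates: List[Mention]) -> List[Mention]:
    """Placeholder for stage 3: a trained maximum entropy model would
    accept, reject, or relabel each candidate here."""
    return candidates


def recognize(text: str, classifier: Classifier = keep_all) -> List[Tuple[str, str]]:
    candidates = partial_match(text, rule_stage(text))
    return [(text[s:e], label) for s, e, label in classifier(text, candidates)]


sample = "Dr. Alice Smith joined Acme Corp. in 1999. Smith later left."
print(recognize(sample))
```

In this toy run the rules alone miss the second, untitled "Smith"; the partial-matching pass recovers it, which illustrates how such a step could raise recall before the statistical model makes the final decision.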
Keywords: named entity recognition; basic grammar rule; maximum entropy model; partial matching
This article is indexed in CNKI, Wanfang Data, and other databases.