面向商务信息抽取的产品命名实体识别研究
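Cite this article: 刘非凡, 赵军, 吕碧波, 徐波, 于浩, 夏迎炬. 面向商务信息抽取的产品命名实体识别研究[J]. 中文信息学报, 2006, 20(1): 9-15.
Authors: 刘非凡, 赵军, 吕碧波, 徐波, 于浩, 夏迎炬
Affiliations: 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2. Fujitsu Research and Development Center Co., Ltd.
Funding: National Natural Science Foundation of China (60372016); Beijing Natural Science Foundation (4052027)
Abstract (translated from the Chinese): The informatization of markets has made business information extraction and market content management an increasingly active research topic in information science. Product named entity recognition, one of the key technologies involved, is also attracting growing attention. Oriented toward business information extraction, this paper defines product named entities, systematically analyzes the characteristics and difficulties of the recognition task, proposes a recognition method based on the hierarchical hidden Markov model (HHMM), and implements a prototype system that recognizes and tags product named entities in Chinese free text. Experiments show that the system achieves satisfactory results in both the digital electronics and mobile phone domains, with overall F-measures of 79.7%, 86.9%, and 75.8% for product name entities, product model entities, and product brand entities, respectively. A comparison with a maximum entropy model confirms that the HHMM has stronger representational power for multi-scale nested sequences.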

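Keywords: computer application; Chinese information processing; product named entity recognition; business information extraction; hierarchical hidden Markov model
Article ID: 1003-0077(2006)01-0007-07
Received: 2005-05-03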
Revised: 2005-11-03

Study on Product Named Entity Recognition for Business Information Extraction
LIU Fei-fan, ZHAO Jun, LV Bi-bo, XU Bo, YU Hao, XIA Ying-ju. Study on Product Named Entity Recognition for Business Information Extraction[J]. Journal of Chinese Information Processing, 2006, 20(1): 9-15.
Authors:LIU Fei-fan  ZHAO Jun  LV Bi-bo  XU Bo  YU Hao  XIA Ying-ju
Affiliation: 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2. Fujitsu Research and Development Center Co., Ltd.
Abstract: Electronic business has fueled growing research interest in business information extraction and market intelligence management. As one of the key techniques, product named entity recognition (product NER) has also begun to draw attention in natural language processing. This paper analyzes the characteristics and challenges of product NER and presents a hierarchical hidden Markov model (HHMM) based approach to product NER in Chinese free text. Experiments in the digital electronics and mobile phone domains show that the approach performs well in both, achieving overall F-measures of 79.7%, 86.9%, and 75.8% for the three types of product named entities, respectively. In a comparison with a maximum entropy model, the HHMM proves experimentally more powerful on multi-scale embedded sequence problems.
Keywords:computer application  Chinese information processing  product named entity recognition  business information extraction  hierarchical hidden Markov model(HHMM)  
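
For readers unfamiliar with the F-measure figures quoted in the abstract, the following is a minimal sketch of how an entity-level F-measure is commonly computed. It assumes exact matching on span boundaries and entity type, which may differ from the evaluation protocol actually used in the paper; the function name `f_measure`, the labels `NAME`, `MODEL`, and `BRAND`, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# An entity is (start offset, end offset, type label).
# Exact span-and-type matching is an assumption, not the paper's stated protocol.
Entity = Tuple[int, int, str]


def f_measure(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-type matching."""
    correct = len(gold & predicted)  # entities found with the right span and type
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy annotations: one MODEL entity has a wrong right boundary.
gold = {(0, 2, "BRAND"), (2, 5, "MODEL"), (8, 12, "NAME")}
predicted = {(0, 2, "BRAND"), (2, 4, "MODEL"), (8, 12, "NAME")}

precision, recall, f1 = f_measure(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F={f1:.3f}")
```

In this toy case two of the three predicted entities match the gold annotations exactly, so precision, recall, and F all equal 2/3. Computing the score separately per entity type would give one figure for each type, as in the three values the abstract reports.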
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.