
Technical term automatic extraction research based on statistics and rules
Cite this article:LIU Bao,ZHANG Gui-ping,CAI Dong-feng.Technical term automatic extraction research based on statistics and rules[J].Computer Engineering and Applications,2008,44(23):147-150.
Authors:LIU Bao  ZHANG Gui-ping  CAI Dong-feng
Affiliation:Knowledge Engineering Center,Shenyang Institute of Aeronautical Engineering,Shenyang 110034,China
Funding:National High Technology Research and Development Program of China (863 Program), Key Project of Science and Technology Research of the Ministry of Education
Abstract:Automatic extraction of technical terms is an important research topic in Chinese information processing. It is widely applied in information retrieval and machine translation, and in patent translation in particular. Motivated by the patent translation task, this paper studies methods for recognizing technical terms in patents. Building on an analysis of existing methods, it proposes a recognition method that combines statistics and rules: a Conditional Random Fields (CRF) model first labels and recognizes terms in the corpus, and a rule-based post-processing step then corrects wrongly recognized results. Experimental results show that the combined method is effective, reaching an F-value of 84.4% in the open test.

Keywords:Conditional Random Fields (CRF)  technical term extraction  term recognition
Received:2007-10-18
Revised:2008-01-18
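
The abstract reports its result as an F-value but does not spell out how it is computed. The sketch below shows one common way to score term recognition, offered as an illustration rather than as the authors' evaluation procedure. It assumes a B/I/O labeling scheme, exact span matching, and equal weighting of precision and recall; none of these details are stated on this page. The names `bio_to_spans` and `precision_recall_f` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def bio_to_spans(labels: List[str]) -> List[Span]:
    """Collect term spans from a B/I/O label sequence (assumed scheme)."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:      # a new term closes the previous one
                spans.append((start, i))
            start = i
        elif label == "I":
            if start is None:          # stray "I": treat it as a term start
                start = i
        else:                          # "O" closes any open term
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:              # term running to the end of the input
        spans.append((start, len(labels)))
    return spans


def precision_recall_f(predicted: Iterable[Span],
                       gold: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and their harmonic mean."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Toy example: one of two predicted terms matches the reference exactly.
gold_spans = bio_to_spans(["B", "I", "O", "B", "O"])
pred_spans = bio_to_spans(["B", "I", "O", "O", "B"])
p, r, f = precision_recall_f(pred_spans, gold_spans)
```

In a pipeline like the one the abstract describes, the predicted labels would come from the CRF model after rule-based post-processing, and the reference labels from a manually annotated test set.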
This article is indexed in CNKI, Wanfang Data, and other databases.