
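Term Extraction Combining C-value and Mutual Information
Cite this article: Liang Yinghong, Zhang Wenjing, Zhang Youcheng. Term Extraction Combining C-value and Mutual Information [J]. Computer Applications and Software, 2010, 27(4): 108-110.
Authors: Liang Yinghong, Zhang Wenjing, Zhang Youcheng
Affiliations: 1. Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou Vocational University, Suzhou 215104, Jiangsu, China
2. School of Information and Computer Engineering, North East Forestry University, Harbin 150040, Heilongjiang, China
Funding: Project of the Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise (SX200907); Heilongjiang Province Postdoctoral Fund (520415029); Jiangsu Province "Qing Lan" Project (2008)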
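Abstract: In current term extraction experiments on an open corpus from the bioinformatics domain, precision for the first 2000-plus two-character words has reached 90.36%, whereas precision for words of three or more characters is only 66.63%. Extracting multi-character words has therefore become a hard problem in the automatic extraction of noun terms. To address it, this paper proposes a strategy that exploits the strength of the C-value parameter in long-term extraction and combines it with the mutual information parameter used in term extraction. Experimental results show that long-term extraction reaches a precision of 75.7%, a recall of 68.4%, and an F-measure of 71.9%, higher than other methods on the same corpus.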

Keywords: term extraction; C-value; mutual information
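
The abstract names two scoring measures and reports an F-measure without giving formulas. As a rough illustration only, the sketch below computes the commonly cited C-value formula (length-weighted frequency, discounted by the average frequency of longer candidates that contain the term), pointwise mutual information, and the balanced F-measure. The function names, the toy counts, the use of character length, and substring containment as the nesting test are all assumptions made for this example. How the paper actually combines C-value with mutual information is not stated in the abstract, so the two scores are computed separately here.

```python
import math


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def c_value(candidates):
    """Score candidates with the commonly cited C-value formula.

    candidates: dict mapping candidate string -> corpus frequency.
    Assumptions for this sketch: length is counted in characters, and a
    candidate is "nested" if it is a substring of a longer candidate.
    """
    scores = {}
    for term, freq in candidates.items():
        # Frequencies of longer candidates that contain this term.
        containing = [f for other, f in candidates.items()
                      if other != term and term in other]
        if containing:
            adjusted = freq - sum(containing) / len(containing)
        else:
            adjusted = freq
        scores[term] = math.log2(len(term)) * adjusted
    return scores


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper's corpus.
    toy = {"abc": 10, "abcd": 4, "xabc": 2}
    print(c_value(toy))
    print(pmi(8, 16, 32, 1024))
    # Precision and recall as reported in the abstract.
    print(round(f_measure(0.757, 0.684) * 100, 1))  # 71.9
```

With the reported precision and recall, the harmonic mean rounds to 71.9%, which agrees with the F-measure stated in the abstract.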

TERM RECOGNITION BASED ON INTEGRATION OF C-VALUE AND MUTUAL INFORMATION
Liang Yinghong, Zhang Wenjing, Zhang Youcheng. TERM RECOGNITION BASED ON INTEGRATION OF C-VALUE AND MUTUAL INFORMATION [J]. Computer Applications and Software, 2010, 27(4): 108-110.
Authors:Liang Yinghong  Zhang Wenjing  Zhang Youcheng
Affiliation: Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou Vocational University, Suzhou 215104, Jiangsu, China; School of Information and Computer Engineering, North East Forestry University, Harbin 150040, Heilongjiang, China
Abstract: In current term recognition experiments on an open biology information corpus, the precision for the first 2000-plus two-character Chinese phrases has reached 90.36%, but the recognition precision for Chinese phrases of three or more characters is only 66.63%. The recognition of multi-character Chinese phrases has therefore become a difficulty in the automatic recognition of noun terms. To resolve this, a strategy of term recognition for biology information is proposed in thi...
Keywords:Term recognition C-value Mutual information  
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.