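Substring reduction algorithm based on independence statistic
Cite this article: ZHOU Lang, FENG Chong, HUANG He-yan, WANG Ping-yao. Substring reduction algorithm based on independence statistic[J]. Computer Engineering and Applications, 2010, 46(24): 129-131. DOI: 10.3778/j.issn.1002-8331.2010.24.039
Authors: ZHOU Lang, FENG Chong, HUANG He-yan, WANG Ping-yao
Affiliations: 1. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China 2. Research Center of Computer & Language Information Engineering, Chinese Academy of Sciences, Beijing 100097, China 3. Department of Computer, Ningbo Polytechnic, Ningbo, Zhejiang 315800, China
Funding: National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China; Key Science and Technology Project of the Ningbo Science and Technology Bureau
Abstract (translated from the Chinese): Existing substring reduction algorithms all work in a one-to-one manner and target substrings that have the same frequency as their parent string. However, when a lexical analysis tool is used to segment text, many segmentation fragments are inevitably produced, and these directly give rise to many meaningless substrings. By analyzing the one-to-many relationship between these meaningless substrings and their numerous parent strings, a substring reduction algorithm based on independence statistics is proposed. Finally, the algorithm is applied to a Chinese term extraction system, raising the system's precision from 91.3% to 93.32%.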

Keywords (translated from the Chinese): substring reduction; independence statistics; segmentation fragments
Received: 2009-02-10
Revised: 2009-04-01

Substring reduction algorithm based on independence statistic
ZHOU Lang, FENG Chong, HUANG He-yan, WANG Ping-yao. Substring reduction algorithm based on independence statistic[J]. Computer Engineering and Applications, 2010, 46(24): 129-131. DOI: 10.3778/j.issn.1002-8331.2010.24.039
Authors: ZHOU Lang, FENG Chong, HUANG He-yan, WANG Ping-yao
Affiliation: 1. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China 2. Research Center of Computer & Language Information Engineering, Chinese Academy of Sciences, Beijing 100097, China 3. Department of Computer, Ningbo Polytechnic, Ningbo, Zhejiang 315800, China
Abstract: Most existing substring reduction algorithms focus on substrings that have the same frequency as their parent string and handle them in a one-to-one manner. After text is processed by a morphological analysis tool, many segmentation fragments are unavoidably produced, and these form many meaningless substrings. Based on an analysis of the one-to-many relationship between a meaningless substring and its parent strings, a substring reduction algorithm based on independence statistics is proposed to filter out these meaningless substrings. Finally, the algorithm is applied in a Chinese multi-word terminology extraction system, and the precision of the term extraction results improves from 91.3% to 93.32%.
Keywords:substring reduction  independence statistic  segmentation fragment
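
The abstract does not give the independence statistic itself, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that a candidate's "independence" is the share of its corpus occurrences that are not covered by an occurrence of any longer candidate (its parent strings), and that candidates falling below a threshold are dropped. The function names, the tuple-of-tokens representation, and the `threshold` value are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Term = Tuple[str, ...]  # a candidate term as a tuple of segmented tokens


def find_occurrences(tokens: Sequence[str], pattern: Term) -> List[int]:
    """Return the start index of every occurrence of pattern in tokens."""
    n = len(pattern)
    return [i for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == pattern]


def independence_scores(sentences: Sequence[Sequence[str]],
                        candidates: Sequence[Term]) -> Dict[Term, float]:
    """Assumed statistic: fraction of a candidate's occurrences that are
    NOT contained in an occurrence of any longer candidate."""
    scores: Dict[Term, float] = {}
    for cand in candidates:
        # One-to-many: every longer candidate is a potential parent string.
        parents = [p for p in candidates if len(p) > len(cand)]
        total = 0
        free = 0
        for tokens in sentences:
            # Token spans [start, end) occupied by parent occurrences.
            spans = [(i, i + len(p))
                     for p in parents
                     for i in find_occurrences(tokens, p)]
            for start in find_occurrences(tokens, cand):
                end = start + len(cand)
                total += 1
                if not any(s <= start and end <= e for s, e in spans):
                    free += 1
        scores[cand] = free / total if total else 0.0
    return scores


def reduce_substrings(sentences: Sequence[Sequence[str]],
                      candidates: Sequence[Term],
                      threshold: float = 0.3) -> List[Term]:
    """Keep candidates whose assumed independence reaches the threshold."""
    scores = independence_scores(sentences, candidates)
    return [c for c in candidates if scores[c] >= threshold]


if __name__ == "__main__":
    corpus = [
        ["machine", "translation", "system"],
        ["machine", "translation", "system"],
        ["machine", "translation", "model"],
        ["machine", "translation", "quality"],
    ]
    terms = [
        ("machine", "translation"),
        ("machine", "translation", "system"),
        ("translation", "system"),  # a fragment that never occurs on its own
    ]
    print(independence_scores(corpus, terms))
    print(reduce_substrings(corpus, terms))
```

Counting coverage per occurrence, rather than subtracting parent frequencies from the substring frequency, avoids double counting when one occurrence sits inside several overlapping parents. A same-frequency, one-to-one rule corresponds to the special case where a single parent covers every occurrence.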
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.