
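Novel bidirectional aggregation degree feature extraction method for patent new word discovery
Cite this article: CHEN Meijie, XIE Zhenping, CHEN Xiaoqi, XU Peng. Novel bidirectional aggregation degree feature extraction method for patent new word discovery[J]. Journal of Computer Applications, 2020, 40(3): 631-637. DOI: 10.11772/j.issn.1001-9081.2019071193
Authors: CHEN Meijie  XIE Zhenping  CHEN Xiaoqi  XU Peng
Affiliations: 1. College of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China; 2. Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi, Jiangsu 214122, China; 3. Changzhou Baiteng Technology Company Limited, Changzhou, Jiangsu 213164, China
Funding: National Natural Science Foundation of China (61872166).
Abstract: General-purpose new word discovery methods recognize long patent words poorly, part-of-speech collocation templates for patent terms are inflexible, and unsupervised methods for recognizing long Chinese patent words are lacking. To address these problems, a new bidirectional aggregation degree feature extraction method for discovering patent new words is proposed. First, based on the bidirectional conditional probability statistics of the components within a word, a bidirectional aggregation degree statistical feature on two-component words is constructed. Second, this feature is extended into a word boundary filtering rule. Finally, patent new words are extracted using the new feature and the word boundary rule. Experimental results show that the new method raises the overall F-measure by 6.7 percentage points over a general-domain new word discovery method and by 19.2 and 17.2 percentage points, respectively, over two recent patent part-of-speech collocation template methods, and that it improves the F-measure for patent new words of 4 to 8 characters fairly significantly. Overall, the proposed method improves patent new word discovery performance, extracts long compound words from patent text more effectively, and reduces the dependence on pre-training and on additional complex rule bases, which makes it more practical.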

Keywords: new word discovery   bidirectional aggregation degree   patent new word   feature extraction   patent analysis
Received: 2019-07-10
Revised: 2019-09-01

Novel bidirectional aggregation degree feature extraction method for patent new word discovery
CHEN Meijie, XIE Zhenping, CHEN Xiaoqi, XU Peng. Novel bidirectional aggregation degree feature extraction method for patent new word discovery[J]. Journal of Computer Applications, 2020, 40(3): 631-637. DOI: 10.11772/j.issn.1001-9081.2019071193
Authors:CHEN Meijie  XIE Zhenping  CHEN Xiaoqi  XU Peng
Affiliation:1. College of Digital Media, Jiangnan University, Wuxi Jiangsu 214122, China;2. Jiangsu Key Laboratory of Media Design and Software Technology(Jiangnan University), Wuxi Jiangsu 214122, China;3. Changzhou Baiteng Technology Company Limited, Changzhou Jiangsu 213164, China
Abstract: To address the poor performance of general new word discovery methods on long patent words, the low flexibility of part-of-speech collocation templates for patent terminology, and the lack of unsupervised methods for Chinese patent long word recognition, a novel bidirectional aggregation degree feature extraction method for patent new word discovery was proposed. Firstly, a bidirectional aggregation degree statistical feature on two-component words was constructed from the bidirectional conditional probabilities between the components of a word. Secondly, a word boundary filtering rule was derived by extending this feature. Finally, new patent words were extracted by combining the aggregation degree feature with the word boundary filtering rule. Experimental results show that the new method improves the overall F-score by 6.7 percentage points compared with a new word discovery method for the general domain, improves the overall F-score by 19.2 and 17.2 percentage points respectively compared with two recent part-of-speech collocation template methods for patent terminology, and significantly increases the F-score for the discovery of new words with 4 to 8 characters. In summary, the proposed method improves the performance of patent new word discovery and extracts long compound words in patent documents more effectively, while reducing the reliance on pre-training processes and extra complex rule bases, which gives it better practicality.
Keywords:new word discovery   bidirectional aggregation degree   patent new word   feature extraction   patent analysis
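
The abstract describes the bidirectional aggregation degree only in words, so the following is a minimal Python sketch of the general idea rather than the authors' formulation. It assumes the text is already segmented into tokens, estimates the forward and backward conditional probabilities of each adjacent pair from raw counts, and combines them with `min`; the combination rule, the threshold, and the function names `bidirectional_aggregation` and `extract_candidates` are all illustrative assumptions. The paper's word boundary filtering rule is not reproduced here; a plain threshold stands in for it.

```python
from collections import Counter


def bidirectional_aggregation(tokens):
    """Score each adjacent token pair (a, b) by how strongly the two
    components predict each other in both directions."""
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (a, b), n_ab in bigram.items():
        forward = n_ab / unigram[a]   # count-based estimate of P(b follows | a)
        backward = n_ab / unigram[b]  # count-based estimate of P(a precedes | b)
        # Assumption: take the weaker direction as the pair's score.
        scores[(a, b)] = min(forward, backward)
    return scores


def extract_candidates(tokens, scores, threshold=0.5):
    """Chain consecutive tokens whose pair score reaches the threshold
    into multi-token candidates (a stand-in for the boundary rule)."""
    candidates = []
    current = [tokens[0]] if tokens else []
    for a, b in zip(tokens, tokens[1:]):
        if scores.get((a, b), 0.0) >= threshold:
            current.append(b)          # pair is cohesive: extend the candidate
        else:
            if len(current) > 1:
                candidates.append("".join(current))
            current = [b]              # boundary found: start a new candidate
    if len(current) > 1:
        candidates.append("".join(current))
    return candidates
```

Chaining high-scoring pairs is what lets a pairwise feature yield candidates longer than two components, which is the case the abstract emphasizes (new words of 4 to 8 characters). Taking the minimum of the two directions penalizes pairs in which only one component is strongly bound to the other, such as a frequent function word followed by a rare term.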
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).