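Compound word extraction method based on combining location tags with part of speech
Cite this article: OUYANG Liu-bo, ZHOU Weiguang. Compound word extraction method based on combining location tags with part of speech [J]. Application Research of Computers (计算机应用研究), 2016, 33(4).
Authors: OUYANG Liu-bo (欧阳柳波), ZHOU Weiguang (周伟光)
Affiliation: College of Information Science and Engineering, Hunan University; College of Information Science and Engineering, Hunan University
Funding: National Natural Science Foundation of China (61472132); Hunan Province Major Scientific and Technological Achievement Transformation Project for Industry-University-Research Cooperation (2010XK6024); National "Core Electronic Devices, High-end Generic Chips and Basic Software" Major Science and Technology Project (2012ZX01045-004-005-002)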
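Abstract: Existing word segmentation systems cannot incorporate new words promptly and therefore cannot effectively identify domain-specific compound words. To address this problem, a compound word extraction method that combines location tags with part of speech is proposed. First, a location tag set is built for each term by preprocessing the corpus text, adding location tags, and filtering by weighted term frequency. Next, the adjacent degree of terms within sentences is computed from the location tag sets and used to decide which terms form compound words. Finally, reverse rules are formulated to filter the extraction results, and garbage strings are trimmed step by step from both ends and then re-evaluated to identify further compound words. Experiments on different corpora show that the method achieves higher precision.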

Keywords: compound word extraction; location tag set; adjacent degree; reverse rule filtering; new word discovery
Received: 2014/11/19
Revised: 2016/2/22

Compound word extraction based on location tag and POS
OUYANG Liu-bo and ZHOU Weiguang. Compound word extraction based on location tag and POS [J]. Application Research of Computers, 2016, 33(4).
Authors: OUYANG Liu-bo and ZHOU Weiguang
Affiliation: College of Computer Science and Electronic Engineering, Hunan University
Abstract: Existing word segmentation systems cannot incorporate new words promptly, so they fail to identify domain-specific compound words effectively. To address this, the paper proposes a compound word extraction method that combines location tags with part of speech (POS). First, the method builds a location tag set for each term by preprocessing the corpus text, adding location tags, and filtering terms by weighted term frequency. Next, it uses the location tag sets to compute the adjacent degree of terms within sentences and judges compound words on that basis. Finally, it formulates reverse rules to filter the extraction results, then trims garbage strings step by step from the head and the tail and re-evaluates them to identify further compound words. Experiments on different corpora show that the method achieves higher precision.
Keywords: compound word extraction; location tag set; adjacent degree; reverse rule filtering; new word detection
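
The first two steps named in the abstract (building location tag sets, then judging compound words by adjacent degree) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the function names, the plain frequency cut-off `min_count` (standing in for the paper's weighted term frequency), the definition of adjacent degree as the share of occurrences in which one term directly follows the other, and the `threshold` value. POS-based reverse-rule filtering and the trimming of garbage strings are not shown.

```python
from collections import defaultdict


def build_location_tags(sentences, min_count=2):
    """Map each term to the set of (sentence_index, term_index) where it occurs."""
    tags = defaultdict(set)
    for s_idx, terms in enumerate(sentences):
        for t_idx, term in enumerate(terms):
            tags[term].add((s_idx, t_idx))
    # Assumption: a plain frequency cut-off replaces the paper's weighted term frequency.
    return {term: locs for term, locs in tags.items() if len(locs) >= min_count}


def adjacent_degree(tags, left, right):
    """Assumed definition: co-adjacent occurrences divided by the rarer term's count."""
    left_locs = tags.get(left, set())
    right_locs = tags.get(right, set())
    if not left_locs or not right_locs:
        return 0.0
    # Count positions where `right` sits immediately after `left` in the same sentence.
    together = sum((s, t + 1) in right_locs for s, t in left_locs)
    return together / min(len(left_locs), len(right_locs))


def extract_pairs(sentences, min_count=2, threshold=0.8):
    """Return neighbouring term pairs whose adjacent degree reaches the threshold."""
    tags = build_location_tags(sentences, min_count)
    pairs = set()
    for terms in sentences:
        for left, right in zip(terms, terms[1:]):
            if adjacent_degree(tags, left, right) >= threshold:
                pairs.add((left, right))
    return pairs


# Toy, already-segmented corpus (hypothetical data, not from the paper).
sentences = [
    ["machine", "learning", "is", "fun"],
    ["machine", "learning", "works"],
    ["it", "is", "hard"],
]
print(extract_pairs(sentences))  # {('machine', 'learning')}
```

In this toy corpus "machine" and "learning" always occur side by side, so the pair is kept, while "learning is" reaches only 0.5 and is dropped. On real text a purely positional score would still admit pairs such as a noun followed by a frequent function word, which is presumably where the POS-based reverse rules described in the abstract come in.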