Keyphrase Extraction Algorithm Integrating Word Embeddings and Position Information
Original title (Chinese): 融合词向量与位置信息的关键词提取算法
Citation: FAN Wei, LIU Huan, ZHANG Yuxiang. Keyphrase Extraction Algorithm Integrating Word Embeddings and Position Information[J]. Computer Engineering and Applications (计算机工程与应用), 2020, 56(5): 179-185. DOI: 10.3778/j.issn.1002-8331.1811-0304
Authors: FAN Wei (樊玮)  LIU Huan (刘欢)  ZHANG Yuxiang (张宇翔)
Affiliation: School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Abstract: Existing graph-based keyphrase extraction methods fail to effectively integrate the latent semantic relationships among words in a text sequence. To address this, a graph-based keyphrase extraction algorithm, EPRank, is proposed that integrates word embeddings with position information. First, an embedding for each word in the target document is learned with a word embedding model. Second, these embeddings, which reflect the latent semantic relationships among words, are combined with position features and incorporated into the PageRank scoring model. Finally, the top-ranked words or phrases are selected as the keyphrases of the target document. Experimental results show that EPRank scores higher on every evaluation metric than five existing keyphrase extraction methods on the KDD and SIGIR datasets.

Keywords: keyphrase extraction  word embedding  position information  PageRank algorithm
Indexed in: VIP (维普), Wanfang Data (万方数据), and other databases.
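
The abstract outlines three steps (learn word embeddings, fold semantic similarity and position features into PageRank, keep the top-ranked candidates) but does not give the EPRank scoring formula. The following is only a minimal sketch of that general idea under stated assumptions, not the authors' method: the function name `rank_words`, the edge weight (co-occurrence count times rescaled cosine similarity), the inverse-position teleport bias, and the random toy embeddings are all hypothetical choices made for illustration. It ranks single words only and does not assemble phrases.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_words(tokens, embeddings, window=3, damping=0.85, iters=100, tol=1e-8):
    """Score words with a PageRank biased by embedding similarity and position.

    Illustrative only: the weighting scheme is an assumption, not EPRank itself.
    """
    vocab = sorted(set(tokens))

    # Undirected co-occurrence counts within a sliding window.
    cooc = defaultdict(float)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if v != w:
                cooc[(w, v)] += 1.0
                cooc[(v, w)] += 1.0

    # Assumed edge weight: co-occurrence count times cosine rescaled to [0, 1].
    weight = {
        (w, v): c * (cosine(embeddings[w], embeddings[v]) + 1.0) / 2.0
        for (w, v), c in cooc.items()
    }
    out_sum = defaultdict(float)
    for (w, _), x in weight.items():
        out_sum[w] += x

    # Assumed position bias: a word gains 1/p for each occurrence at position p.
    bias = defaultdict(float)
    for pos, w in enumerate(tokens, start=1):
        bias[w] += 1.0 / pos
    total = sum(bias.values())
    bias = {w: b / total for w, b in bias.items()}

    # Power iteration on the biased PageRank recurrence.
    scores = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iters):
        new = {w: (1.0 - damping) * bias[w] for w in vocab}
        for (w, v), x in weight.items():
            if out_sum[w] > 0.0:
                new[v] += damping * scores[w] * x / out_sum[w]
        delta = sum(abs(new[w] - scores[w]) for w in vocab)
        scores = new
        if delta < tol:
            break
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Toy usage: random vectors stand in for embeddings learned from a corpus.
tokens = "graph based keyphrase extraction ranks words in a word graph".split()
rng = random.Random(0)
toy_embeddings = {w: [rng.gauss(0.0, 1.0) for _ in range(8)] for w in sorted(set(tokens))}
for word, score in rank_words(tokens, toy_embeddings)[:3]:
    print(f"{word}\t{score:.4f}")
```

In practice the toy vectors would be replaced by embeddings from a trained model, and candidate phrases would be formed and scored from the ranked words; both steps are outside what the abstract specifies.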