
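A Keyword Extraction Algorithm Based on word2vec
Authors: Li Yuepeng  Jin Cui  Ji Junchuan
Affiliation: 1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; 2. University of Science and Technology Beijing, Beijing 100083; 3. University of Chinese Academy of Sciences, Beijing 100049
Abstract: With the rise of deep learning in recent years, the representation of words in computers has seen a major breakthrough; yet keyword extraction algorithms have long computed on the words themselves as features, with unsatisfactory results. This paper therefore proposes a keyword extraction algorithm based on the deep learning tool word2vec. The algorithm first uses word2vec to map all words into a more abstract word vector space; it then computes the similarity between words from their word vectors, and finally obtains the article's keywords through word clustering. Experiments show that the algorithm's keyword extraction accuracy on long articles is markedly higher than that of other algorithms.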

Keywords: word2vec  keyword extraction  word vector
Received: 2015-06-06

A Keyword Extraction Algorithm Based on Word2vec
Authors:Li Yuepeng  Jin Cui  Ji Junchuan
Affiliation:1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Science and Technology Beijing, Beijing 100083, China; 3. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract:With the rapid development of deep learning, a major breakthrough has been made in how computers represent words. Keyword extraction algorithms, however, have long used the words themselves as features, and their results are far from ideal. In this paper, we present a keyword extraction algorithm based on word2vec, a well-known deep learning tool. The algorithm first projects all words into a more abstract word vector space. It then uses the word vectors to calculate the similarity between words and clusters all the words in the target article; the center of each cluster can be selected as a keyword. According to the experimental results, this algorithm extracts keywords from long articles more accurately than other algorithms.
Keywords:word2vec  keyword extraction  word vector  
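
The pipeline described in the abstract (embed words, measure similarity, cluster, take cluster centers as keywords) can be illustrated with a minimal sketch. Several parts are assumptions and are not taken from the paper: the abstract does not name the clustering method, so a k-means-style loop with cosine similarity is used here; `TOY_VECTORS` is a hand-made stand-in for embeddings that would normally come from a trained word2vec model; and `extract_keywords`, `assign`, and the parameter `k` are hypothetical names chosen for illustration.

```python
import math

# Hypothetical 2-D "word vectors" standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "network": [0.9, 0.1],
    "router": [0.8, 0.2],
    "protocol": [1.0, 0.0],
    "poem": [0.1, 0.9],
    "verse": [0.2, 0.8],
    "rhyme": [0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def assign(vocab, vectors, centroids):
    """Put each word into the cluster whose centroid it is most similar to."""
    clusters = [[] for _ in centroids]
    for w in vocab:
        best = max(range(len(centroids)),
                   key=lambda i: cosine(vectors[w], centroids[i]))
        clusters[best].append(w)
    return clusters


def extract_keywords(words, vectors, k, iterations=10):
    """Cluster an article's words and return the word nearest each centroid."""
    # Keep first occurrences only, and skip words without a vector.
    vocab = [w for w in dict.fromkeys(words) if w in vectors]
    k = min(k, len(vocab))
    if k < 1:
        return []
    # Deterministic start (an assumption): begin with the first word, then
    # repeatedly add the word least similar to the centroids chosen so far.
    centroids = [vectors[vocab[0]]]
    while len(centroids) < k:
        far = min(vocab,
                  key=lambda w: max(cosine(vectors[w], c) for c in centroids))
        centroids.append(vectors[far])
    for _ in range(iterations):
        clusters = assign(vocab, vectors, centroids)
        # Move each centroid to the mean of its members; keep it if empty.
        centroids = [
            [sum(col) / len(c) for col in zip(*(vectors[w] for w in c))]
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    clusters = assign(vocab, vectors, centroids)
    return [max(c, key=lambda w: cosine(vectors[w], centroids[i]))
            for i, c in enumerate(clusters) if c]


keywords = extract_keywords(list(TOY_VECTORS), TOY_VECTORS, k=2)
print(keywords)  # ['network', 'poem']
```

With real data, the vectors would be looked up from a word2vec model trained on a large corpus, and the number of clusters `k` would control how many keywords are returned per article.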