
Research on Sentence Semantic Similarity Calculation Based on Word2vec
Cite this article:LI Xiao,XIE Hui,LI Li-jie.Research on Sentence Semantic Similarity Calculation Based on Word2vec[J].Computer Science,2017,44(9):256-260.
Authors:LI Xiao  XIE Hui  LI Li-jie
Affiliation:School of Computer and Information Engineering,Anyang Normal University,Anyang 455002,China; Department of Computer Science and Technology,Tsinghua University,Beijing 100084,China; School of Software,Beijing Institute of Technology,Beijing 100081,China
Funding:This work was supported by the National Natural Science Foundation of China project "Research on Entity Discovery and Semantic Relation Mining for an Oracle Bone Studies Knowledge Graph" (U1504612) and the Key Scientific Research Project Plan of Henan Province Higher Education Institutions project "Research on Chinese Text Similarity Calculation Based on the Semantic Vector Space Model" (16A520037).
Abstract:Drawing on ideas from deep learning, word2vec can automatically learn the essential information in large-scale text data. Building on this, and with the help of the LTP platform from Harbin Institute of Technology, this paper uses the word2vec model to reduce sentence processing to vector operations in a vector space, and uses similarity in that space to represent the semantic similarity of sentences. In addition, sentence structure information is incorporated into the similarity calculation, the algorithm is improved for special sentence patterns, and the syntactic relations between words are taken into account. Experimental results show that the method reveals the semantic relations between sentences more accurately, and that syntactic structure extraction together with the algorithm improvements addresses similarity calculation for complex sentence patterns and improves the accuracy of the similarity calculation.

Keywords:Sentence similarity  word2vec  Word vectors (distributed representation)  Semantics  Syntactic structure
Received:2016-08-12
Revised:2016-12-24
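
The abstract describes representing sentences as vectors and using similarity in the vector space as a stand-in for semantic similarity, but it does not give the exact formula. The following is a minimal sketch of one common baseline for that idea, namely averaging word vectors and comparing the averages with cosine similarity. It is not the authors' algorithm: it omits the syntactic structure and special sentence pattern handling, the 3-dimensional vectors in `TOY_VECTORS` are invented for illustration rather than trained with word2vec, and the input is assumed to be already tokenized (the paper uses LTP for that step). All function names are hypothetical.

```python
import math

# Toy 3-dimensional "word vectors", invented for illustration only.
# A real system would load vectors trained with word2vec instead.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sleeps": [0.1, 0.9, 0.0],
    "rests": [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "falls": [0.1, 0.3, 0.8],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of the known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token in the sentence has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_similarity(tokens_a, tokens_b, vectors=TOY_VECTORS):
    """Similarity of two pre-tokenized sentences via averaged word vectors."""
    return cosine(sentence_vector(tokens_a, vectors),
                  sentence_vector(tokens_b, vectors))


if __name__ == "__main__":
    close = sentence_similarity(["cat", "sleeps"], ["dog", "rests"])
    far = sentence_similarity(["cat", "sleeps"], ["stock", "falls"])
    print(f"related pair:   {close:.3f}")
    print(f"unrelated pair: {far:.3f}")
```

Because averaging discards word order and syntax, two sentences with the same words in different roles receive the same score under this baseline. That limitation is the kind of gap the abstract's structural and syntactic extensions are meant to address.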