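A short text similarity calculation method based on semantics and syntax structure
Cite this article: ZHAO Qian, JING Qi, LI Ai-ping, DUAN Li-guo. A short text similarity calculation method based on semantics and syntax structure[J]. Computer Engineering & Science, 2018, 40(7): 1287-1294.
Authors: ZHAO Qian  JING Qi  LI Ai-ping  DUAN Li-guo
Affiliations: (1. College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi 030024; 2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan, Hubei 430072)
Funding: Open Project of the State Key Laboratory of Software Engineering, Wuhan University (SKLSE2012-09-30); Natural Science Foundation of Shanxi Province (2013011015-2)
Abstract: To improve the accuracy of short-text semantic similarity calculation, a new method is proposed. The text is split into sentence units, and each sentence undergoes syntactic dependency analysis. Sentence-level similarity is built on word-level similarity. When computing word semantic similarity, a new word feature, the emotional feature, is taken into account, and a combined method is proposed for word sense disambiguation that considers both the part of speech of a word and its context; word semantic similarity is then computed from the HowNet semantic dictionary. The semantic similarities between the words of two sentences are combined by a weighted average according to sentence structure to give the sentence semantic similarity. Finally, the semantic similarity of short texts is computed by a new method, the binary set method. The accuracies of word similarity and short-text similarity reach 87.63% and 93.77%, respectively. The experimental results show that the proposed method improves the accuracy of short-text semantic similarity.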

Keywords: word sense disambiguation  emotional feature  syntactic dependency analysis  short text semantic similarity
Received: 2016-12-12
Revised: 2018-07-05

A short text similarity calculation method based on semantics and syntax structure
ZHAO Qian, JING Qi, LI Ai-ping, DUAN Li-guo. A short text similarity calculation method based on semantics and syntax structure[J]. Computer Engineering & Science, 2018, 40(7): 1287-1294.
Authors: ZHAO Qian  JING Qi  LI Ai-ping  DUAN Li-guo
Affiliation:(1.College of Information and Computer,Taiyuan University of Technology,Taiyuan 030024; 2.State Key Laboratory of Software Engineering,Wuhan University,Wuhan 430072,China)
Abstract: To improve the accuracy of short text semantic similarity calculation, we propose a new calculation method. First, the short text is segmented into sentence units, and syntactic dependency analysis is performed on each sentence. Similarity between sentences is computed on the basis of similarity between words. When calculating word semantic similarity, we take the emotional characteristics of words into consideration and put forward a comprehensive method for word sense disambiguation. Based on the part of speech of each word and its context, we use the HowNet semantic dictionary to calculate word semantic similarity. The semantic similarity of sentences is obtained as a weighted average, according to sentence structure, of the semantic similarities between their words. Finally, we calculate the semantic similarity of short texts through a new method called the binary set method. Experimental results show that the accuracy of word similarity and short text similarity reaches 87.63% and 93.77% respectively, which demonstrates an improvement in the accuracy of short text semantic similarity.
Keywords:word sense disambiguation  emotional characteristic  syntactic dependency analysis  short text semantic similarity  
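
The abstract states that sentence similarity is a weighted average of word similarities, with weights depending on sentence structure, but it does not give the formula. The following is only a minimal sketch of that general idea, not the authors' method. The dependency role labels and their weights (`ROLE_WEIGHTS`), the toy similarity table (`TOY_SIM`, a made-up stand-in for a HowNet-based score), and the function names are all assumptions introduced for illustration. Word sense disambiguation, emotional features, and the binary set method are not modelled.

```python
from typing import Callable, Dict, List, Tuple

# A token is (word, dependency_role). The roles and weights are illustrative
# assumptions, not values taken from the paper.
Token = Tuple[str, str]

ROLE_WEIGHTS: Dict[str, float] = {"HED": 3.0, "SBV": 2.0, "VOB": 2.0}
DEFAULT_WEIGHT = 1.0

# Toy stand-in for a HowNet-based word similarity; the scores are made up.
TOY_SIM: Dict[frozenset, float] = {
    frozenset({"buy", "purchase"}): 0.9,
    frozenset({"car", "vehicle"}): 0.8,
}


def toy_word_sim(a: str, b: str) -> float:
    """Return 1.0 for identical words, a table score if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def directed_sim(src: List[Token], dst: List[Token],
                 word_sim: Callable[[str, str], float]) -> float:
    """Weighted average, over src tokens, of each token's best match in dst."""
    if not src or not dst:
        return 0.0
    total = 0.0
    weight_sum = 0.0
    for word, role in src:
        weight = ROLE_WEIGHTS.get(role, DEFAULT_WEIGHT)
        best = max(word_sim(word, other) for other, _ in dst)
        total += weight * best
        weight_sum += weight
    return total / weight_sum


def sentence_sim(s1: List[Token], s2: List[Token],
                 word_sim: Callable[[str, str], float] = toy_word_sim) -> float:
    """Symmetric score: mean of the two directed weighted averages."""
    return 0.5 * (directed_sim(s1, s2, word_sim) + directed_sim(s2, s1, word_sim))


if __name__ == "__main__":
    s1 = [("I", "SBV"), ("buy", "HED"), ("car", "VOB")]
    s2 = [("I", "SBV"), ("purchase", "HED"), ("vehicle", "VOB")]
    print(round(sentence_sim(s1, s2), 4))
```

In this sketch the head word of each sentence carries the largest weight, so a mismatch between main predicates lowers the score more than a mismatch between modifiers. A real system would replace `toy_word_sim` with a dictionary-based measure and obtain the roles from a dependency parser.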