Text Similarity Calculation Based on Semantic Dictionary and Word Frequency Information


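Cite this article:	DONG Yuan, QIAN Li-ping. Text Similarity Calculation Based on Semantic Dictionary and Word Frequency Information[J]. Computer Science, 2017, 44(Z11): 422-427.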

Author names:	DONG Yuan, QIAN Li-ping

Author affiliation:	Department of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310023 (both authors)

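Abstract:	Traditional text similarity algorithms do not jointly consider semantic understanding and how often words occur. To overcome this shortcoming, this paper builds on word similarity calculation based on a semantic dictionary and proposes a text similarity algorithm based on a semantic dictionary and word frequency information (TSSDWFI). The algorithm computes the extended similarity between the words of two texts, finds the maximum-similarity pairing between those words, and from that pairing computes the similarity between the texts. By using a semantic dictionary, the method accounts for both the similarity relationships between words of different texts and the frequency of each word within its own text. Experimental results show that, compared with traditional semantic algorithms and text similarity methods based on the vector space model, TSSDWFI further improves the accuracy of the computed text similarity.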
Keywords:	Text mining; Text similarity; Semantic dictionary; Keywords; Word frequency
Text Similarity Calculation Based on Semantic Dictionary and Word Frequency Information

DONG Yuan and QIAN Li-ping. Text Similarity Calculation Based on Semantic Dictionary and Word Frequency Information[J]. Computer Science, 2017, 44(Z11): 422-427.

Authors:	DONG Yuan and QIAN Li-ping

Affiliation:	Department of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310023, China (both authors)

Abstract:	Traditional text similarity algorithms do not jointly consider semantic understanding and word frequency. To address this drawback, this paper proposes a text similarity algorithm based on a semantic dictionary and word frequency information, referred to as TSSDWFI. The algorithm calculates the extended similarity between the words of two texts, finds the maximum-similarity matching between those words, and derives the similarity between the texts from that matching. Because it relies on a semantic dictionary, the algorithm takes into account both the similarity relationships between words of different texts and the frequency of each word within its own text. Experimental results show that, compared with traditional semantic algorithms and text similarity methods based on the vector space model, TSSDWFI achieves higher accuracy.

Keywords:	Text mining; Text similarity; Semantic dictionary; Keywords; Word frequency
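
The abstract names three ingredients: word-to-word similarity from a semantic dictionary, a maximum-similarity pairing between the words of the two texts, and word frequency. The abstract does not give the TSSDWFI formulas, so the following is only a rough sketch of how such ingredients could be combined, not the authors' method. Everything in it is an assumption made for illustration: the `lexicon` dictionary stands in for a real semantic dictionary, the pairing is a simple greedy one, and the frequency weighting (mean relative frequency of the two paired words) is a hypothetical choice.

```python
from collections import Counter


def word_similarity(a, b, lexicon):
    """Return a symmetric word-to-word score in [0, 1].

    `lexicon` is a hypothetical stand-in for a semantic dictionary:
    a dict mapping (word, word) pairs to scores. Identical words score 1.
    """
    if a == b:
        return 1.0
    return lexicon.get((a, b), lexicon.get((b, a), 0.0))


def text_similarity(tokens_a, tokens_b, lexicon):
    """Greedy best-pair matching of distinct words, weighted by frequency.

    This is an illustrative scheme, not the formula from the paper.
    """
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    if not freq_a or not freq_b:
        return 0.0

    # Score every cross-text pair of distinct words, highest score first.
    pairs = sorted(
        ((word_similarity(a, b, lexicon), a, b) for a in freq_a for b in freq_b),
        reverse=True,
    )

    used_a, used_b = set(), set()
    total = 0.0
    for score, a, b in pairs:
        # Each word may take part in at most one pairing.
        if a in used_a or b in used_b:
            continue
        used_a.add(a)
        used_b.add(b)
        # Assumed weighting: mean relative frequency of the two paired words.
        weight = (freq_a[a] / len(tokens_a) + freq_b[b] / len(tokens_b)) / 2
        total += score * weight
    return total


if __name__ == "__main__":
    toy_lexicon = {("car", "automobile"): 0.9}  # made-up score
    doc1 = ["car", "car", "road"]
    doc2 = ["automobile", "road"]
    print(text_similarity(doc1, doc2, toy_lexicon))
```

Under this weighting, two identical texts score 1 and two texts with no related words score 0. The greedy pairing is a simplification; an optimal one-to-one assignment (for example with `scipy.optimize.linear_sum_assignment`) could replace it if the pairing needs to maximize the total score.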
