Text similarity calculation based on language network and semantic information

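Cite this article:	ZHAN Zhijian, YANG Xiaoping. Text similarity calculation based on language network and semantic information[J]. Computer Engineering and Applications, 2014, 50(5): 33-38.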

Authors:	ZHAN Zhijian, YANG Xiaoping

Affiliation:	Department of Computer Science, School of Information, Renmin University of China, Beijing 100872, China

Funding:	National Natural Science Foundation of China (No. 70871115).

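Abstract:	After analyzing the shortcomings of existing text similarity measures based on statistics and on semantic analysis, this paper proposes a new text similarity calculation method based on language networks and the semantic information of terms. A language network is built for each text, a comprehensive feature value is computed for each network node, and the top-ranked proportion of feature words is selected to represent the text, which effectively reduces the dimensionality of the text representation. The similarity between two texts is then computed from the similarity between their top-ranked feature words and from each word's share of the total comprehensive feature value. Clustering experiments on a data set show that, measured by F-measure, the proposed method outperforms both the traditional TF-IDF method and another similarity measure based on term semantic information.
Keywords:	language network; text clustering; text similarity; word similarity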
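
The pipeline outlined in the abstract (build a network, score its nodes, keep the top share of words, compare weighted feature words) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the "comprehensive feature value" or the word-similarity measure, so the sketch assumes a sliding-window co-occurrence graph, uses weighted degree as a stand-in node score, and uses exact string match as a placeholder for a semantic word similarity. All function names and the `window` and `ratio` parameters are hypothetical.

```python
from collections import defaultdict


def build_network(tokens, window=2):
    """Co-occurrence graph (assumed construction): an edge's weight counts how
    often two distinct words appear within `window` tokens of each other."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def feature_values(graph):
    """Stand-in for the paper's comprehensive feature value: weighted degree."""
    return {word: sum(nbrs.values()) for word, nbrs in graph.items()}


def top_features(scores, ratio=0.3):
    """Keep the top `ratio` share of words and normalize their scores so that
    each weight is the word's share of the retained total."""
    k = max(1, int(len(scores) * ratio))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    total = sum(value for _, value in ranked)
    return {word: value / total for word, value in ranked}


def exact_match(a, b):
    """Placeholder word similarity; a semantic measure would be plugged in here."""
    return 1.0 if a == b else 0.0


def _directed(fa, fb, sim):
    # Each feature word of A contributes its weight times its best match in B.
    return sum(w * max((sim(a, b) for b in fb), default=0.0) for a, w in fa.items())


def text_similarity(tokens_a, tokens_b, ratio=0.3, sim=exact_match):
    """Symmetric, weight-aware similarity between two tokenized texts."""
    fa = top_features(feature_values(build_network(tokens_a)), ratio)
    fb = top_features(feature_values(build_network(tokens_b)), ratio)
    return 0.5 * (_directed(fa, fb, sim) + _directed(fb, fa, sim))
```

Because only the top-ranked words are compared, the cost of the pairwise word comparison depends on the number of retained features rather than on the full vocabulary, which is the dimensionality reduction the abstract refers to.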
Text similarity calculation based on language network and semantic information

ZHAN Zhijian, YANG Xiaoping. Text similarity calculation based on language network and semantic information[J]. Computer Engineering and Applications, 2014, 50(5): 33-38.

Authors:	ZHAN Zhijian, YANG Xiaoping

Affiliation:	Department of Computer, School of Information, Renmin University of China, Beijing 100872, China

Abstract:	To address the shortcomings of traditional text similarity methods that rely on word-frequency statistics or on the semantic information of words in a text, this paper proposes a new text similarity calculation based on language networks and word semantic information. The method extracts feature items according to the feature values of the word nodes in a document's language network. It considers both the importance of the feature items and the semantic relations among them, and constructs a semantic network of document feature items to calculate the similarity of documents. Finally, several K-means clustering methods are used to evaluate the performance of the new document similarity measure. Experimental results show that the method's F-measure is superior to that of the other methods, which indicates that the proposed method is effective.

Keywords:	language network; text clustering; text similarity; term semantic similarity
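
The abstract reports results as an F-measure over clustering output but does not say which variant was used. As an illustration only, the sketch below computes one common clustering F-measure: each true class is matched to the cluster that gives it the best F-score, and the per-class scores are averaged with weights proportional to class size. The function name and argument names are hypothetical.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted best-match F-measure for a clustering result."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    overlap = Counter(zip(labels_true, labels_pred))

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j  # share of the cluster that belongs to the class
            recall = n_ij / n_i     # share of the class captured by the cluster
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total
```

A score of 1.0 means every class maps exactly onto one cluster; cluster label values themselves do not matter, only the grouping.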
This article is indexed in databases including CNKI and VIP.