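Weight computing method for text feature terms by integrating word sense
Cite this article: LI Ming-tao, LUO Jun-yong, YIN Mei-juan, LU Lin. Weight computing method for text feature terms by integrating word sense[J]. Journal of Computer Applications, 2012, 32(5): 1355-1358
Authors: LI Ming-tao  LUO Jun-yong  YIN Mei-juan  LU Lin
Affiliation: 1. Institute of Information Engineering, Information Engineering University, Zhengzhou 450002; 2. Institute of Information Engineering, Information Engineering University; 3. Institute of Information Engineering, Information Engineering University, Zhengzhou 450002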
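Abstract: Traditional text similarity methods based on the vector space model weight feature terms with TF-IDF. This ignores the word sense similarity between feature terms, so the result does not accurately reflect how similar two texts are. To address this problem, a feature term weighting method that incorporates word sense is proposed. The word sense similarity of feature terms is computed as the cosine of their word sense vectors, based on Chinese WordNet, and the TF-IDF weights are then revised according to this similarity, so that the revised weights account for both term frequency and word sense. Experimental results on the Multi-Document Summarization Corpus of the Information Retrieval Lab at Harbin Institute of Technology show that computing text similarity with the revised weights effectively improves the class discrimination of texts.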
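As a rough illustration of the baseline that the abstract criticizes, the following Python sketch weights terms with TF-IDF and compares documents by cosine similarity. The function names, the particular TF-IDF variant (relative term frequency times log(N/df)), and the toy English tokens are assumptions made for this example; they are not taken from the paper.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus (illustrative only): documents 0 and 1 use near-synonyms.
docs = [
    ["computer", "network", "security"],
    ["pc", "network", "safety"],
    ["football", "match", "score"],
]
vecs = tf_idf_vectors(docs)
related = cosine(vecs[0], vecs[1])    # only "network" overlaps
unrelated = cosine(vecs[0], vecs[2])  # no shared terms
```

In this toy data the first two documents share only `network`, so their similarity stays low even though the remaining terms are close in meaning, while the unrelated document scores zero. That is the gap a sense-aware weight is meant to close.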

Keywords: text similarity   feature term weight   word sense similarity   Chinese WordNet
Received: 2011-11-04
Revised: 2011-12-28

Weight computing method for text feature terms by integrating word sense
LI Ming-tao, LUO Jun-yong, YIN Mei-juan, LU Lin. Weight computing method for text feature terms by integrating word sense[J]. Journal of Computer Applications, 2012, 32(5): 1355-1358
Authors:LI Ming-tao    LUO Jun-yong    YIN Mei-juan    LU Lin
Affiliation:1. Institute of Information Engineering, Information Engineering University, Zhengzhou Henan 450002, China
2. Institute of Information Engineering, Information Engineering University
3. Institute of Information Engineering, Information Engineering University, Zhengzhou Henan 450002, China
Abstract:Most existing methods for computing text similarity based on the Vector Space Model (VSM) use TF-IDF scores as the weights of feature terms. This ignores the word sense relationships among feature terms and leads to inaccurate text similarity. To improve the accuracy of VSM-based text similarity, this paper proposes a term weighting method that integrates word sense. First, word sense similarities among feature terms are computed based on the Chinese WordNet. Then, the TF-IDF weights are revised according to these similarities, so that the weights reflect both the frequency and the word sense of feature terms in the text. Experimental results on the HIT IR-lab Multi-Document Summarization Corpus show that using the weights calculated by the proposed method effectively improves the differentiation among document clusters.
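The abstract does not state the revision formula, so the following Python sketch only illustrates the general idea under stated assumptions. Sense similarity is taken as the cosine of hand-made sense vectors (a real system would derive them from Chinese WordNet), and each TF-IDF weight is revised with the hypothetical rule w'_i = sum_j sim(i, j) * w_j. All names, vectors, and weights here are invented for illustration and should not be read as the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical sense vectors; these numbers are made up for the example.
SENSE = {
    "computer": [1.0, 0.0, 0.0],
    "pc":       [0.9, 0.1, 0.0],
    "network":  [0.0, 1.0, 0.0],
    "football": [0.0, 0.0, 1.0],
}
VOCAB = sorted(SENSE)

def revise(weights):
    """Assumed revision rule: w'_i = sum_j sim(i, j) * w_j over the vocabulary."""
    return {
        ti: sum(cosine(SENSE[ti], SENSE[tj]) * weights.get(tj, 0.0) for tj in VOCAB)
        for ti in VOCAB
    }

def text_similarity(wa, wb):
    """Cosine similarity of two documents given {term: weight} dicts."""
    return cosine([wa.get(t, 0.0) for t in VOCAB],
                  [wb.get(t, 0.0) for t in VOCAB])

# Hypothetical TF-IDF weights for three short documents.
doc_a = {"computer": 0.5, "network": 0.2}
doc_b = {"pc": 0.5, "network": 0.2}
doc_c = {"football": 0.7}

before = text_similarity(doc_a, doc_b)
after = text_similarity(revise(doc_a), revise(doc_b))
cross = text_similarity(revise(doc_a), revise(doc_c))
```

With these made-up inputs, `after` is larger than `before` because the weight of `computer` now also supports `pc`, while `cross` stays at zero for the unrelated document. This mirrors, in miniature, the improved differentiation among document clusters that the abstract reports.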
Keywords:text similarity  feature term weight  word sense similarity  Chinese WordNet
This article is indexed in CNKI, Wanfang Data, and other databases.