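Application of a modified vector space model in information retrieval
Cite this article: MA Hui-nan, WU Jiang-ning, PAN Dong-hua. Application of a modified vector space model in information retrieval[J]. Journal of Harbin Institute of Technology, 2008, 40(4): 666-669
Authors: MA Hui-nan, WU Jiang-ning, PAN Dong-hua
Affiliation: Institute of Systems Engineering, Dalian University of Technology, Dalian 116024
Funding: Project funded by JustSystems Corporation, Japan (日本佳思腾株式会社)
Abstract: To improve the retrieval performance of text information retrieval systems, a new modified vector space model (MVSM) is proposed to address the inherent shortcomings of the vector space model (VSM) widely used in such systems. The model redefines the content of the query index terms: combined phrases, each made up of a modifier and a head word, are introduced into the query statement and into the information representation of the traditional vector space retrieval model, and the weights of these combined phrases, which serve as feature index terms, are recalculated. On this basis, a query expansion strategy based on a synonym thesaurus is applied to the query index terms. Experimental results show that retrieving with combined phrases as query index terms confines the search to a relatively precise range and raises precision, while synonym expansion of the query retrieves more semantically related texts and raises recall. Applying the modified vector space model in an information retrieval system can therefore improve retrieval performance considerably.

Keywords: text information retrieval; vector space model; synonym thesaurus; query expansion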
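
The abstract does not give the weighting formula for combined phrases, so the following is only a minimal sketch of the general idea: adjacent modifier and head-word pairs are added as extra index terms next to the single words, and documents are ranked by cosine similarity. The phrase list `COMBINED_TERMS`, the factor `COMBINED_BOOST`, the smoothed TF-IDF weighting, and the whitespace tokenizer are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical modifier + head-word pairs that are indexed as single terms.
COMBINED_TERMS = {("information", "retrieval"), ("vector", "space")}
# Assumed extra weight for combined terms; the paper's real formula is not in the abstract.
COMBINED_BOOST = 2.0


def index_terms(text):
    """Return the single words plus every combined term found as an adjacent word pair."""
    words = text.lower().split()
    terms = list(words)
    for pair in zip(words, words[1:]):
        if pair in COMBINED_TERMS:
            terms.append(" ".join(pair))
    return terms


def fit_idf(docs):
    """Build a smoothed IDF function from the document collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(index_terms(doc)))
    return lambda term: math.log((n + 1) / (df[term] + 1)) + 1.0


def vectorize(text, idf):
    """Map a text to a sparse TF-IDF vector, boosting combined terms."""
    vec = {}
    for term, freq in Counter(index_terms(text)).items():
        boost = COMBINED_BOOST if " " in term else 1.0
        vec[term] = freq * idf(term) * boost
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    idf = fit_idf(docs)
    query_vec = vectorize(query, idf)
    scores = [(cosine(query_vec, vectorize(doc, idf)), i) for i, doc in enumerate(docs)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    docs = [
        "information retrieval with a vector space model",
        "retrieval of lost information from a space probe",
        "cooking recipes",
    ]
    print(rank("information retrieval", docs))
```

In this toy collection the second document contains both query words but not as an adjacent phrase, so only the first document matches the combined term. That is the narrowing effect the abstract attributes to combined phrases.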

Application of a modified vector space model in textual information retrieval systems
MA Hui-nan, WU Jiang-ning, PAN Dong-hua. Application of a modified vector space model in textual information retrieval systems[J]. Journal of Harbin Institute of Technology, 2008, 40(4): 666-669
Authors: MA Hui-nan, WU Jiang-ning, PAN Dong-hua
Affiliation: Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
Abstract: To improve the performance of textual information retrieval systems, a modified vector space model (MVSM) is proposed to address the intrinsic limitations of the traditional vector space model (VSM). The new model redefines the query index terms: combined terms, each formed from a modifier and its head word, are introduced into the representation of both user queries and the traditional VSM, and a method for calculating the weights of combined terms in the vectors is presented. A query expansion strategy based on a synonym thesaurus is then applied to the query index terms. Experimental results show that using combined terms confines retrieval to a relatively narrow search space and increases precision. Query expansion, in turn, extends the coverage of retrieval to related documents that do not necessarily contain the same terms as the given query, which increases recall. Applying the MVSM in an information retrieval system can therefore improve retrieval performance in both precision and recall.
Keywords: textual information retrieval; vector space model; synonym thesaurus; query expansion
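
The effect of thesaurus-based query expansion on recall can be illustrated with a small sketch. Everything here is assumed for the example: the tiny `THESAURUS` dictionary, the Boolean-OR matching in `retrieve`, and the sample documents. The paper's own thesaurus and expansion strategy are not described in the abstract.

```python
# Hypothetical synonym thesaurus: each term maps to a list of synonyms.
THESAURUS = {"car": ["automobile"], "buy": ["purchase"]}


def expand_query(terms, thesaurus=THESAURUS):
    """Return the query terms followed by any synonyms not already present."""
    expanded = list(terms)
    for term in terms:
        for synonym in thesaurus.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def retrieve(query_terms, docs):
    """Return the ids of documents sharing at least one term with the query."""
    query = set(query_terms)
    return {i for i, doc in enumerate(docs) if query & set(doc.lower().split())}


def precision_recall(retrieved, relevant):
    """Compute precision and recall for a set of retrieved document ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    docs = ["used car for sale", "automobile dealers nearby", "bicycle repair"]
    relevant = {0, 1}  # assumed relevance judgments for the query "car"

    plain = retrieve(["car"], docs)
    expanded = retrieve(expand_query(["car"]), docs)
    print(precision_recall(plain, relevant))
    print(precision_recall(expanded, relevant))
```

With the unexpanded query only the document containing the literal word "car" is found. After expansion the document that uses "automobile" is retrieved as well, so recall rises while precision is unchanged in this toy case.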
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.