Query-Sensitive Similarity Measures for Information Retrieval
Authors:Anastasios Tombros (tombrosa@dcs.gla.ac.uk), C.J. van Rijsbergen
Affiliation:(1) Department of Computing Science, University of Glasgow, Glasgow, G12 8QQ, Scotland, UK
Abstract:The application of document clustering to information retrieval has been motivated by the potential effectiveness gains postulated by the cluster hypothesis. The hypothesis states that relevant documents tend to be highly similar to each other and therefore tend to appear in the same clusters. In this paper we propose an axiomatic view of the hypothesis by suggesting that documents relevant to the same query (co-relevant documents) display an inherent similarity to each other that is dictated by the query itself. Because of this inherent similarity, the cluster hypothesis should be valid for any document collection. Our research describes an attempt to devise means by which this similarity can be detected. We propose the use of query-sensitive similarity measures that bias interdocument relationships toward pairs of documents that jointly possess attributes expressed in a query. We experimentally tested three query-sensitive measures against conventional ones that do not take the query into account, and we also examined the comparative effectiveness of the three query-sensitive measures. We calculated interdocument relationships for varying numbers of top-ranked documents for six document collections. Our results show a consistent and significant increase in the number of relevant documents that become nearest neighbors of any given relevant document when query-sensitive measures are used. These results suggest that the effectiveness of a cluster-based information retrieval system has the potential to increase through the use of query-sensitive similarity measures.
Keywords:Information retrieval; Document clustering; Similarity measures; Nearest neighbor searching
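
The abstract does not give the formulas for the three query-sensitive measures, so the following is only an illustrative sketch of the general idea, not the authors' definitions. It assumes documents and queries are bag-of-words term-weight vectors, and it uses one hypothetical formulation: the conventional cosine similarity between two documents is multiplied by the cosine between the query and the vector of terms the two documents share. The names `cosine`, `common_terms`, `query_sensitive_sim`, and `relevant_neighbors`, the averaging of shared-term weights, and the toy data are all assumptions made for illustration. The last function mirrors the evaluation described in the abstract: counting how many of a relevant document's nearest neighbors are themselves relevant.

```python
import math
from collections import Counter


def cosine(a, b):
    """Conventional cosine similarity between two term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def common_terms(a, b):
    """Terms present in both documents (assumption: weight = mean of the two)."""
    return {t: (a[t] + b[t]) / 2 for t in a.keys() & b.keys()}


def query_sensitive_sim(a, b, query):
    """Hypothetical query-sensitive measure: document-document cosine,
    scaled by how well the shared terms match the query."""
    return cosine(a, b) * cosine(common_terms(a, b), query)


def relevant_neighbors(doc_id, docs, relevant, sim, k=5):
    """Count relevant documents among the k nearest neighbors of doc_id."""
    others = [d for d in docs if d != doc_id]
    ranked = sorted(others, key=lambda d: sim(docs[doc_id], docs[d]), reverse=True)
    return sum(1 for d in ranked[:k] if d in relevant)


# Toy data (invented for illustration only).
docs = {
    "d1": Counter("jaguar speed habitat jungle".split()),
    "d2": Counter("jaguar habitat prey river".split()),
    "d3": Counter("jaguar speed jungle engine".split()),
}
query = Counter("jaguar habitat".split())
relevant = {"d1", "d2"}  # assumed relevance judgments for the query

conventional = relevant_neighbors("d1", docs, relevant, cosine, k=1)
sensitive = relevant_neighbors(
    "d1", docs, relevant, lambda a, b: query_sensitive_sim(a, b, query), k=1
)
print(conventional, sensitive)
```

On this toy data, plain cosine ranks d3 as the nearest neighbor of d1 (0.75 versus 0.5 for d2), because d1 and d3 share more terms overall. The query-sensitive variant ranks d2 first instead, because the terms d1 and d2 share are exactly the query terms. This is the kind of shift toward co-relevant neighbors that the abstract reports, though the real measures and collections differ from this sketch.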
This article is indexed in SpringerLink and other databases.