A geometric framework for data fusion in information retrieval
Affiliation: 1. CAPE-Lab: Computer-Aided Process Engineering Laboratory, Department of Industrial Engineering, University of Padova, via Marzolo 9, 35131, Padova, Italy; 2. CPSE, Centre for Process Systems Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom; 1. University of Basilicata, Potenza, Italy; 2. Ecole Polytech. de Montreal, Montreal, QC, Canada; 3. Independent Researcher, Trento, Italy
Abstract: Data fusion in information retrieval has been investigated by many researchers, and a number of data fusion methods have been proposed. However, questions such as why data fusion can increase effectiveness, and under what conditions data fusion methods work well, remain poorly resolved at best. In this paper, we formally describe data fusion under a geometric framework, in which each component result returned from an information retrieval system for a given query is represented as a point in a multi-dimensional space. The Euclidean distance is the measure by which the effectiveness and similarity of search results are judged. This allows us to explain all component results and fused results using geometrical principles. In such a framework, score-based data fusion becomes a deterministic problem. Several interesting features of the centroid-based data fusion method and the linear combination method are discussed. In retrieval evaluation, however, ranking-based measures are the most popular. Therefore, this paper investigates the relation and correlation between the Euclidean distance and several typical ranking-based measures. We find that a very strong correlation exists between them, which means that the theorems and observations obtained using the Euclidean distance remain valid when ranking-based measures are used. The proposed framework gives a better understanding of score-based data fusion and allows score-based data fusion methods to be used more precisely and effectively in various ways.
Keywords: Database searching; Geometric modeling; Information retrieval; Data fusion
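
The geometric view described in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes that each dimension of the space corresponds to one document, that a component result is the vector of scores it assigns to those documents, and that an "ideal" point holds the true relevance values. The function names (`euclidean`, `centroid`, `linear_combination`) and all sample scores are hypothetical. The sketch fuses three component results by taking their centroid and compares Euclidean distances to the ideal point.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Centroid-based fusion: average the scores dimension by dimension."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def linear_combination(points, weights):
    """Weighted fusion; weights are normalised so they sum to 1."""
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, col)) / total
        for col in zip(*points)
    ]


# Hypothetical data: four documents, binary relevance as the ideal point.
ideal = [1.0, 0.0, 1.0, 0.0]

# Hypothetical normalised scores from three retrieval systems.
results = [
    [0.9, 0.4, 0.6, 0.1],
    [0.5, 0.2, 0.9, 0.3],
    [0.7, 0.1, 0.4, 0.6],
]

fused = centroid(results)
component_distances = [euclidean(r, ideal) for r in results]
fused_distance = euclidean(fused, ideal)
mean_component_distance = sum(component_distances) / len(component_distances)

print("component distances:", component_distances)
print("centroid distance:  ", fused_distance)
print("mean component dist:", mean_component_distance)
```

Under these assumptions, the centroid's distance to the ideal point can never exceed the mean of the component distances, which follows from the triangle inequality. With equal weights, `linear_combination` reduces to `centroid`; unequal weights move the fused point toward the component results that are given more weight.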
This article is indexed in ScienceDirect and other databases.