Efficient parallel Text Retrieval techniques on Bulk Synchronous Parallel (BSP)/Coarse Grained Multicomputers (CGM) |
| |
Authors: | Charalampos Konstantopoulos, Basilis Mamalis, Grammati Pantziou, Damianos Gavalas |
| |
Affiliation: | (1) Department of Informatics, University of Piraeus and Research Academic Computer Technology Institute Patras, Piraeus, Greece; (2) Department of Informatics, Technological Educational Institution of Athens, Athens, Greece; (3) Department of Cultural Technology and Communication, University of the Aegean, Mytilene, Greece |
| |
Abstract: | In this paper, we present efficient, scalable, and portable parallel algorithms for the off-line clustering, the on-line retrieval
and the update phases of the Text Retrieval (TR) problem based on the vector space model and using clustering to organize
and handle a dynamic document collection. The algorithms run on the Coarse-Grained Multicomputer (CGM) and/or the Bulk Synchronous Parallel (BSP) model, two models that capture the characteristics of a parallel machine with only a few parameters. To the best
of our knowledge, our parallel retrieval algorithms are the first ones analyzed under these specific parallel models. For
all the phases of the proposed algorithms, we analytically determine the relevant communication and computation costs, thereby
formally proving the efficiency of the proposed solutions. In addition, we prove that our technique for the on-line retrieval
phase performs very well in comparison to other possible alternatives in the typical case of a multiuser information retrieval
(IR) system to which a number of user queries are submitted concurrently. Finally, we discuss external memory
issues and show how our techniques can be adapted to the case where processors have limited main memory but sufficient disk
capacity for holding their local data.
|
| |
Keywords: | BSP model; CGM model; Parallel algorithms; Text retrieval; Document clustering; External memory |
This article is indexed in SpringerLink and other databases. |
|