Algorithms for the boundary selection problem |
| |
Authors: | J N Bhuyan J S Deogun V V Raghavan |
| |
Affiliation: | (1) Department of Computer Science, Tuskegee University, 36088 Tuskegee, AL, USA;(2) Department of Computer Science and Engineering, University of Nebraska-Lincoln, 68588-0115 Lincoln, NE, USA;(3) The Center for Advanced Computer Studies, University of Southwestern Louisiana, 70504-4330 Lafayatte, LA, USA |
| |
Abstract: | User-oriented clustering schemes enable the classification of documents based upon the user's perception of the similarity
between documents, rather than on some similarity function presumed by the designer to represent the user's criteria. In an
earlier paper it was shown that such a classification scheme can be developed in two stages. The first stage involves the
accumulation of relevance judgements provided by users,vis-à-vis past query instances, into a suitable structure. The second stage consists of cluster identification. When the structure
chosen, in the first stage, for the accumulation of corelevance characteristics of documents is a straight line, the second
stage can be formulated as a function optimization problem termed the Boundary Selection Problem (BSP). A branch-and-bound
algorithm with a good bounding function is developed for the BSP. Although significant pruning is achieved due to the bounding
function, the complexity is still high for a problem of a large size. For such a problem a heuristic that divides it into
a number of subproblems, each being solved by a branch-and-bound approach, is developed. Then the overall problem is mapped
to an integer knapsack problem and solved by the use of dynamic programming. The tradeoff between accuracy and complexity
can be controlled, giving the user a preference of one over the other. Assuming that the heuristic which divides the overall
problem introduces no errors and is given sufficient time, the branch and bound with dynamic programming (BBDP) approach will
converge to the optimal solution. Two other heuristic approaches, one with the application of a polynomial dynamic programming
algorithm and the other which works in a greedy way, are also proposed for the BSP and an experimental comparison of all these
approaches is provided. Experimental results indicate that all proposed algorithms show better performance compared with the
existing algorithm. |
| |
Keywords: | Branch-and-bound Clustering Dynamic programming Greedy algorithm Information retrieval system |
本文献已被 SpringerLink 等数据库收录! |
|