HierarchyScan: A Hierarchical Algorithm for Similarity Search in Databases Consisting of Long Sequences |
| |
Authors: | Chung-Sheng Li Philip S Yu Vittorio Castelli |
| |
Affiliation: | 1. IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, USA
|
| |
Abstract: | In this paper, a hierarchical algorithm, HierarchyScan, is proposed to efficiently locate one-dimensional subsequences within a collection of sequences with arbitrary length. The proposed algorithm performs correlation between the stored sequences and the template pattern in the transformed domain to identify subsequences in a scale- and phase-independent fashion. This is in contrast to those approaches based on the computation of Euclidean distance in the transformed domain. In the proposed hierarchical algorithm, the transformed domain representation of each original sequence is divided into multiple groups of coefficients. The matching is performed hierarchically from the group with the greatest filtering capability to the group with the lowest filtering capability. Only those subsequences whose maximum correlation value is higher than a predefined threshold will be selected for additional screening. This approach is compared to the sequential scanning and an order-of-magnitude speedup is observed. |
| |
Keywords: | |
本文献已被 SpringerLink 等数据库收录! |
|