DDR: an index method for large time-series datasets |
| |
Affiliation: | 1. School of Information Technology, Faculty of Science and Technology, Deakin University, Melbourne Campus, Burwood, Victoria, Melbourne, 3125, Australia; 2. Australian Research Council Centre in Bioinformatics, Melbourne, Australia; 3. Institute of Information Sciences and Electronics, University of Tsukuba, 1-1-1, Tennodai, Tsukuba-shi, Ibaraki-ken, Japan |
| |
Abstract: | Tree index structures are a traditional method for similarity search in large datasets. They rest on the assumption that most sub-trees are pruned during the search, which reduces the number of page accesses. However, time-series datasets generally have very high dimensionality, and because of the so-called curse of dimensionality, pruning becomes less effective as dimensionality grows. Consequently, tree index structures are not well suited to time-series datasets. In this paper, we propose a two-phase (filtering and refinement) method for searching time-series datasets. In the filtering phase, quantized time series are used to construct a compact file, which is scanned to filter out irrelevant series. The small set of remaining candidates is passed to the second phase for refinement. In this phase, we introduce an effective index compression method named grid-based datawise dimensionality reduction (DDR), which attempts to preserve the characteristics of the time series. An experimental comparison with existing techniques demonstrates the utility of our approach. |
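The filter-and-refine idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's DDR method: it assumes a plain uniform grid quantizer, a Euclidean range query, and that every data value lies within a known range [lo, hi]. All function names and parameters (quantize, lower_bound, range_search, bits, eps) are hypothetical and chosen only for illustration.

```python
import math


def quantize(series, lo, hi, bits=4):
    """Map each value to an integer cell index on a uniform grid over [lo, hi].

    Assumption: every value lies in [lo, hi]; values at hi fall in the last cell.
    """
    cells = 1 << bits
    width = (hi - lo) / cells
    return [min(cells - 1, max(0, int((v - lo) / width))) for v in series]


def lower_bound(query, codes, lo, hi, bits=4):
    """Lower bound on the Euclidean distance from query to any series with these codes."""
    cells = 1 << bits
    width = (hi - lo) / cells
    total = 0.0
    for q, c in zip(query, codes):
        cell_lo = lo + c * width
        cell_hi = cell_lo + width
        if q < cell_lo:
            gap = cell_lo - q      # query lies below the cell
        elif q > cell_hi:
            gap = q - cell_hi      # query lies above the cell
        else:
            gap = 0.0              # query falls inside the cell
        total += gap * gap
    return math.sqrt(total)


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(query, data, eps, lo, hi, bits=4):
    """Two-phase range query: scan the compact codes, then refine on the raw data."""
    # The "compact file": one short code list per series (hypothetical in-memory stand-in).
    compact = [quantize(s, lo, hi, bits) for s in data]
    # Phase 1 (filtering): drop series whose lower bound already exceeds eps.
    candidates = [i for i, codes in enumerate(compact)
                  if lower_bound(query, codes, lo, hi, bits) <= eps]
    # Phase 2 (refinement): compute exact distances for the surviving candidates only.
    results = [i for i in candidates if euclidean(query, data[i]) <= eps]
    return candidates, results


if __name__ == "__main__":
    data = [[1.5, 2.5, 3.5], [1.9, 2.1, 3.9], [10.5, 11.5, 12.5], [2.5, 2.5, 3.5]]
    query = [1.4, 2.6, 3.4]
    candidates, results = range_search(query, data, eps=0.5, lo=0.0, hi=16.0, bits=4)
    print("candidates:", candidates)
    print("results:", results)
```

Because the filter uses a lower bound on the true distance, it can admit false positives (removed during refinement) but, under the stated range assumption, it does not discard true matches. A coarser grid shrinks the compact file at the cost of more candidates reaching the refinement phase.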
| |
Keywords: | |
This article is indexed in ScienceDirect and other databases. |
|