Efficient processing of similarity search under time warping in sequence databases: an index-based approach

Affiliation:	1. Department of Biology and Institute of Biochemistry, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario K1S 5B6, Canada; 2. Instituto de Ciencias Ambientales y Evolutivas, Facultad de Ciencias, Universidad Austral de Chile, Campus Isla Teja, Valdivia, Chile; 3. CSIRO Oceans & Atmosphere, GPO Box 1538, Hobart 7001, Tasmania, Australia; 1. Robotics and Mechatronics Research Laboratory, Department of Mechanical and Aerospace Engineering, Monash University, Melbourne, Australia; 2. College of Computer and Control Engineering, Nankai University, Tianjin, China; 3. Division Microrobotics and Control Engineering (AMiR), University of Oldenburg, Oldenburg, Germany

Abstract:	This paper discusses the effective processing of similarity search that supports time warping in large sequence databases. Time warping enables sequences with similar patterns to be found even when they are of different lengths. Prior methods for similarity search under time warping could not employ multi-dimensional indexes without risking false dismissal, because the time warping distance does not satisfy the triangle inequality. They must therefore scan the entire database and suffer serious performance degradation in large databases. Another method, which employs a suffix tree and does not assume any particular distance function, also performs poorly because of the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to enhance search performance in large databases without permitting any false dismissal. To attain this goal, we have devised a new distance function, D_tw-lb, which consistently underestimates the time warping distance and satisfies the triangle inequality. D_tw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For the efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as its indexing attributes and D_tw-lb as its distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we have performed extensive experiments. The results reveal that our method achieves a speedup of up to 43 times on a data set containing real-world S&P 500 stock data sequences, and of up to 720 times on data sets containing a very large volume of synthetic data sequences. The performance gain increases (1) as the number of data sequences increases, (2) as the average length of data sequences increases, and (3) as the tolerance in a query decreases. Considering the characteristics of real databases, these tendencies imply that our approach is suitable for practical applications.
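
The filter-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which four features make up the 4-tuple or how D_tw-lb is computed, so the sketch assumes the tuple (first, last, maximum, minimum), an L-infinity distance between tuples as the lower bound, and a time warping distance that sums absolute differences along the warping path. The function names are hypothetical, and a linear scan over the feature vectors stands in for the multi-dimensional index.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, float, float]


def features(seq: Sequence[float]) -> Feature:
    # Assumed 4-tuple: first, last, greatest, smallest element.
    # Repeating elements (warping along the time axis) changes none of them.
    return (seq[0], seq[-1], max(seq), min(seq))


def lower_bound(a: Sequence[float], b: Sequence[float]) -> float:
    # L-infinity distance between feature vectors. It is a metric, so it
    # satisfies the triangle inequality, and under the DTW variant below
    # it never exceeds the time warping distance.
    return max(abs(x - y) for x, y in zip(features(a), features(b)))


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    # Classic dynamic-programming time warping distance with |x - y| as the
    # per-step cost, keeping only two rows of the cost table.
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[len(b)]


def search(db: List[Sequence[float]], query: Sequence[float], eps: float) -> List[int]:
    # Filter step: discard sequences whose lower bound already exceeds eps.
    # Refine step: run the expensive DTW only on the surviving candidates.
    hits = []
    for i, seq in enumerate(db):
        if lower_bound(seq, query) > eps:
            continue
        if dtw(seq, query) <= eps:
            hits.append(i)
    return hits


if __name__ == "__main__":
    db = [[1, 2, 3], [1, 1, 2, 2, 3, 3], [5, 6, 7], [1, 2, 9, 3]]
    print(search(db, [1, 2, 3], eps=0.5))
```

Because the lower bound never exceeds the time warping distance in this setting, the filter step cannot discard a true match, which is the no-false-dismissal property the abstract claims. In a real system the feature vectors would be stored in a multi-dimensional index so that the filter step avoids touching every sequence.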

This article is indexed in ScienceDirect and other databases.
