首页 | 本学科首页   官方微博 | 高级检索  
     

基于适应性分段估计的数据流相似性搜索
引用本文:吴 枫,仲 妍,吴泉源,贾 焰,杨树强.基于适应性分段估计的数据流相似性搜索[J].软件学报,2009,20(10):2867-2884.
作者姓名:吴 枫  仲 妍  吴泉源  贾 焰  杨树强
作者单位:国防科学技术大学,计算机学院,湖南,长沙,410073
基金项目:Supported by the National High-Tech Research and Development Plan of China under Grant Nos.2006AA01Z451, 2007AA01Z474 (国家高技术研究发展计划(863))
摘    要:相似性搜索在股票交易行情、网络安全、传感器网络等众多领域应用广泛.由于这些领域中产生的数据具有无限的、连续的、快速的、实时的特性,所以需要适合数据流上的在线相似性搜索算法.首先,在具有或不具有全局约束条件下,分别提出了没有索引结构的DTW(dynamic time warping)下限函数LB_seg_WFglobalLB_seg_WF,它们是一种分段DTW技术,能够处理数据流上的非等长序列间在线相似性匹配问题.然后,为了进一步提高LB_seg_WFglobalLB_seg_WF的近似程度,提出了一系列的改进方法.最后,针对流上使用LB_seg_WFglobalLB_seg_WF可能会出现连续失效的情况,分别提出了DTW的下限函数LB_WFglobal(具有全局约束条件)和上限函数UB_WF、下限函数LB_WF(不具有全局约束条件).通过增量方式快速估计DTW,极大地减少了估计DTW的冗余计算量.通过理论分析和统计实验,验证了该方法的有效性.

关 键 词:相似性搜索  数据流  时间序列分析  动态时间扭曲
修稿时间:2008/12/30 0:00:00

Similarity Search in Data Stream with Adaptive Segmental Approximations
WU Feng,ZHONG Yan,WU Quan-Yuan,JIA Yan and YANG Shu-Qiang.Similarity Search in Data Stream with Adaptive Segmental Approximations[J].Journal of Software,2009,20(10):2867-2884.
Authors:WU Feng  ZHONG Yan  WU Quan-Yuan  JIA Yan and YANG Shu-Qiang
Abstract:Similarity search has attracted many researchers from various communities (real-time stock quotes, network security, sensor networks). Due to the infinite, continuous, fast and real-time properties of the data from these communities, a method is needed for online similarity search in data stream. This paper first proposes the lower bound function LB_seg_WFglobal for DTW (dynamic time warping) in the presence of global warping constraints and LB_seg_WF for DTW without global warping constraints, which are not applied to any index structures. They are segmented DTW techniques, and can be applied to sequences and queries of varying lengths in data stream. Next, several tighter lower bounds are proposed to improve the approximate degree of the LB_seg_WFglobal and LB_seg_WF. Finally, to deal with the possible continuously non-effective problem of LB_seg_WFglobal or LB_seg_WF in data stream, it is believed that lower-bound LB_WFglobal (in the presence of global warping constraints) and lower-bound LB_WF, upper-bound UB_WF (without global warping constraints) can fast estimate DTW and hence reduce a lot of redundant computations by incrementally computing. The theoretical analysis and statistical experiments confirm the validity of the proposed methods.
Keywords:similarity search  data stream  time series analysis  dynamic time warping
本文献已被 万方数据 等数据库收录!
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号