首页 | 本学科首页   官方微博 | 高级检索  
     

基于键值存储的分布式时序相似性搜索方法
引用本文:俞自生,李瑞远,郭阳,蒋忠元,鲍捷,郑宇. 基于键值存储的分布式时序相似性搜索方法[J]. 软件学报, 2022, 33(3): 950-967
作者姓名:俞自生  李瑞远  郭阳  蒋忠元  鲍捷  郑宇
作者单位:西安电子科技大学网络与信息安全学院,陕西西安710126;北京京东智能城市大数据研究院,北京100176;重庆大学计算机学院,重庆400044;北京京东智能城市大数据研究院,北京100176;北京航空航天大学计算机学院,北京100191;西安电子科技大学网络与信息安全学院,陕西西安710126;北京京东智能城市大数据研究院,北京100176
基金项目:国家重点研发计划(2019YFB2103201);国家自然科学基金(61976168,62076191,61502375)
摘    要:时序相似性搜索是时序数据分析最基本的操作之一,具有广泛的应用场景.针对现有分布式算法无法应对维度增长、扫描范围过大和相似性计算耗时的问题,提出一种面向键值存储的分布式时序相似性搜索方法KV-Search.首先对时序数据分块,并设计其键值存入键值数据库,解决了时序数据维度高且不断增长的问题;其次,基于切比雪夫距离计算其下...

关 键 词:时间序列  相似性搜索  键值存储  剪枝过滤  分布式查询
收稿时间:2021-06-30
修稿时间:2021-07-31

Distributed Time Series Similarity Search Method Based on Key-value Data Stores
YU Zi-Sheng,LI Rui-Yuan,GUO Yang,JIANG Zhong-Yuan,BAO Jie,ZHENG Yu. Distributed Time Series Similarity Search Method Based on Key-value Data Stores[J]. Journal of Software, 2022, 33(3): 950-967
Authors:YU Zi-Sheng  LI Rui-Yuan  GUO Yang  JIANG Zhong-Yuan  BAO Jie  ZHENG Yu
Affiliation:School of Cyber Engineering, Xidian University, Xi''an Shaanxi 710071, China;JD Intelligent Cities Research, Beijing 100176, China;College of Computer Science, Chongqing University, Chongqing 400044, China;JD Intelligent Cities Research, Beijing 100176, China;School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Abstract:Time series similarity search is one of the most basic operations for temporal data analysis, which has various application scenarios. Existing distributed methods face the problems of dimension explosion, too large scan range, and time-consuming similarity calculation. To this end, this paper proposes a novel distributed time series similarity search algorithm KV-Search. First, time series are segmented into blocks and stored in the key-value database, which solves the problem of high and growing dimension. Second, the lower bound is calculated based on Chebyshev distance, and the invalid data is filtered out in advance using key value range scanning, which reduce the data transmission and calculation overhead. Third, a block-based time series representation is used to calculate the lower bound of distance, which avoids the calculation of higher dimensional real data. We implement KV-Search based on HBase, and conduct a set of extensive experiments using both real and synthetic time series data. The results show that our KV-Search is superior to benchmark experiment in efficiency and scalability.
Keywords:time series  similarity search  key-value storage  pruning filtration  distributed query
本文献已被 万方数据 等数据库收录!
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号