首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 171 毫秒
1.
汤春蕾  董家麒 《计算机学报》2012,35(11):2228-2236
子序列的相似性查询是时间序列数据集中的一种重要操作,包括范围查询和k近邻查询.现有的大多算法是基于欧几里德距离或者DTW距离的,缺点在于查询效率低下.文中提出了一种新的基于LSH的距离度量方法,可以在保证查询结果质量的前提下,极大提高相似性查询的效率;在此基础上,给出一种DS-Index索引结构,利用距离下界进行剪枝,进而还提出了两种优化的OLSH-Range和OLSH-kNN算法.实验是在真实的股票序列集上进行的,数据结果表明算法能快速精确地找出相似性查询结果.  相似文献   

2.
针对时间序列相似性度量中欧氏距离对异常数据敏感以及DTW距离算法效率低的问题,提出基于滑动平均与分段线性回归的时间序列相似性方法。首先,使用初始可变滑动平均算法以及分段线性回归对原始时间序列进行数据变换,并将分段线性回归的参数(截距与距离)集作为时间序列的特征,以实现时间序列的特征提取和数据降维;然后,利用动态时间弯曲距离进行距离计算。该方法在时间序列相似性上与DTW算法的性能相近,但是在算法效率上几乎提高了96%。实验结果验证了该方法的有效性与准确性。  相似文献   

3.
传统DTW算法复杂度高,特别当处理海量数据时,耗时长.为了从算法和实现手段两方面同时入手,提高DTW运算效率,提出基于Hadoop平台,以FastDTW方法实现的水文时间序列相似性查找方法.首先利用小波变换对数据去噪,接着对水文时间序列进行语义化,然后在Hadoop的MapReduce过程中调用FastDTW方法实现DTW距离的云计算,得出与查询序列最相似的匹配序列.通过实验与串行查找进行对比,验证该方法用时短,匹配效果好,能够满足实际应用需求.  相似文献   

4.
针对基于u-shapelets的时间序列聚类中u-shapelets集合质量较低的问题,提出一种基于最佳u-shapelets的时间序列聚类算法DivUshapCluster。首先,探讨不同子序列质量评估方法对基于u-shapelets的时间序列聚类结果的影响;然后,选用最佳的子序列质量评估方法对u-shapelet候选集进行质量评估;其次,引入多元top-k查询技术对u-shapelet候选集进行去除冗余操作,搜索出最佳的u-shapelets集合;最后,利用最佳u-shapelets集合对原始数据集进行转化,达到提高时间序列聚类准确率的目的。实验结果表明,DivUshapCluster算法在聚类准确度上不仅优于经典的时间序列聚类算法,而且与BruteForce算法和SUSh算法相比,DivUshapCluster算法在22个数据集上的平均聚类准确度分别提高了18.80%和19.38%。所提算法能够在保证整体效率的情况下有效提高时间序列的聚类准确度。  相似文献   

5.
基于极值点特征的时间序列相似性查询方法*   总被引:4,自引:2,他引:2  
为了提高时间序列子序列匹配的准确度和效率,提出了基于极值点特征的时间序列相似性查询方法。首先识别出时间序列中的极值特征点,根据极值点使用多层次极值划分法对长序列进行划分;然后对划分得到的多层次子序列集使用改进的动态时间弯曲方法与查询序列进行相似性匹配;最后找到与查询序列最相似的子序列。实验表明,此方法在保证准确度的情况下大大提高了相似性搜索过程的效率。  相似文献   

6.
提出一种在时间序列上快速匹配子序列的算法,该算法不同于FRM算法,而是采用VA-file这种索引结构,将数据点直接存储在索引上,并在该索引的基础上设计了一种进行范围查询的方法.实验采用了三种时间序列数据集,从不同的角度验证算法的有效性,结果表明该算法大大提高了查询性能.  相似文献   

7.
杨艳林  叶枫  吕鑫  余霖  刘璇 《计算机科学》2016,43(2):245-249
水文时间序列相似性挖掘是水文时间序列挖掘的重要方面,对洪水预报、防洪调度等具有重要意义。针对水文数据的特点,提出了一种基于DTW聚类的水文时间序列相似性挖掘方法。该方法先对数据进行小波去噪、特征点分段以及语义划分,再基于DTW距离对划分后的子序列做层次聚类并符号化;然后根据符号序列间的编辑距离筛选候选集;最后通过序列间的DTW距离进行精确匹配,获取相似水文时间序列。以滁河六合站的日水位数据进行实验,结果表明,所提方法能够有效地缩小候选集,提高查找语义相似的水文时间序列的效率。  相似文献   

8.
标签图常用于智能交通网、生物信息网等新兴领域的建模。子图查询作为图数据分析的关键问题,引起了研究者的广泛关注。对现有子图查询算法的研究发现,随着图数据规模增大且频繁更新,传统子图查询算法普遍存在查询效率低,存储开销大,忽略顶点标签信息等问题。为此,提出了一种支持大规模动态标签图子图查询的层次序列索引(Dynamic Hierarchical Sequence,DHS),该索引提取数据图中带有顶点编号的层次拓扑序列关系以实现子图查询;针对图的动态变化,提出了更新点拓扑扩展式索引维护策略,仅从局部变化顶点及边开始进行增量式更新,大大降低了重建索引造成的巨大开销;提出了基于DHS索引的子图查询方法,仅需将查询图与数据图的层次序列进行匹配即可获得候选集,并在其上利用关系匹配策略获得最终查询结果。实验证明提出的方法在保证高效查询的同时降低了索引的创建及维护时间,提高了子图查询效率。  相似文献   

9.
时间序列形态相似性挖掘是目前时间序列数据挖掘研究的热点,然而由于时间序列数据背后真实系统的复杂性,加上观测条件的影响,时间序列会呈现多种相似性变形,如振幅伸缩、振幅漂移、线性漂移等。相似性变形并不会改变序列的形态特征,但现有的ED、DTW和Lp距离等相似性度量算法均不能有效支持识别各类相似性变形。本文首次提出涨落模式(FP)的概念,以涨落模式保存原序列的趋势变化信息,利用最长公共子序列算法计算涨落模式的相似度,消除振幅伸缩、振幅漂移和线性漂移等对相似性挖掘带来的影响,实现基于涨落模式的时间序列相似性度量。设置仿真数据集检验FP相似性度量的相似性变形支持性,同时在真实数据集上进行分类,依据分类准确性对算法鲁棒性进行评估,验证了本文提出的基于涨落模式的相似性度量算法在各类相似性形变上的有效支持性。  相似文献   

10.
刘帅  刘长良  甄成刚 《计算机应用》2019,39(4):1229-1233
针对风电机组故障预警中,原始动态时间规整(DTW)算法无法有效度量风电机组多变量时间序列数据之间距离的问题,提出一种基于犹豫模糊集的动态时间规整(HFS-DTW)算法。该算法是原始DTW算法的一种扩展算法,可对单变量和多变量时间序列数据进行距离度量,且精度与速度较原始DTW算法更优。以子时间序列相似度距离为目标函数,使用帝国竞争算法(ICA)优化了HFS-DTW算法中的子序列长度和步距参数。算例研究表明与仅DTW算法和非参数最优的HFS-DTW算法相对比,参数最优的HFS-DTW可挖掘更多的多维特征点信息,输出的多维特征点相似序列具有更丰富细节;且基于所提算法可提前10天预警风电机组齿轮箱故障。  相似文献   

11.
针对时间序列子序列聚类存在的平凡相似和水平伸缩等问题,提出了一种新的子序列聚类算法。它采用多孔平滑滤波器组对时间序列进行低通平滑处理,在所得到的多个尺度序列上生成平凡簇,然后将各个平凡簇的代表子序列作为数据样本进行聚类。新方法利用平凡簇克服了子序列聚类中的平凡相似问题,并且可以在时间序列上发现不等长的相似子序列,较好地解决了水平轴伸缩问题。实验结果证明新算法对于子序列聚类具有比较好的效果。  相似文献   

12.
With the growing demand for visual information of rich content, effective and efficient manipulations of large video databases are increasingly desired. Many investigations have been made on content-based video retrieval. However, despite the importance, video subsequence identification, which is to find the similar content to a short query clip from a long video sequence, has not been well addressed. This paper presents a graph transformation and matching approach to this problem, with extension to identify the occurrence of potentially different ordering or length due to content editing. With a novel batch query algorithm to retrieve similar frames, the mapping relationship between the query and database video is first represented by a bipartite graph. The densely matched parts along the long sequence are then extracted, followed by a filter-and-refine search strategy to prune some irrelevant subsequences. During the filtering stage, maximum size matching is deployed for each subgraph constructed by the query and candidate subsequence to obtain a smaller set of candidates. During the refinement stage, sub-maximum similarity matching is devised to identify the subsequence with the highest aggregate score from all candidates, according to a robust video similarity model that incorporates visual content, temporal order, and frame alignment information. The performance studies conducted on a long video recording of 50 hours validate that our approach is promising in terms of both search accuracy and speed.  相似文献   

13.
Motion capture data digitally represent human movements by sequences of body configurations in time. Subsequence searching in long sequences of such spatio-temporal data is difficult as query-relevant motions can vary in execution speeds and styles and can occur anywhere in a very long data sequence. To deal with these problems, we employ a fast and effective similarity measure that is elastic. The property of elasticity enables matching of two overlapping but slightly misaligned subsequences with a high confidence. Based on the elasticity, the long data sequence is partitioned into overlapping segments that are organized in multiple levels. The number of levels and sizes of overlaps are optimized to generate a modest number of segments while being able to trace an arbitrary query. In a retrieval phase, a query is always represented as a single segment and fast matched against segments within a relevant level without any costly post-processing. Moreover, visiting adjacent levels makes possible subsequence searching of time-warped (i.e., faster or slower executed) queries. To efficiently search on a large scale, segment features can be binarized and segmentation levels independently indexed. We experimentally demonstrate effectiveness and efficiency of the proposed approach for subsequence searching on a real-life dataset.  相似文献   

14.
The search for similar subsequences is a core module for various analytical tasks in sequence databases. Typically, the similarity computations require users to set a length. However, there is no robust means by which to define the proper length for different application needs. In this study, we examine a new query that is capable of returning the longest-lasting highly correlated subsequences in a sequence database, which is particularly helpful to analyses without prior knowledge regarding the query length. A baseline, yet expensive, solution is to calculate the correlations for every possible subsequence length. To boost performance, we study a space-constrained index that provides a tight correlation bound for subsequences of similar lengths and offset by intraobject and interobject grouping techniques. To the best of our knowledge, this is the first index to support a normalized distance metric of arbitrary length subsequences. In addition, we study the use of a smart cache for disk-resident data (e.g., millions of sequence objects) and a graph processing unit-based parallel processing technique for frequently updated data (e.g., nonindexable streaming sequences) to compute the longest-lasting highly correlated subsequences. Extensive experimental evaluation on both real and synthetic sequence datasets verifies the efficiency and effectiveness of our proposed methods.  相似文献   

15.
Video similarity matching has broad applications such as copyright detection, news tracking and commercial monitoring, etc. Among these applications, one typical task is to detect the local similarity between two videos without the knowledge on positions and lengths of each matched subclip pair. However, most studies so far on video detection investigate the global similarity between two short clips using a pre-defined distance function. Although there are a few works on video subsequence detection, all these proposals fail to provide an effective query processing mechanism. In this paper, we first generalize the problem of video similarity matching. Then, a novel solution called consistent keyframe matching (CKM) is proposed to solve the problem of subsequence matching based on video segmentation. CKM is designed with two goals: (1) good scalability in terms of the query sequence length and the size of video database and (2) fast video subsequence matching in terms of processing time. Good scalability is achieved by employing a batch query paradigm, where keyframes sharing the same query space are summarized and ordered. As such, the redundancy of data access is eliminated, leading to much faster video query processing. Fast subsequence matching is achieved by comparing the keyframes of different video sequences. Specifically, a keyframe matching graph is first constructed and then divided into matched candidate subgraphs. We have evaluated our proposed approach over a very large real video database. Extensive experiments demonstrate the effectiveness and efficiency of our approach.  相似文献   

16.
为了减少噪声数据对查询最优序列的影响,避免Euclidean距离对形态的敏感性,以及要求序列等长的缺点,提出了面向噪声数据的时间序列相似性搜索算法.运用SPC方法去除序列中的噪声数据;采用DTW距离作为度量函数,使用规范化方法使序列处于相同的分辨率下;采用LB_ Keogh下界函数对候选序列集合进行筛选.仿真实验结果表明,该算法在阈值较小时,对含有噪声数据序列的匹配能力较强.  相似文献   

17.
文中所提m子序列是根据m序列的状态转换特征,通过交叉改变状态转换次序而形成新的序列。通过随机性测试软件(NIST)验证m子序列具有和m序列相似的随机性,使用BM算法可以得出这种伪随机序列具有非常高的线性复杂度,同时验证了其补序列也具有非常高的线性复杂度,并说明m子序列具有良好的线性复杂度谱,抗线性攻击能力强。m子序列的数量庞大,一个周期为 的m序列,改变反馈函数就可以至少产生 个m子序列。产生m子序列的反馈函数经证明具有良好的代数免疫度,抗代数攻击能力较强。m子序列具有良好的密码学性质,应用前景良好。  相似文献   

18.
Subsequence matching is a basic problem in the field of data stream mining. In recent years, there has been significant research effort spent on efficiently finding subsequences similar to a query sequence. Another challenging issue in relation to subsequence matching is how we identify common local patterns when both sequences are evolving. This problem arises in trend detection, clustering, and outlier detection. Dynamic time warping (DTW) is often used for subsequence matching and is a powerful similarity measure. However, the straightforward method using DTW incurs a high computation cost for this problem. In this paper, we propose a one-pass algorithm, CrossMatch, that achieves the above goal. CrossMatch addresses two important challenges: (1) how can we identify common local patterns efficiently without any omission? (2) how can we find common local patterns in data stream processing? To tackle these challenges, CrossMatch incorporates three ideas: (1) a scoring function, which computes the DTW distance indirectly to reduce the computation cost, (2) a position matrix, which stores starting positions to keep track of common local patterns in a streaming fashion, and (3) a streaming algorithm, which identifies common local patterns efficiently and outputs them on the fly. We provide a theoretical analysis and prove that our algorithm does not sacrifice accuracy. Our experimental evaluation and case studies show that CrossMatch can incrementally discover common local patterns in data streams within constant time (per update) and space.  相似文献   

19.
孙焘  朱晓明 《计算机科学》2017,44(2):270-274
多条序列的最长公共子序列可以代表多条序列的公共信息,其在诸多领域里有着重要的应用,如信息检索、基因序列匹配等。求解多条序列的最长公共子序列是著名的NP难问题,本质为多解问题。一些近似算法虽然时间复杂度较低,但只能求出单解,对于有多解的序列集合,求得的结果信息量损失较大。因此提出一个新的近似算法来解决最长公共子序列问题。算法引入了代数结构“格”,通过动态规划求解出两条序列的公共格,并递归求解当前格与当前序列的公共格。公共格中的路径保存了多条公共子序列使得最终求解出的最长公共子序列为多个。对算法的相关定理给出了理论证明,并通过实验验证了算法的正确性。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号