期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Haar wavelets for efficient similarity search of time-series: with and without time warping 总被引：9，自引：0，他引：9

Chan F.K.-P. Fu A.W.-C. Yu C. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(3):686-705

We address the handling of time series search based on two important distance definitions: Euclidean distance and time warping distance. The conventional method reduces the dimensionality by means of a discrete Fourier transform. We apply the Haar wavelet transform technique and propose the use of a proper normalization so that the method can guarantee no false dismissal for Euclidean distance. We found that this method has competitive performance from our experiments. Euclidean distance measurement cannot handle the time shifts of patterns. It fails to match the same rise and fall patterns of sequences with different scales. A distance measure that handles this problem is the time warping distance. However, the complexity of computing the time warping distance function is high. Also, as time warping distance is not a metric, most indexing techniques would not guarantee any false dismissal. We propose efficient strategies to mitigate the problems of time warping. We suggest a Haar wavelet-based approximation function for time warping distance, called Low Resolution Time Warping, which results in less computation by trading off a small amount of accuracy. We apply our approximation function to similarity search in time series databases, and show by experiment that it is highly effective in suppressing the number of false alarms in similarity search. 相似文献

2.

DTW距离的过滤搜索方法

李正欣郭建胜王瑛田舢张晓丰李超《控制与决策》2018,33(7):1277-1281

动态时间弯曲(DTW)距离支持时间序列的多种形变,具有较高的匹配精度,是一种重要的相似性度量方法.然而,该方法计算复杂度较高,制约了其在相似性搜索中的应用.为了平衡匹配精度与计算效率之间的矛盾,提出一种过滤搜索方法.首先,构造一种计算代价较低的DTW下界距离,用其进行粗略过滤,得到候选集;然后,利用提前终止策略,优化计算候选集中序列的DTW距离,得到搜索结果;最后,对所提出方法进行实验验证,结果表明,该方法能够提高DTW距离的相似性搜索效率,且具有非漏报性. 相似文献

3.

Distance approximation techniques to reduce the dimensionality for multimedia databases 总被引：1，自引：1，他引：0

Yongkwon Kim Chin-Wan Chung Seok-Lyong Lee Deok-Hwan Kim 《Knowledge and Information Systems》2011,28(1):227-248

Recently, databases have been used to store multimedia data such as images, maps, video clips, and music clips. In order to search them, they should be represented by various features, which are composed of high-dimensional vectors. As a result, the dimensionality of data is increased considerably, which causes ‘the curse of dimensionality’. The increase of data dimensionality causes poor performance of index structures. To overcome the problem, the research on the dimensionality reduction has been conducted. However, some reduction methods do not guarantee no false dismissal, while others incur high computational cost. This paper proposes dimensionality reduction techniques that guarantee no false dismissal while providing efficiency considerable by approximating distances with a few values. To provide the no false dismissal property, approximated distances should always be smaller than original distances. The Cauchy–Schwarz inequality and two trigonometrical equations are used as well as the dimension partitioning technique is applied to approximate distances in such a way to reduce the difference between the approximated distance and the original distance. As a result, the proposed techniques reduce the candidate set of a query result for efficient query processing. 相似文献

4.

Efficient indexing of interval time sequences

Jong-Won Roh 《Information Processing Letters》2008,109(1):1-12

Time sequences, which are ordered sets of observations, have been studied in various database applications. In this paper, we introduce a new class of time sequences where each observation is represented by an interval rather than a number. Such sequences may arise in many situations. For instance, we may not be able to determine the exact value at a time point due to uncertainty or aggregation. Such observation may be represented better by a range of possible values. Similarity search with interval time sequences as both query and data sequences poses a new challenge for research. We first address the issue of (dis)similarity measures for interval time sequences. We choose an L₁ norm-based measure because it effectively quantifies the degree of overlapping and remoteness between two intervals, and is invariant irrespective of the position of an interval when it is enclosed within another interval. We next propose an efficient indexing technique for fast retrieval of similar interval time sequences from large databases. More specifically, we propose: (1) to extract a segment-based feature vector for each sequence, and (2) to map each feature vector to either a point or a hyper-rectangle in a multi-dimensional feature space. We then show how we can use existing multi-dimensional index structures such as the R-tree for efficient query processing. The proposed method guarantees no false dismissals. Experimental results show that, for synthetic and real stock data, it is superior to sequential scanning in performance and scales well with the data size. 相似文献

5.

cTraj: efficient indexing and searching of sequences containing multiple moving objects

Zaher Al Aghbari 《Journal of Intelligent Information Systems》2012,39(1):1-28

Indexing sequences containing multiple moving objects by all features of these objects captured at every clock tick results in huge index structures due to the large number of extracted features in all sampled instances. Thus, the main problems with current systems that index sequences containing multiple moving objects are: huge storage requirements for index structures, slow search time and low accuracy due to lack of representation of the time-varying features of objects. In this paper, a technique called cTraj to address these problems is proposed. For each object in a sequence, cTraj captures the features at sampled instances. Then, it maps the object??s features at each sampled instance from high-dimensional feature space into a point in low-dimensional distance space. The sequence of points of an object in low-dimensional space is considered the time-varying feature trajectory of the object. To reduce storage requirements of an index structure, the sequence of points in each trajectory is represented by a minimum bounding box (MBB). cTraj indexes a sequence by the MBBs of its objects using a spatial access method (SAM), such as an R?tree; thus, greatly reducing storage requirements of the index and speeding up the search time. The cTraj technique does not result in any false dismissal, but the result might contain a few false alarms, which are removed by a two-step refinement process. The experiments show that the proposed cTraj technique produces effective results comparable to those of a sequential method, however much more efficient. 相似文献

6.

基于适应性分段估计的数据流相似性搜索

吴枫仲妍吴泉源贾焰杨树强《软件学报》2009,20(10):2867-2884

相似性搜索在股票交易行情、网络安全、传感器网络等众多领域应用广泛.由于这些领域中产生的数据具有无限的、连续的、快速的、实时的特性,所以需要适合数据流上的在线相似性搜索算法.首先,在具有或不具有全局约束条件下,分别提出了没有索引结构的DTW(dynamic time warping)下限函数LB_seg_WF_global和LB_seg_WF,它们是一种分段DTW技术,能够处理数据流上的非等长序列间在线相似性匹配问题.然后,为了进一步提高LB_seg_WF_global和LB_seg_WF的近似程度,提出了一系列的改进方法.最后,针对流上使用LB_seg_WF_global或LB_seg_WF可能会出现连续失效的情况,分别提出了DTW的下限函数LB_WF_global(具有全局约束条件)和上限函数UB_WF、下限函数LB_WF(不具有全局约束条件).通过增量方式快速估计DTW,极大地减少了估计DTW的冗余计算量.通过理论分析和统计实验,验证了该方法的有效性. 相似文献

7.

Fast and Exact Warping of Time Series Using Adaptive Segmental Approximations

Yutao Shou Nikos Mamoulis David W. Cheung 《Machine Learning》2005,58(2-3):231-267

Similarity search is a core module of many data analysis tasks, including search by example, classification, and clustering. For time series data, Dynamic Time Warping (DTW) has been proven a very effective similarity measure, since it minimizes the effects of shifting and distortion in time. However, the quadratic cost of DTW computation to the length of the matched sequences makes its direct application on databases of long time series very expensive. We propose a technique that decomposes the sequences into a number of segments and uses cheap approximations thereof to compute fast lower bounds for their warping distances. We present several, progressively tighter bounds, relying on the existence or not of warping constraints. Finally, we develop an index and a multi-step technique that uses the proposed bounds and performs two levels of filtering to efficiently process similarity queries. A thorough experimental study suggests that our method consistently outperforms state-of-the-art methods for DTW similarity search. 相似文献

8.

Speeding up dynamic time warping distance for sparse time series data

Abdullah Mueen Nikan Chavoshi Noor Abu-El-Rub Hossein Hamooni Amanda Minnich Jonathan MacCarthy 《Knowledge and Information Systems》2018,54(1):237-263

Dynamic time warping (DTW) distance has been effectively used in mining time series data in a multitude of domains. However, in its original formulation DTW is extremely inefficient in comparing long sparse time series, containing mostly zeros and some unevenly spaced nonzero observations. Original DTW distance does not take advantage of this sparsity, leading to redundant calculations and a prohibitively large computational cost for long time series. We derive a new time warping similarity measure (AWarp) for sparse time series that works on the run-length encoded representation of sparse time series. The complexity of AWarp is quadratic on the number of observations as opposed to the range of time of the time series. Therefore, AWarp can be several orders of magnitude faster than DTW on sparse time series. AWarp is exact for binary-valued time series and a close approximation of the original DTW distance for any-valued series. We discuss useful variants of AWarp: bounded (both upper and lower), constrained, and multidimensional. We show applications of AWarp to three data mining tasks including clustering, classification, and outlier detection, which are otherwise not feasible using classic DTW, while producing equivalent results. Potential areas of application include bot detection, human activity classification, search trend analysis, seismic analysis, and unusual review pattern mining. 相似文献

9.

多维时序数据中的相似子序列搜索研究 总被引：4，自引：0，他引：4

程文聪邹鹏贾焰《计算机研究与发展》2010,47(3)

由于动态时间弯曲距离较之欧氏距离有更好鲁棒性,因此被广泛用作时序数据相似子序列搜索研究领域中的相似性度量.在单一维度上的相似子序列搜索可能不能获得足够的匹配结果作为继续深入分析的依据,因此通过引入在多维数据分析中常用的数据立方体模型将相似子序列搜索问题扩展到了多维场景之下,从而在多个维度上得到搜索结果以获取更多有价值的知识.在此基础上利用数据立方体相邻层次单元间的相关性对基本的搜索算法进行了改进,在保证准确性的基础上提高了搜索效率.在真实网络安全数据集上的实验验证了所提方法的有效性. 相似文献

10.

Empirical distribution of k-word matches in biological sequences

Sylvain Forêt Author Vitae Author Vitae Conrad J. Burden Author Vitae 《Pattern recognition》2009,42(4):539-12

This study focuses on an alignment-free sequence comparison method: the number of words of length k shared between two sequences, also known as the D₂ statistic. The advantages of the use of this statistic over alignment-based methods are firstly that it does not assume that homologous segments are contiguous, and secondly that the algorithm is computationally extremely fast, the runtime being proportional to the size of the sequence under scrutiny. Existing applications of the D₂ statistic include the clustering of related sequences in large EST databases such as the STACK database. Such applications have typically relied on heuristics without any statistical basis. Rigorous statistical characterisations of the distribution of D₂ have subsequently been undertaken, but have focussed on the distribution's asymptotic behaviour, leaving the distribution of D₂ uncharacterised for most practical cases. The work presented here bridges these two worlds to give usable approximations of the distribution of D₂ for ranges of parameters most frequently encountered in the study of biological sequences. 相似文献

11.

Efficient and effective similarity search over probabilistic data based on Earth Mover’s Distance 总被引：1，自引：0，他引：1

Jia Xu Zhenjie Zhang Anthony K. H. Tung Ge Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(4):535-559

Advances in geographical tracking, multimedia processing, information extraction, and sensor networks have created a deluge of probabilistic data. While similarity search is an important tool to support the manipulation of probabilistic data, it raises new challenges to traditional relational databases. The problem stems from the limited effectiveness of the distance metrics employed by existing database systems. On the other hand, several more complicated distance operators have proven their values for better distinguishing ability in specific probabilistic domains. In this paper, we discuss the similarity search problem with respect to Earth Mover’s Distance (EMD). EMD is the most successful distance metric for probability distribution comparison but is an expensive operator as it has cubic time complexity. We present a new database indexing approach to answer EMD-based similarity queries, including range queries and k-nearest neighbor queries on probabilistic data. Our solution utilizes primal-dual theory from linear programming and employs a group of B ⁺ trees for effective candidate pruning. We also apply our filtering technique to the processing of continuous similarity queries, especially with applications to frame copy detection in real-time videos. Extensive experiments show that our proposals dramatically improve the usefulness and scalability of probabilistic data management. 相似文献

12.

A comparison of time series similarity measures for classification and change detection of ecosystem dynamics 总被引：2，自引：0，他引：2

S. Lhermitte J. Verbesselt P. Coppin 《Remote sensing of environment》2011,115(12):3129-3152

Time series of remote sensing imagery or derived vegetation indices and biophysical products have been shown particularly useful to characterize land ecosystem dynamics. Various methods have been developed based on temporal trajectory analysis to characterize, classify and detect changes in ecosystem dynamics. Although time series similarity measures play an important role in these methods, a quantitative comparison of the similarity measures is lacking. The objective of this study was to provide an overview and quantitative comparison of the similarity measures in function of varying time series and ecosystem characteristics, such as amplitude, timing and noise effects. For this purpose, the performance was evaluated for the commonly used similarity measures (D), ranging from Manhattan (D_Man), Euclidean (D_E) and Mahalanobis (D_Mah) distance measures, to correlation (D_CC), Principal Component Analysis (PCA; D_PCA) and Fourier based (D_FFT,D_ξ,D_Fk) similarities. The quantitative comparison consists of a series of Monte-Carlo simulations based on subsets of global MODIS Normalized Difference Vegetation index (NDVI) and Enhanced Vegetation Index (EVI) and Leaf Area Index (LAI) data. Results of the simulations reveal four main groups of time series similarity measures with different sensitivities: (i) D_Man, D_E, D_PCA, D_Fk quantify the difference in time series values, (ii) D_Mah accounts for temporal correlation and non-stationarity of variance, (iii) D_CC measures the temporal correlation, and (iv) the Fourier based D_FFT and D_ξ show their specific sensitivity based on the selected Fourier components. The difference measures show relatively the highest sensitivity to amplitude effects, whereas the correlation based measures are highly sensitive to variations in timing and noise. The Fourier based measures, finally, depend highly on the signal to noise ratio and the balance between amplitude and phase dominance. The heterogeneity in sensitivity of each D stresses the importance of (i) understanding the time series characteristics before applying any classification of change detection approach and (ii) defining the variability one wants to identify/account for. This requires an understanding of the ecosystem dynamics and time series characteristics related to the baseline, amplitude, timing, noise and variability of the ecosystem time series. This is also illustrated in the quantitative comparison, where the different sensitivities of D for the NDVI, EVI, and LAI data relate specifically to the temporal characteristics of each data set. Additionally, the effect of noise and intra- and interclass variability is demonstrated in a case study based on land cover classification. 相似文献

13.

Fast similarity search and clustering of video sequences on the world-wide-web 总被引：3，自引：0，他引：3

Cheung S.-S. Zakhor A. 《Multimedia, IEEE Transactions on》2005,7(3):524-537

We define similar video content as video sequences with almost identical content but possibly compressed at different qualities, reformatted to different sizes and frame-rates, undergone minor editing in either spatial or temporal domain, or summarized into keyframe sequences. Building a search engine to identify such similar content in the World-Wide Web requires: 1) robust video similarity measurements; 2) fast similarity search techniques on large databases; and 3) intuitive organization of search results. In a previous paper, we proposed a randomized technique called the video signature (ViSig) method for video similarity measurement. In this paper, we focus on the remaining two issues by proposing a feature extraction scheme for fast similarity search, and a clustering algorithm for identification of similar clusters. Similar to many other content-based methods, the ViSig method uses high-dimensional feature vectors to represent video. To warrant a fast response time for similarity searches on high dimensional vectors, we propose a novel nonlinear feature extraction scheme on arbitrary metric spaces that combines the triangle inequality with the classical Principal Component Analysis (PCA). We show experimentally that the proposed technique outperforms PCA, Fastmap, Triangle-Inequality Pruning, and Haar wavelet on signature data. To further improve retrieval performance, and provide better organization of similarity search results, we introduce a new graph-theoretical clustering algorithm on large databases of signatures. This algorithm treats all signatures as an abstract threshold graph, where the distance threshold is determined based on local data statistics. Similar clusters are then identified as highly connected regions in the graph. By measuring the retrieval performance against a ground-truth set, we show that our proposed algorithm outperforms simple thresholding, single-link and complete-link hierarchical clustering techniques. 相似文献

14.

面向噪声数据的时间序列相似性搜索研究

曹丹阳孙宁马楠刘永彬《计算机工程与设计》2012,33(9):3442-3446

为了减少噪声数据对查询最优序列的影响,避免Euclidean距离对形态的敏感性,以及要求序列等长的缺点,提出了面向噪声数据的时间序列相似性搜索算法.运用SPC方法去除序列中的噪声数据;采用DTW距离作为度量函数,使用规范化方法使序列处于相同的分辨率下;采用LB_ Keogh下界函数对候选序列集合进行筛选.仿真实验结果表明,该算法在阈值较小时,对含有噪声数据序列的匹配能力较强. 相似文献

15.

Incremental indexing and retrieval mechanism for scalable and robust shape matching

Shehzad Khalid 《Multimedia Systems》2012,18(4):319-336

相似文献

16.

A time series representation model for accurate and fast similarity detection

Francesco Gullo Author Vitae Author Vitae Andrea Tagarelli^{Author Vitae} Sergio Greco Author Vitae 《Pattern recognition》2009,42(11):2998-3014

Similarity search and detection is a central problem in time series data processing and management. Most approaches to this problem have been developed around the notion of dynamic time warping, whereas several dimensionality reduction techniques have been proposed to improve the efficiency of similarity searches. Due to the continuous increasing of sources of time series data and the cruciality of real-world applications that use such data, we believe there is a challenging demand for supporting similarity detection in time series in a both accurate and fast way. Our proposal is to define a concise yet feature-rich representation of time series, on which the dynamic time warping can be applied for effective and efficient similarity detection of time series. We present the Derivative time series Segment Approximation (DSA) representation model, which originally features derivative estimation, segmentation and segment approximation to provide both high sensitivity in capturing the main trends of time series and data compression. We extensively compare DSA with state-of-the-art similarity methods and dimensionality reduction techniques in clustering and classification frameworks. Experimental evidence from effectiveness and efficiency tests on various datasets shows that DSA is well-suited to support both accurate and fast similarity detection. 相似文献

17.

基于Walsh变换的时序数据相似性搜索

下载免费PDF全文

崔保良滕少华崔振《计算机工程》2011,37(8):55-57

针对时序数据相似性搜索面临的高维性问题,提出一种利用按沃尔什序数排列的离散沃尔什变换((DWHT)w)对时序数据进行维归约的方法.(DWHT)w是正交变换,变换矩阵简单,可以应用快速算法,对时序数据有更好的特征提取能力,用其索引时间序列数据在理论上具备非漏报性质.与基于离散傅里叶变换和基于离散沃尔什变换的对比实验表明,... 相似文献

18.

基于优化的DTW算法的人体运动数据检索

刘贤梅赵丹郝爱民《模式识别与人工智能》2012,25(2):352-360

随着大量三维人体运动数据库的建立,使得在数据库中实现基于内容的三维人体运动检索面临着诸多困难,文中提出一种分阶段的动态时间变形(DTW)优化算法的人体运动数据检索技术,可有效检索出逻辑上相似的运动。该算法首先对齐两个运动序列的坐标位置,基于窗口距离构造距离矩阵。其次采用基于全局和局部约束的DTW优化算法进行相似度匹配,得到两个运动间的对应关系。最后通过归一化相似度和DTW平均距离分阶段判断运动的相似性。实验结果表明,分阶段的DTW优化算法在提高效率的同时对长度不等的运动能取得较好的检索结果。相似文献

19.

On the k-tuple domination of de Bruijn and Kautz digraphs

Toru Araki 《Information Processing Letters》2007,104(3):86-90

In a digraph G, a vertex u is said to dominate itself and vertices v such that (u,v) is an arc of G. For a positive integer k, a k-tuple dominating set D of a digraph is a subset of vertices such that every vertex is dominated by at least k vertices in D. The k-tuple domination number of a given digraph is the minimum cardinality of a k-tuple dominating set of the digraph. In this letter, we give the exact values of the k-tuple domination number of de Bruijn and Kautz digraphs. 相似文献

20.

Exact indexing for massive time series databases under time warping distance 总被引：1，自引：0，他引：1

Vit Niennattrakul Pongsakorn Ruengronghirunya Chotirat Ann Ratanamahatana 《Data mining and knowledge discovery》2010,21(3):509-541

Among many existing distance measures for time series data, Dynamic Time Warping (DTW) distance has been recognized as one of the most accurate and suitable distance measures due to its flexibility in sequence alignment. However, DTW distance calculation is computationally intensive. Especially in very large time series databases, sequential scan through the entire database is definitely impractical, even with random access that exploits some index structures since high dimensionality of time series data incurs extremely high I/O cost. More specifically, a sequential structure consumes high CPU but low I/O costs, while an index structure requires low CPU but high I/O costs. In this work, we therefore propose a novel indexed sequential structure called TWIST (Time Warping in Indexed Sequential sTructure) which benefits from both sequential access and index structure. When a query sequence is issued, TWIST calculates lower bounding distances between a group of candidate sequences and the query sequence, and then identifies the data access order in advance, hence reducing a great number of both sequential and random accesses. Impressively, our indexed sequential structure achieves significant speedup in a querying process. In addition, our method shows superiority over existing rival methods in terms of query processing time, number of page accesses, and storage requirement with no false dismissal guaranteed. 相似文献