Similar literature
1.
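To better capture the shape characteristics of time series and to find a similarity measure better suited to longer series, this work improves on the dynamic time warping algorithm and proposes a sequence similarity measure based on hierarchical dynamic time warping. Each time series is segmented at multiple levels, and corresponding hierarchical sub-sequences are sampled uniformly from the segments. Each sub-sequence is then abstracted as a point in three-dimensional space (reflecting the mean, length, and trend of the segment) for similarity measurement, and the similarity measures from all levels are finally combined into the result. Experiments show that, with reasonable parameter settings, the method achieves high accuracy and efficiency in measuring sequence similarity.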
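The abstraction of a segment into a three-dimensional point can be illustrated with a small sketch. The abstract does not say how the trend is computed or how sub-sequences are sampled, so the end-to-end slope used here, the near-equal split, and the function names `segment_features` and `multilevel_features` are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, int, float]  # (mean, length, trend)


def segment_features(series: Sequence[float], n_segments: int) -> List[Point3D]:
    """Split `series` into near-equal segments and describe each as (mean, length, trend).

    Trend is taken to be the end-to-end slope of the segment (an assumption).
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    # Integer boundaries; consecutive boundaries differ by at least 1.
    bounds = [k * n // n_segments for k in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        seg = series[start:end]
        length = len(seg)
        mean = sum(seg) / length
        trend = (seg[-1] - seg[0]) / (length - 1) if length > 1 else 0.0
        features.append((mean, length, trend))
    return features


def multilevel_features(series: Sequence[float],
                        levels: Sequence[int] = (1, 2, 4)) -> Dict[int, List[Point3D]]:
    """Describe the series at several segmentation levels (coarse to fine)."""
    return {k: segment_features(series, k) for k in levels}
```

A similarity measure could then compare the per-level point sequences of two series and combine the per-level results; how the combination is weighted is not stated in the abstract.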

2.
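A survey of dynamic time warping in time series data mining   Cited by: 1 (self-citations: 1, citations by others: 0)
李海林, 梁叶, 王少春. 《控制与决策》 (Control and Decision), 2018, 33(8): 1345-1353
Dynamic time warping is an important similarity measure that plays a critical role in the performance of time series data mining, so a comprehensive and in-depth study of it has significant theoretical and practical value. The survey first outlines the basic steps of the dynamic time warping algorithm and analyzes its strengths and shortcomings. It then reviews research on improving the efficiency of the measure, on improving its effectiveness, and on its applications across various industries. Finally, it gives directions for further research on dynamic time warping. This review and analysis provides the necessary literature and theoretical foundation for time series data mining techniques such as similarity measurement, clustering, and classification.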
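For readers unfamiliar with the basic steps mentioned in this abstract, the textbook dynamic-programming form of DTW can be sketched as follows. This is the standard formulation with an absolute-difference local cost, not code taken from the survey; the function name `dtw_distance` is chosen here for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW: cumulative cost of the cheapest monotone alignment of a and b."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    inf = math.inf
    # prev[j] holds the best cumulative cost for aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or a diagonal match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

The double loop makes the quadratic time cost visible, which is the inefficiency that much of the surveyed work tries to reduce.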

3.
Dynamic time warping (DTW) distance has been effectively used in mining time series data in a multitude of domains. However, in its original formulation DTW is extremely inefficient at comparing long sparse time series that contain mostly zeros and some unevenly spaced nonzero observations. The original DTW distance does not take advantage of this sparsity, leading to redundant calculations and a prohibitively large computational cost for long time series. We derive a new time warping similarity measure (AWarp) for sparse time series that works on the run-length encoded representation of sparse time series. The complexity of AWarp is quadratic in the number of observations, as opposed to the time range of the series. Therefore, AWarp can be several orders of magnitude faster than DTW on sparse time series. AWarp is exact for binary-valued time series and a close approximation of the original DTW distance for any-valued series. We discuss useful variants of AWarp: bounded (both upper and lower), constrained, and multidimensional. We show applications of AWarp to three data mining tasks (clustering, classification, and outlier detection) that are otherwise not feasible using classic DTW, while producing equivalent results. Potential areas of application include bot detection, human activity classification, search trend analysis, seismic analysis, and unusual review pattern mining.
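The run-length encoded representation this abstract relies on can be illustrated with a short sketch. The exact encoding used by AWarp is not given here, so the `(value, count)` pairs and the function names below are assumptions chosen for clarity; the sketch covers only the encoding step, not the AWarp distance itself.

```python
from typing import List, Sequence, Tuple

Run = Tuple[float, int]  # (value, count); zero runs are collapsed into one pair


def rle_encode_sparse(series: Sequence[float]) -> List[Run]:
    """Collapse each run of zeros into (0.0, run_length); keep nonzero points as (v, 1)."""
    encoded: List[Run] = []
    zero_run = 0
    for x in series:
        if x == 0:
            zero_run += 1
            continue
        if zero_run:
            encoded.append((0.0, zero_run))
            zero_run = 0
        encoded.append((float(x), 1))
    if zero_run:
        encoded.append((0.0, zero_run))
    return encoded


def rle_decode_sparse(encoded: Sequence[Run]) -> List[float]:
    """Expand the (value, count) pairs back into the full series."""
    out: List[float] = []
    for value, count in encoded:
        out.extend([value] * count)
    return out
```

For a series that is mostly zeros, the encoded length tracks the number of nonzero observations rather than the time range, which is the property the abstract credits for the speedup.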

4.
Dynamic time warping (DTW), which finds the minimum path by providing non-linear alignments between two time series, has been widely used as a distance measure for time series classification and clustering. However, DTW does not account for the relative importance of the phase difference between a reference point and a testing point. This may lead to misclassification, especially in applications where the shape similarity between two sequences is a major consideration for accurate recognition. Therefore, we propose a novel distance measure, called weighted DTW (WDTW), which is a penalty-based DTW. Our approach penalizes points with a higher phase difference between a reference point and a testing point in order to prevent minimum distance distortion caused by outliers. The rationale underlying the proposed distance measure is demonstrated with some illustrative examples. A new weight function, called the modified logistic weight function (MLWF), is also proposed to systematically assign weights as a function of the phase difference between a reference point and a testing point. By applying different weights to adjacent points, the proposed algorithm can enhance the detection of similarity between two time series. We show that some popular distance measures such as DTW and Euclidean distance are special cases of our proposed WDTW measure. We extend the proposed idea to other variants of DTW such as derivative dynamic time warping (DDTW) and propose the weighted version of DDTW. We have compared the performance of our proposed procedures with other popular approaches using public data sets available through the UCR Time Series Data Mining Archive for both time series classification and clustering problems. The experimental results indicate that the proposed approaches can achieve improved accuracy for time series classification and clustering problems.
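A penalty-based DTW of this kind can be sketched as below. The abstract does not give the weight formula; the logistic form `w(d) = w_max / (1 + exp(-g * (d - m_c)))`, with `m_c` the midpoint of the series length, is an assumption about how the MLWF is usually written, and the parameter names `g` and `w_max` are likewise illustrative.

```python
import math
from typing import List, Sequence


def logistic_weights(length: int, g: float, w_max: float = 1.0) -> List[float]:
    """Weight for each phase difference d = |i - j| in 0..length-1 (assumed MLWF form)."""
    mid = length / 2.0
    return [w_max / (1.0 + math.exp(-g * (d - mid))) for d in range(length)]


def weighted_dtw(a: Sequence[float], b: Sequence[float],
                 g: float = 0.1, w_max: float = 1.0) -> float:
    """DTW where each local cost |a_i - b_j| is scaled by a weight on |i - j|."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    w = logistic_weights(max(n, m), g, w_max)
    inf = math.inf
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            # Larger phase difference means a larger penalty on the local cost.
            cost = w[abs(i - j)] * abs(a[i - 1] - b[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

With `g = 0` every weight equals `w_max / 2`, so the result is a constant multiple of ordinary DTW, which is consistent with the abstract's remark that DTW is a special case. A large `g` pushes the alignment toward the diagonal.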

5.
In recent years, dynamic time warping (DTW) has increasingly become the most widely used technique for comparing time series data where extensive a priori knowledge is not available. However, a multivariate comparison method is often expected to consider the correlation between the variables, as this correlation carries the real information in many cases. Thus, principal component analysis (PCA) based similarity measures, such as the PCA similarity factor (SPCA), are used in many industrial applications. In this paper, we present a novel algorithm called correlation based dynamic time warping (CBDTW), which combines DTW and PCA based similarity measures. To preserve correlation, multivariate time series are segmented and the local dissimilarity function of DTW is derived from SPCA. The segments are obtained by bottom-up segmentation using special, PCA related costs. Our technique is evaluated on two databases: the database of the Signature Verification Competition 2004 and the commonly used AUSLAN dataset. We show that CBDTW outperforms the standard SPCA and the most commonly used Euclidean distance based multivariate DTW on datasets with a complex correlation structure.

6.
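Traditional clustering algorithms mostly perform cluster analysis on a static data set within a single time slice, yet most data in fact evolve continuously and dynamically over time. This paper analyzes the evolution of time series data and their cluster structure and finds that, under certain conditions, the data sets in adjacent time slices are strongly correlated and the cluster structures show a degree of inheritance. This leads to a new idea: starting from the clustering result of the previous time slice, computing only on the data that changed and adjusting the cluster structure locally can be expected to yield the same result as fully clustering the data in the next time slice, with significantly less computation. Based on this idea, a dynamic density clustering algorithm for time series data (DDCA/TSD) is proposed. Simulation experiments validate the algorithm on six data sets. The results show that DDCA/TSD, while maintaining clustering accuracy, is clearly more time-efficient than traditional clustering algorithms and more effectively reveals changes in the attributes of data points and the evolution of the cluster structure.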

7.
We propose a new method to calculate the similarity of time series based on piecewise linear approximation (PLA) and derivative dynamic time warping (DDTW). The proposed method has two phases. The first is a divisive approach to piecewise linear approximation based on the middle curve of the original time series. Apart from giving attractive results, it creates line segments that approximate the time series faster than conventional linear approximation. In addition, the high-dimensional space is reduced to a lower-dimensional one, and the line segments approximating the time series are used to calculate the similarity. In the second phase, we use the main idea of DDTW to provide another similarity measure based on the line segments obtained in the first phase. We empirically compare our new approach to other techniques and demonstrate its superiority.
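The main idea of DDTW referred to here is to align local slope estimates rather than raw values. The abstract does not give the estimate, so the sketch below uses the derivative estimate commonly associated with DDTW (the average of the left slope and the slope through both neighbors) as an assumption; the function name `derivative_estimate` is illustrative, and the PLA phase is not reproduced.

```python
from typing import List, Sequence


def derivative_estimate(series: Sequence[float]) -> List[float]:
    """Estimate a local slope at every point of the series.

    Interior points use ((q[i] - q[i-1]) + (q[i+1] - q[i-1]) / 2) / 2;
    the two endpoints copy the estimate of their nearest interior neighbor.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three points to estimate derivatives")
    inner = [
        ((series[i] - series[i - 1]) + (series[i + 1] - series[i - 1]) / 2.0) / 2.0
        for i in range(1, n - 1)
    ]
    return [inner[0]] + inner + [inner[-1]]
```

Feeding these estimates, instead of the raw values, into an ordinary DTW routine gives a shape-oriented comparison that ignores a constant vertical offset between the two series.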

8.
9.
Dynamic time warping (DTW) has proven itself to be an exceptionally strong distance measure for time series. DTW in combination with one-nearest neighbor, one of the simplest machine learning methods, has been difficult to convincingly outperform on the time series classification task. In this paper, we present a simple technique for time series classification that exploits DTW’s strength on this task. Instead of directly using DTW as a distance measure to find nearest neighbors, the technique uses DTW to create new features, which are then given to a standard machine learning method. We experimentally show that our technique improves over one-nearest neighbor DTW on 31 out of 47 UCR time series benchmark datasets. In addition, this method can easily be extended for use in combination with other methods. In particular, we show that when combined with the symbolic aggregate approximation (SAX) method, it improves over SAX on 37 out of 47 UCR datasets. Thus the proposed method also provides a mechanism to combine distance-based methods like DTW with feature-based methods like SAX. We also show that combining the proposed classifiers through ensembles further improves the performance on time series classification.

10.
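For multi-batch, multi-condition chemical processes, offline models tend to age and fail and cannot easily meet the real-time optimization and control requirements of industrial production. To address this, an online LS-SVM modeling method based on affinity propagation clustering and the dynamic time warping distance is proposed. The method first uses the affinity propagation clustering algorithm to partition the samples of each batch into operating conditions. Then, taking the temporal order of the samples into account, it uses a segment of the time series containing the sample to be predicted as the query sequence, measures the similarity between sequences with the dynamic time warping distance, and retrieves similar sample segments from the corresponding operating-condition stage of each historical batch to build the training set. Finally, a least squares support vector machine is used to build the online prediction model. The method is applied to penicillin concentration prediction, and simulation studies show that it improves prediction accuracy and generalization ability.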

11.
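To cluster Web service request data quickly and improve clustering accuracy, a Web data clustering algorithm based on incremental time series and task scheduling is proposed. The algorithm defines the clustering of Web data over time series and adopts an incremental time series clustering method, reducing the complexity of the Web data through data compression and clustering the time series data by service-time similarity. For the problem of optimal service task scheduling in a Web cluster, service tasks are allocated according to the execution capability of each server. Simulation results show that, compared with a grid-based hierarchical clustering algorithm for high-dimensional data and a multi-objective fuzzy clustering algorithm based on incremental learning, the proposed algorithm achieves better clustering time, clustering accuracy, and service execution success rate.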

12.
13.
Managing large-scale time series databases has attracted significant attention in the database community recently. Related fundamental problems such as dimensionality reduction, transformation, pattern mining, and similarity search have been studied extensively. Although time series data are dynamic by nature, as in data streams, current solutions to these fundamental problems have mostly targeted static time series databases. In this paper, we first propose a framework for online summary generation for large-scale and dynamic time series data, such as data streams. Then, we propose online transform-based summarization techniques over data streams that can be updated in constant time and space. We present both the exact and approximate versions of the proposed techniques and provide error bounds for the approximate case. One of our main contributions in this paper is the extensive performance analysis. Our experiments carefully evaluate the quality of the online summaries for point, range, and kNN queries using real-life dynamic data sets of substantial size. Edited by W. Aref

14.
Exact indexing of dynamic time warping   Cited by: 16 (self-citations: 1, citations by others: 16)
The problem of indexing time series has attracted much interest. Most algorithms used to index time series utilize the Euclidean distance or some variation thereof. However, it has been forcefully shown that the Euclidean distance is a very brittle distance measure. Dynamic time warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis. Because of this flexibility, DTW is widely used in science, medicine, industry and finance. Unfortunately, however, DTW does not obey the triangular inequality and thus has resisted attempts at exact indexing. Instead, many researchers have introduced approximate indexing techniques or abandoned the idea of indexing and concentrated on speeding up sequential searches. In this work, we introduce a novel technique for the exact indexing of DTW. We prove that our method guarantees no false dismissals and we demonstrate its vast superiority over all competing approaches in the largest and most comprehensive set of time series indexing experiments ever undertaken.
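The abstract does not describe the technique itself. One general way to search under DTW with no false dismissals is to prune candidates with a cheap lower bound on the true distance; the envelope-based bound below is a sketch of that general idea under stated assumptions (equal-length series, a warping window of half-width `r`, squared local costs), not a reproduction of the paper's method. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def envelope(q: Sequence[float], r: int) -> Tuple[List[float], List[float]]:
    """Upper and lower envelope of q over a sliding window of half-width r."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def envelope_lower_bound(q: Sequence[float], c: Sequence[float], r: int) -> float:
    """Distance from c to the envelope of q; never exceeds banded_dtw(q, c, r)."""
    upper, lower = envelope(q, r)
    total = 0.0
    for ci, u, l in zip(c, upper, lower):
        if ci > u:
            total += (ci - u) ** 2
        elif ci < l:
            total += (l - ci) ** 2
    return math.sqrt(total)


def banded_dtw(q: Sequence[float], c: Sequence[float], r: int) -> float:
    """DTW restricted to |i - j| <= r, with squared local costs and a final sqrt."""
    n = len(q)
    inf = math.inf
    d = [[inf] * (n + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return math.sqrt(d[n][n])
```

In a search loop, a candidate whose lower bound already exceeds the best distance found so far can be skipped without computing the full DTW. Because the bound never overestimates, no true match is discarded.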

15.
Scaling and time warping in time series querying   Cited by: 3 (self-citations: 0, citations by others: 3)
The last few years have seen an increasing understanding that dynamic time warping (DTW), a technique that allows local flexibility in aligning time series, is superior to the ubiquitous Euclidean distance for time series classification, clustering, and indexing. More recently, it has been shown that for some problems, uniform scaling (US), a technique that allows global scaling of time series, may be just as important. In this work, we note that for many real world problems, it is necessary to combine both DTW and US to achieve meaningful results. This is particularly true in domains where we must account for the natural variability of human actions, including biometrics, query by humming, motion-capture/animation, and handwriting recognition. We introduce the first technique that can handle both DTW and US simultaneously. Our techniques involve search pruning by means of a lower bounding technique and multi-dimensional indexing to speed up the search. We demonstrate the utility and effectiveness of our method on a wide range of problems in industry, medicine, and entertainment.

16.
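The use of DNA microarray technology has produced large volumes of time-series gene expression data, and clustering these data is an important way to extract the biomolecular information hidden in them. A hierarchical clustering method based on hidden Markov models (HMMs) is proposed. The time-series gene expression data are preprocessed (standardized and discretized) according to their statistical properties; the preprocessed data are modeled with HMMs to exploit the correlation between different time points; and the resulting models are clustered with a hierarchical clustering method. Experimental results show that the method not only produces good clusters but can also determine the optimal number of clusters.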

17.
A new data mining technique for classifying normal and pre-seizure electroencephalograms is proposed. The technique is based on a dynamic time warping kernel combined with support vector machines (SVMs). The experimental results show that the technique is superior to the standard SVM and improves the brain activity classification. This research was partially supported by Rutgers Research Council grant 202018, NSF grants CCF-0546574, DBI-980821, EIA-9872509, and CCF 0546574, and NIH grant R01-NS-39687-01A1. Translated from Kibernetika i Sistemnyi Analiz, No. 1, pp. 159–173, January–February 2008.

18.
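常炳国, 臧虹颖. 《计算机应用》 (Journal of Computer Applications), 2018, 38(7): 1910-1915
The traditional dynamic time warping (DTW) measure is prone to excessive warping and suffers from high computational complexity and low efficiency. To address these problems, a dynamic time warping measure based on path correction (UDTW) is proposed. First, a piecewise dimensionality reduction method, piecewise local maximum smoothing (PLM), is used to extract the feature information of the sequence effectively and reduce the computational cost of UDTW. Second, taking into account the similarity requirements on the shape of time series, a dynamic penalty coefficient is applied to excessively warped paths to correct the degree of warping. Finally, on the basis of the improved distance, a 1-nearest-neighbor classifier is used to classify the time series data, improving the accuracy and efficiency of time series similarity measurement. Experimental results on 15 UCR data sets show that UDTW achieves higher classification accuracy than the traditional DTW measure, reaching 100% correct classification on 3 of the data sets. Compared with derivative DTW (DDTW), UDTW improves classification accuracy by up to 71.8%, while PLM-UDTW reduces execution time by 99% without affecting classification accuracy.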

19.
20.
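Outliers in time series data directly affect the results of data mining algorithms and can even cause them to fail. The traditional density-based spatial clustering of applications with noise (DBSCAN) algorithm can be used to identify outliers, but it is sensitive to parameters, has high time complexity, and has limited precision. Based on the characteristics of time series data, an outlier identification algorithm based on variance clustering is proposed that can automatically perform multiple rounds of identification. The method converts the traditional neighborhood density into variance and mean, and the density threshold into a variance-based threshold within a time window. On the basis of definitions of outlier data, outlier-cluster data, and abnormal-cluster data, it gives the decision rules for outlier identification. In addition, since a single identification pass cannot remove all outliers, a termination condition for repeated identification is defined, extending the algorithm to multi-pass outlier identification. Application in an aerospace data mining project verifies that the algorithm is broadly applicable, has low time complexity, and can improve precision through repeated identification.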
