Found 20 similar documents.
1.
The goal of motif discovery algorithms is to efficiently find unknown recurring patterns. In this paper, we focus on motif
discovery in time series. Most available algorithms cannot utilize domain knowledge in any way, which results in quadratic
or at least super-linear time and space complexity. In this paper, we define the Constrained Motif Discovery problem, which
enables the use of domain knowledge in the motif discovery process. The paper then provides two algorithms, called MCFull
and MCInc, for efficiently solving the constrained motif discovery problem. We also show that most unconstrained motif discovery
problems can be converted into constrained ones using a change-point detection algorithm. A novel change-point detection algorithm
called the Robust Singular Spectrum Transform (RSST) is then introduced and compared to the traditional Singular Spectrum Transform
using synthetic and real-world data sets. The results show that RSST achieves higher specificity and is better suited to
finding constraints for converting unconstrained motif discovery problems into constrained ones that can be solved using MCFull
and MCInc. We then compare the combination of RSST and MCFull or MCInc with two state-of-the-art motif discovery algorithms
on a large set of synthetic time series. The results show that the proposed algorithms provide a four- to tenfold increase
in speed compared to the unconstrained motif discovery algorithms studied, without any loss of accuracy. RSST+MCFull is then used
in a real-world human-robot interaction experiment to enable the robot to learn free hand gestures, actions, and their associations
by watching humans and other robots interacting.
2.
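Multivariate time series (MTS) are very common in finance, medicine, science, engineering, and other fields. This paper proposes a method for identifying anomalous patterns in MTS. A bottom-up segmentation algorithm divides the MTS into non-overlapping subsequences, an extended Frobenius norm is used to compute the similarity between two MTS subsequences, and K-means clustering groups the MTS subsequences into several clusters. Anomalous patterns are then identified among these clusters according to the definition of an anomalous pattern. Experiments on two real data sets verify the effectiveness of the algorithm.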
3.
An Improved Algorithm for Fast Similarity Search in Time Series  Total citations: 1, self-citations: 0, citations by others: 1
This paper introduces a new method for finding all subsequences similar to a given time series sequence. The method takes into account noise, offset translation, and amplitude scaling. Because it is based on a piecewise linear representation, it is exceptionally fast.
4.
Xiang Lian Lei Chen 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(11):1544-1558
Similarity join (SJ) in time-series databases has a wide spectrum of applications such as data cleaning and mining. Specifically, an SJ query retrieves all pairs of (sub)sequences from two time-series databases that ε-match with each other, where ε is the matching threshold. Previous work on this problem usually considers static time-series databases, where queries are performed either on disk-based multidimensional indexes built on static data or by nested loop join (NLJ) without indexes. SJ over multiple stream time series, which continuously outputs pairs of similar subsequences from stream time series, strongly requires low memory consumption, low processing cost, and query procedures that are themselves adaptive to time-varying stream data. These requirements invalidate the existing approaches in static databases. In this paper, we propose an efficient and effective approach to perform SJ among multiple stream time series incrementally. In particular, we present a novel method, Adaptive Radius-based Search (ARES), which can answer the similarity search without false dismissals and is seamlessly integrated into SJ processing. Most importantly, we provide a formal cost model for ARES, based on which ARES can be adaptive to data characteristics, achieving the minimum number of refined candidate pairs, and thus, suitable for stream processing. Furthermore, in light of the cost model, we utilize space-efficient synopses that are constructed for stream time series to further reduce the candidate set. Extensive experiments demonstrate the efficiency and effectiveness of our proposed approach.
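To make the ε-match definition concrete, the following is a minimal sketch of the index-free nested loop join (NLJ) baseline that the abstract mentions, not of ARES itself. The function names, the fixed window length `w`, and the choice of Euclidean distance are assumptions made for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nested_loop_similarity_join(s, t, w, eps):
    """Return all (i, j) such that s[i:i+w] and t[j:j+w] eps-match.

    Hypothetical baseline: every window of s is compared with every
    window of t, so the cost is quadratic in the series lengths.
    """
    pairs = []
    for i in range(len(s) - w + 1):
        for j in range(len(t) - w + 1):
            if euclidean(s[i:i + w], t[j:j + w]) <= eps:
                pairs.append((i, j))
    return pairs

pairs = nested_loop_similarity_join([0, 1, 2, 3], [1, 2, 9], w=2, eps=0.0)
```

With `eps=0.0` only identical windows are reported; raising `eps` admits near matches at the same quadratic cost, which is the overhead that stream-oriented methods aim to avoid.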
5.
Efficient Similarity Search over Future Stream Time Series  Total citations: 2, self-citations: 0, citations by others: 2
With the advance of hardware and communication technologies, stream time series is gaining ever-increasing attention due to its importance in many applications such as financial data processing, network monitoring, Web click-stream analysis, sensor data mining, and anomaly detection. For all of these applications, an efficient and effective similarity search over stream data is essential. Because of the unique characteristics of the stream, for example, data are frequently updated and real-time response is required, the previous approaches proposed for searching through archived data may not work in the stream scenarios. Especially, in the cases where data often arrive periodically for various reasons (for example, the communication congestion or batch processing), queries on such incomplete time series or even future time series may result in inaccuracy using traditional approaches. Therefore, in this paper, we propose three approaches, polynomial, Discrete Fourier Transform (DFT), and probabilistic, to predict the unknown values that have not arrived at the system and answer similarity queries based on the predicted data. We also apply efficient indexes, that is, a multidimensional hash index and a B+-tree, to facilitate the prediction and similarity search on future time series, respectively. Extensive experiments demonstrate the efficiency and effectiveness of our methods for prediction and answering queries.
6.
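Efficient Time Series Similarity Search Techniques  Total citations: 6, self-citations: 0, citations by others: 6
Time series similarity search is regarded as one of the most promising technologies for the future. However, time series data are typically high-dimensional and massive, so developing efficient algorithms is critical. This paper surveys the state of the art and progress of time series similarity search and its main research topics, discusses several important application paradigms of the technology, and provides a quantitative analysis of some typical algorithms. It then focuses on the key techniques of efficient time series similarity search, including boundary filtering, triangle-inequality pruning, multi-resolution retrieval methods, and filter-and-refine schemes. Finally, it discusses and analyzes approximate similarity search for time series. All of these techniques are compared, and their strengths and weaknesses are analyzed in depth. The paper concludes by pointing out open problems and future research hotspots and directions.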
7.
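Mining Association Patterns in Multiple Time Series Based on the Inter-Relevant Successive Tree  Total citations: 4, self-citations: 1, citations by others: 3
Time series are one of the most common forms of data in real life, and discovering frequent patterns in time series is an important task in analyzing how they change. This paper proposes an algorithm for mining association patterns among multiple time series based on the inter-relevant successive tree. The algorithm first uses Allen's interval logic relations to describe the relationships between sequence states and, according to whether these relations occur sequentially or in parallel within a time window, obtains a special sequence composed of these relations. On this basis, a new mining model based on the inter-relevant successive tree is proposed to mine association patterns across sequences. Compared with other methods, the algorithm is simple and intuitive, and the entire mining process requires no candidate pattern generation, which greatly improves mining efficiency.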
8.
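Based on the clustering principle of unsupervised-learning neural networks, a method for discovering similar patterns in time series is proposed. A fast discrete cosine transform maps each sequence into the corresponding feature pattern space, which not only reduces dimensionality but also overcomes the inability of traditional neural networks to handle process sequences. The advantages of artificial neural networks as a similarity measurement model are analyzed, "black-box" network weights replace traditional distance measures, and on this basis an algorithm that discovers all pairs of similar patterns is implemented. Simulation results on real flight data show that the method is correct and that it has multi-scale properties, effectively reflecting the degree of similarity between sequences at different resolutions.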
9.
In this article, we describe the process of discovering similar patterns in time series
and creating reference models for population groups in a medical domain, and particularly
in the field of physiotherapy, using data mining techniques on a set of isokinetic data. The
discovered knowledge was evaluated against the expertise of a physician specialized
in isokinetic techniques, and applied in the I4 (Intelligent Interpretation of Isokinetic
Information) project developed in conjunction with the Spanish National Center for
Sports Research and Sciences for muscular diagnosis and rehabilitation, injury prevention,
training evaluation and planning, etc., of elite athletes and ordinary people.
10.
We propose a dimensionality reduction technique for time series analysis that significantly improves the efficiency and accuracy of similarity searches. In contrast to piecewise constant approximation (PCA) techniques that approximate each time series with constant value segments, the proposed method--Piecewise Vector Quantized Approximation--uses the closest (based on a distance measure) codeword from a codebook of key-sequences to represent each segment. The new representation is symbolic and it allows for the application of text-based retrieval techniques into time series similarity analysis. Experiments on real and simulated datasets show that the proposed technique generally outperforms PCA techniques in clustering and similarity searches.
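The encoding step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the codebook is given rather than learned, segments have a fixed length `seg_len`, squared Euclidean distance is the distance measure, and the names `sq_dist` and `encode` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(series, codebook, seg_len):
    """Replace each fixed-length segment with the index of its closest codeword.

    Assumes every codeword has length seg_len; a trailing partial
    segment, if any, is ignored in this sketch.
    """
    symbols = []
    for start in range(0, len(series) - seg_len + 1, seg_len):
        segment = series[start:start + seg_len]
        closest = min(range(len(codebook)),
                      key=lambda c: sq_dist(segment, codebook[c]))
        symbols.append(closest)
    return symbols

codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # hypothetical key-sequences
symbols = encode([0.1, 0.0, 0.9, 1.1, 0.0, 0.8], codebook, seg_len=2)
```

The output is a sequence of codeword indices, that is, a symbolic string, which is what makes text-style retrieval techniques applicable to the encoded series.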
11.
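Indexing large-scale time series databases is a key problem in efficient time series search. This paper proposes a novel indexing scheme, RQI, which comprises three filtering strategies: first-k filtering, indexing of lower and upper bounds, and triangle-inequality pruning. The basic idea is first to compute the wavelet coefficients of each time series using the Haar wavelet transform and to form a minimum bounding matrix from the first k coefficients so that point filtering can be applied; then to precompute the lower-bound and upper-bound features of each time series and store them in the index; and finally to use the triangle inequality to prune dissimilar sequences while guaranteeing no false dismissals. A new lower-bound distance function, SLBS, and a clustering algorithm, CSA, are also proposed. CSA preserves good clustering properties in the index and thereby improves the efficiency of point filtering, which leads to an improved algorithm, RQIC. Extensive comparative experiments on synthetic and real-time data sets show that RQIC is effective and achieves high query efficiency.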
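The first-k filtering idea can be illustrated with a small sketch. It is not the RQI index itself: it only shows why a distance computed on the first k coefficients of an orthonormal Haar transform never exceeds the true Euclidean distance, so pruning on it causes no false dismissals. All function names are hypothetical, and series lengths are assumed to be powers of two.

```python
import math

def haar_transform(x):
    """Orthonormal Haar transform; coefficients ordered coarse to fine."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in x]
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:half] = avg      # scaled averages feed the next, coarser level
        out[half:n] = det     # detail coefficients of this level stay in place
        n = half
    return out

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def first_k_distance(x, y, k):
    """Distance on the first k Haar coefficients: a lower bound on euclidean(x, y)."""
    return euclidean(haar_transform(x)[:k], haar_transform(y)[:k])

def range_query(db, query, eps, k):
    """Indices of series within eps of query, pruning with the first-k bound."""
    hits = []
    for idx, series in enumerate(db):
        if first_k_distance(series, query, k) > eps:
            continue  # safe to skip: the true distance is at least this large
        if euclidean(series, query) <= eps:
            hits.append(idx)
    return hits
```

In a real index the coefficients would be computed once and stored; this sketch recomputes them on every call purely to stay short.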
12.
《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2005,35(5):738-745
Time series analysis has been a popular tool for analyzing and forecasting large amounts of data. Very often, the forecasts produced by the applied approaches had limited success, mainly because of the lack of statistically significant historical information. We focus our attention on three common series that are formed by averaging data collected over a shorter time interval: weekly and biweekly foreign exchange rates, mean hourly wind speed, and electric load data. The proposed scheme, which takes advantage of the dominant characteristics of the shorter-interval data, produced forecasts superior to those of conventional approaches based only on historical observations of the target data. In the first two series, the proposed approach generated forecasts with significantly lower error than those of the trivial random walk, a benchmark in series dominated by short-term correlation. On the load series, this approach enabled a simple autoregressive model to achieve lower forecasting error than a neural network that included special indicators to account for the periodic nature of the data.
13.
14.
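An Efficient Algorithm for Inter-Transaction Association Rule Analysis in Multivariate Time Series  Total citations: 5, self-citations: 1, citations by others: 5
Analyzing multivariate time series by mining inter-transaction association rules can reveal how observations at different sampling points in the series influence one another. To this end, this paper proposes a new analysis method, ES-Apriori. By reducing the number of database scans and optimizing memory allocation, the method can efficiently analyze association rules among multivariate time series. Experiments show that the method is very effective for analyzing stock time series from the Chinese securities market.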
15.
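An Effective Clustering Algorithm for Hot-Topic Time Series  Total citations: 3, self-citations: 0, citations by others: 3
Clustering popularity time series is an important step in revealing and modeling how hot topics on the web form and evolve. In 2010, Leskovec et al. proposed the K_SC clustering algorithm for topic time series, which is fairly accurate and captures the intrinsic development trends of topics well. However, K_SC is highly sensitive to the initial cluster centers and has high time complexity, which makes it hard to apply to real high-dimensional large data sets. Using the wavelet transform, this paper proposes a new iterative clustering algorithm, WKSC, with two main innovations: (1) the Haar wavelet transform compresses the original time series and reduces their dimensionality, thereby lowering the time complexity of the algorithm; (2) in the inverse Haar wavelet transform, the cluster centers returned by the lower-dimensional clustering are used as the initial cluster centers for the higher-dimensional clustering, which mitigates the high sensitivity to initial cluster centers during iterative clustering and improves clustering quality. Three data sets from China and abroad were used as test samples in extensive experiments. The results show that WKSC significantly reduces the time complexity of clustering while improving clustering quality. WKSC is well suited to pattern analysis of large numbers of high-dimensional hot topics.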
16.
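Time series similarity search is an important foundational research topic in data mining. Similarity is usually defined in terms of Euclidean distance, and algorithms of this kind share a drawback: if the time series is shifted, they produce wrong results. The fast time series similarity search algorithm based on shape features uses landmarks as dividing points: it extracts the features of the time series from the landmarks, splits the series into several subsequences, linearizes each subsequence, and preprocesses the linearized subsequences. The query sequence is segmented with the same landmark-based algorithm, and an improved fast similarity search algorithm then quickly finds the sequences similar to the query sequence. Worked examples demonstrate the effectiveness of the algorithm.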
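The landmark-based segmentation and linearization steps can be sketched as follows. The abstract does not define a landmark, so this sketch assumes that landmarks are strict local extrema and that each segment is linearized as the slope of the line joining its endpoints; both choices, and all function names, are illustrative assumptions rather than the paper's definitions.

```python
def find_landmarks(series):
    """Indices of strict local maxima and minima (interior points only)."""
    landmarks = []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            landmarks.append(i)
    return landmarks

def segment_at_landmarks(series):
    """Split a series (length >= 2) at its landmarks; neighbours share the landmark point."""
    cuts = [0] + find_landmarks(series) + [len(series) - 1]
    return [series[a:b + 1] for a, b in zip(cuts, cuts[1:])]

def linearize(segment):
    """Slope of the straight line joining the segment's endpoints."""
    return (segment[-1] - segment[0]) / (len(segment) - 1)

series = [1, 3, 2, 0, 4, 5]
slopes = [linearize(seg) for seg in segment_at_landmarks(series)]
```

Because slopes are unchanged when a constant is added to every value, a representation of this kind is insensitive to vertical offset, which is the weakness of plain Euclidean matching that the abstract points out.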
17.
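Clustering time series of clinical laboratory indicators to discover groups of patients whose indicators follow similar trends is of great value for precision medicine. Because different patients differ in the number of tests and their test time points are not fully synchronized, the asynchronous time series are first preprocessed so that their dimensions and time points are synchronized. On this basis, the DBScan algorithm is improved by introducing a user-defined parameter, the noise point ratio NoisePro, yielding LabTS-CLU, a density-partitioning clustering algorithm for asynchronous clinical laboratory indicator time series. Finally, experiments on a data set of glycated hemoglobin time series covering nearly 10 years for more than one hundred thousand diabetes patients at a Grade III-A hospital confirm the effectiveness of the proposed algorithm.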
18.
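Transition rule mining algorithms are an important complement to association mining and other established data mining algorithms. However, because current transition rule mining algorithms choose their mining targets poorly, the resulting transition rules often lack representativeness and have no reference value. After analyzing the shortcomings of existing transition rule mining methods, this paper proposes two improved methods: transition rule discovery based on association mining, and transition rule mining based on probabilistic relational data patterns. These two methods are combined with existing transition rule mining algorithms to build a new, more effective and feasible transition rule mining algorithm for time series databases.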
19.
Discovery of Periodic Patterns in Spatiotemporal Sequences  Total citations: 1, self-citations: 0, citations by others: 1
Huiping Cao Mamoulis N. Cheung D.W. 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(4):453-467
In many applications that track and analyze spatiotemporal data, movements obey periodic patterns; the objects follow the same routes (approximately) over regular time intervals. For example, people wake up at the same time and follow more or less the same route to their work every day. The discovery of hidden periodic patterns in spatiotemporal data could unveil important information to the data analyst. Existing approaches for discovering periodic patterns focus on symbol sequences. However, these methods cannot directly be applied to a spatiotemporal sequence because of the fuzziness of spatial locations in the sequence. In this paper, we define the problem of mining periodic patterns in spatiotemporal data and propose an effective and efficient algorithm for retrieving maximal periodic patterns. In addition, we study two interesting variants of the problem. The first is the retrieval of periodic patterns that are frequent only during a continuous subinterval of the whole history. The second problem is the discovery of periodic patterns whose instances may be shifted or distorted. We demonstrate how our mining technique can be adapted for these variants. Finally, we present a comprehensive experimental evaluation, where we show the effectiveness and efficiency of the proposed techniques.
20.
Knowledge Discovery from Series of Interval Events  Total citations: 4, self-citations: 0, citations by others: 4
Roy Villafane Kien A. Hua Duc Tran Basab Maulik 《Journal of Intelligent Information Systems》2000,15(1):71-89
Knowledge discovery from data sets can be extensively automated by using data mining software tools. Techniques for mining series of interval events, however, have not been considered. Such time series are common in many applications. In this paper, we propose mining techniques to discover temporal containment relationships in such series. Specifically, an item A is said to contain an item B if an event of type B occurs during the time span of an event of type A, and this is a frequent relationship in the data set. Mining such relationships provides insight about temporal relationships among various items. We implement the technique and analyze trace data collected from a real database application. Experimental results indicate that the proposed mining technique can discover interesting results. We also introduce a quantization technique as a preprocessing step to generalize the method to all time series.
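The containment definition in this abstract can be made concrete with a brute-force sketch. It is not the paper's mining technique: the event format `(item, start, end)`, the closed-interval test, and the `min_support` threshold used to decide what counts as "frequent" are all assumptions made for illustration.

```python
from collections import Counter

def containment_counts(events):
    """Count how often an event of item B lies within the span of an event of item A.

    events: list of (item, start, end) tuples with start <= end.
    Returns a Counter keyed by (A, B).
    """
    counts = Counter()
    for a_item, a_start, a_end in events:
        for b_item, b_start, b_end in events:
            if a_item != b_item and a_start <= b_start and b_end <= a_end:
                counts[(a_item, b_item)] += 1
    return counts

def frequent_containments(events, min_support):
    """Pairs (A, B) whose containment count reaches the assumed support threshold."""
    return {pair for pair, n in containment_counts(events).items() if n >= min_support}

events = [("A", 0, 10), ("B", 2, 5), ("A", 20, 30), ("B", 22, 25), ("C", 40, 50)]
frequent = frequent_containments(events, min_support=2)
```

Here both B events fall inside an A event, so the pair `("A", "B")` is reported, while C, which overlaps nothing, contributes no relationship.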