
1.

Constrained Motif Discovery in Time Series

Yasser Mohammad Toyoaki Nishida 《New Generation Computing》2009,27(4):319-346

The goal of motif discovery algorithms is to efficiently find unknown recurring patterns. In this paper, we focus on motif discovery in time series. Most available algorithms cannot utilize domain knowledge in any way, which results in quadratic or at least super-linear time and space complexity. In this paper we define the Constrained Motif Discovery problem, which enables domain knowledge to be utilized in the motif discovery process. The paper then provides two algorithms, called MCFull and MCInc, for efficiently solving the constrained motif discovery problem. We also show that most unconstrained motif discovery problems can be converted into constrained ones using a change-point detection algorithm. A novel change-point detection algorithm called the Robust Singular Spectrum Transform (RSST) is then introduced and compared to the traditional Singular Spectrum Transform using synthetic and real-world data sets. The results show that RSST achieves higher specificity and is better suited for finding constraints that convert unconstrained motif discovery problems into constrained ones solvable by MCFull and MCInc. We then compare the combination of RSST and MCFull or MCInc with two state-of-the-art motif discovery algorithms on a large set of synthetic time series. The results show that the proposed algorithms provided a four- to tenfold increase in speed compared with the unconstrained motif discovery algorithms studied, without any loss of accuracy. RSST+MCFull is then used in a real-world human-robot interaction experiment to enable the robot to learn free hand gestures, actions, and their associations by watching humans and other robots interacting.

2.

Identification of Exceptional Patterns in Multivariate Time Series

Weng Xiaoqing, Shen Junyi 《Pattern Recognition and Artificial Intelligence》2007,20(3)

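Multivariate time series (MTS) are ubiquitous in finance, medicine, science, engineering, and other fields. This paper proposes a method for identifying exceptional (anomalous) patterns in MTS. A bottom-up segmentation algorithm divides an MTS into non-overlapping subsequences, an extended Frobenius norm is used to compute the similarity between two MTS subsequences, and K-means clustering groups the subsequences into several clusters. Exceptional patterns are then identified from these clusters according to the definition of an exceptional pattern. Experiments on two real data sets verify the effectiveness of the algorithm.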

3.

Xiang Lian Lei Chen 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(11):1544-1558

Similarity join (SJ) in time-series databases has a wide spectrum of applications, such as data cleaning and mining. Specifically, an SJ query retrieves all pairs of (sub)sequences from two time-series databases that ε-match each other, where ε is the matching threshold. Previous work on this problem usually considers static time-series databases, where queries are performed either on disk-based multidimensional indexes built on static data or by nested loop join (NLJ) without indexes. SJ over multiple stream time series, which continuously outputs pairs of similar subsequences from stream time series, requires low memory consumption, low processing cost, and query procedures that are themselves adaptive to time-varying stream data. These requirements invalidate the existing approaches for static databases. In this paper, we propose an efficient and effective approach to perform SJ among multiple stream time series incrementally. In particular, we present a novel method, Adaptive Radius-based Search (ARES), which can answer the similarity search without false dismissals and is seamlessly integrated into SJ processing. Most importantly, we provide a formal cost model for ARES, based on which ARES can adapt to data characteristics, achieving the minimum number of refined candidate pairs, and is thus suitable for stream processing. Furthermore, in light of the cost model, we utilize space-efficient synopses that are constructed for stream time series to further reduce the candidate set. Extensive experiments demonstrate the efficiency and effectiveness of our proposed approach.
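The abstract contrasts ARES with the nested loop join (NLJ) baseline used for static data. For reference only, here is a minimal sketch of that NLJ baseline on two short static series; it is not ARES, and the Euclidean distance and the fixed window length `w` are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequences(series, w):
    """Yield (start_index, window) for every length-w sliding window."""
    for i in range(len(series) - w + 1):
        yield i, series[i:i + w]


def nlj_similarity_join(s1, s2, w, eps):
    """Nested loop join: all (i, j) whose length-w windows are within eps."""
    pairs = []
    for i, a in subsequences(s1, w):
        for j, b in subsequences(s2, w):
            if euclidean(a, b) <= eps:  # the epsilon-match test
                pairs.append((i, j))
    return pairs


print(nlj_similarity_join([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 9.0], w=2, eps=0.5))
# prints [(1, 0), (2, 1)]
```

Every window of one series is compared with every window of the other, which is the quadratic cost that makes this baseline a poor fit for continuously arriving stream data.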

4.

Efficient Similarity Search over Future Stream Time Series

Xiang Lian Lei Chen 《Knowledge and Data Engineering, IEEE Transactions on》2008,20(1):40-54

With the advance of hardware and communication technologies, stream time series are gaining ever-increasing attention due to their importance in many applications, such as financial data processing, network monitoring, Web click-stream analysis, sensor data mining, and anomaly detection. For all of these applications, an efficient and effective similarity search over stream data is essential. Because of the unique characteristics of streams (for example, data are frequently updated and real-time response is required), the previous approaches proposed for searching archived data may not work in stream scenarios. In particular, in cases where data often arrive periodically for various reasons (for example, communication congestion or batch processing), traditional approaches may return inaccurate answers to queries on such incomplete time series or even on future time series. Therefore, in this paper, we propose three approaches, polynomial, Discrete Fourier Transform (DFT), and probabilistic, to predict the unknown values that have not yet arrived at the system and to answer similarity queries based on the predicted data. We also apply efficient indexes, namely a multidimensional hash index and a B⁺-tree, to facilitate the prediction and the similarity search on future time series, respectively. Extensive experiments demonstrate the efficiency and effectiveness of our methods for prediction and query answering.
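Of the three prediction approaches, the polynomial one is the simplest to illustrate. The sketch below assumes a degree-1 polynomial (a least-squares line) fitted to the recent history and then checks whether the predicted future values lie within a given radius of a query pattern. The function names are hypothetical, and the paper's hash index and B⁺-tree are not modeled.

```python
import math


def linear_extrapolate(history, horizon):
    """Fit y = a + b*t to history by least squares; predict `horizon` future values."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two points")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    b = sxy / sxx                  # slope
    a = y_mean - b * t_mean        # intercept
    return [a + b * t for t in range(n, n + horizon)]


def matches_future(history, query, radius):
    """Hypothetical helper: does the predicted continuation match `query`?"""
    predicted = linear_extrapolate(history, len(query))
    return math.dist(predicted, query) <= radius


print(linear_extrapolate([1, 3, 5, 7], 2))  # prints [9.0, 11.0]
```

A higher-degree polynomial or a DFT-based predictor could replace `linear_extrapolate` without changing the query step.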

5.

Efficient Time Series Similarity Search Techniques

Feng Yucai, Jiang Tao, Li Guohui, Zhu Hong 《Chinese Journal of Computers》2009,32(11)

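Time series similarity search is regarded as one of the most promising future technologies. However, time series data are typically high-dimensional and massive, so developing efficient algorithms is critical. This paper surveys the current state of and progress in research on time series similarity search and its main research topics, discusses several important application examples, and quantitatively analyzes some typical algorithms. It then focuses on the key techniques for efficient time series similarity search, including bound-based filtering, triangle-inequality pruning, multi-resolution retrieval methods, and filter-and-refine schemes. Approximate similarity search techniques for time series are also discussed and analyzed. All of these techniques are compared, and their strengths and weaknesses are analyzed in depth. Finally, open problems and future research hotspots and directions are identified.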

6.

Mining Association Patterns across Multiple Time Series Based on the Inter-Relevant Successive Tree

Zeng Haiquan, Liu Yongdan, Song Yang, Hu Yunfa 《Journal of Computer Research and Development》2003,40(7):934-940

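Time series are one of the most common forms of data in everyday life, and discovering frequent patterns in time series is an important task for analyzing how they change. This paper proposes an algorithm for mining association patterns across multiple time series based on the inter-relevant successive tree. The algorithm first describes the relations between sequence states using Allen's interval relations and then, according to whether these relations occur sequentially or concurrently within a time window, obtains a special sequence composed of these relations. On this basis, a new mining model based on the inter-relevant successive tree is proposed to mine association patterns between sequences. Compared with other methods, the algorithm is simple and intuitive, and the whole mining process requires no candidate-pattern generation, which greatly improves mining efficiency.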
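The first step describes relations between sequence states with Allen's interval relations. A small classifier for Allen's 13 relations between two intervals might look like this; representing each state as a `(start, end)` tuple with `start < end` is an assumption, and the abstract does not say how the relations are encoded in the successive tree.

```python
def allen_relation(a, b):
    """Allen interval relation of interval a to interval b.

    Intervals are (start, end) tuples with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: partial overlap with distinct start and end points.
    return "overlaps" if a1 < b1 else "overlapped-by"


print(allen_relation((1, 5), (3, 8)))  # prints overlaps
```

The order of the tests matters: the disjoint and touching cases are ruled out first, so the final branch only sees genuine partial overlaps.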

7.

Fernando Alonso, Juan P. Caraça-Valente, Loïc Martínez, César Montes 《Knowledge and Information Systems》2003,5(2):183-200

In this article, we describe the process of discovering similar patterns in time series and creating reference models for population groups in a medical domain, particularly in the field of physiotherapy, using data mining techniques on a set of isokinetic data. The discovered knowledge was evaluated against the expertise of a physician specializing in isokinetic techniques, and applied in the I4 (Intelligent Interpretation of Isokinetic Information) project, developed in conjunction with the Spanish National Center for Sports Research and Sciences, for muscular diagnosis and rehabilitation, injury prevention, training evaluation and planning, etc., of elite athletes and ordinary people.

8.

A Neural-Network-Based Method for Discovering Similar Patterns in Time Series

Zhang Peng, Zhang Jianye, Du Jun, Li Xueren 《Pattern Recognition and Artificial Intelligence》2008,21(3)

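Based on the clustering principle of unsupervised-learning neural networks, a method for discovering similar patterns in time series is proposed. A fast discrete cosine transform maps each sequence into a corresponding feature-pattern space, which both reduces dimensionality and overcomes the inability of traditional neural networks to process sequential data. The advantages of artificial neural networks as similarity-measurement models are analyzed: "black-box" network weights replace traditional distance measures, and on this basis an algorithm that discovers all pairs of similar patterns is implemented. Simulation results on real flight data show that the method is correct and multi-scale, so it can effectively reflect the similarity between sequences at different resolutions.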

9.

RQIC: An Efficient Time Series Similarity Search Algorithm

Jiang Tao, Feng Yucai, Zhu Hong, Li Guohui 《Journal of Computer Research and Development》2009,46(5)

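Indexing large-scale time series databases is a key problem in efficient time series search. A novel indexing scheme, RQI, is proposed that comprises three filtering strategies: first-k filtering, indexed lower and upper bounds, and triangle-inequality pruning. The basic idea is first to compute the wavelet coefficients of each time series with the Haar wavelet transform and use the first k coefficients to form a minimum bounding rectangle for point filtering; then to precompute lower-bound and upper-bound features of each time series and store them in the index; and finally to use the triangle inequality to prune dissimilar sequences while guaranteeing no false dismissals. A new lower-bound distance function, SLBS, and a clustering algorithm, CSA, are also proposed. CSA keeps the index well clustered, which improves the efficiency of point filtering and yields an improved algorithm, RQIC. Extensive comparative experiments on synthetic and real data sets show that RQIC is effective and achieves high query efficiency.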
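First-k filtering rests on a standard property: the orthonormal Haar transform preserves Euclidean distance, so the distance computed over only the first k coefficients can never exceed the true distance, and filtering with it causes no false dismissals. The sketch below assumes power-of-two series lengths and Euclidean range queries; it covers only first-k filtering followed by exact refinement, not the bounding rectangles, SLBS, or CSA.

```python
import math


def haar(series):
    """Orthonormal Haar transform; length must be a power of two.

    Output order is coarse to fine: the overall average term first,
    then detail coefficients from the coarsest to the finest level.
    """
    coeffs = list(series)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while n > 1:
        avgs = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(n // 2)]
        diffs = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(n // 2)]
        details = diffs + details  # coarser levels are placed in front
        coeffs = avgs
        n //= 2
    return coeffs + details


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def first_k_lower_bound(x, y, k):
    """Distance over the first k Haar coefficients (never exceeds the true distance)."""
    return euclidean(haar(x)[:k], haar(y)[:k])


def range_query(db, query, radius, k):
    """Filter with the first-k bound, then refine candidates with the exact distance."""
    q_head = haar(query)[:k]
    candidates = [s for s in db if euclidean(haar(s)[:k], q_head) <= radius]
    return [s for s in candidates if euclidean(s, query) <= radius]
```

A larger k tightens the bound (with k equal to the series length it equals the true distance) at the cost of storing more coefficients per series.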

10.

Time Series Forecasting of Averaged Data With Efficient Use of Information

《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2005,35(5):738-745

Time series analysis has been a popular tool for analyzing and forecasting large amounts of data. Very often, the forecasts of the applied approaches had limited success, mainly because of the lack of statistically significant historical information. We focus our attention on three common series that are formed by averaging data collected over a shorter time interval: weekly and biweekly foreign exchange rates, mean hourly wind speed, and electric load data. The proposed scheme, which takes advantage of the dominant characteristics of the shorter-interval data, produced better forecasts than conventional approaches based only on historical observations of the target data. In the first two series, the proposed approach generated forecasts with significantly lower errors than those of the trivial random walk, a benchmark for series dominated by short-term correlation. On the load series, this approach allowed a simple autoregressive model to achieve lower forecasting error than a neural network that included special indicators to account for the periodic nature of the data.

11.

A Dimensionality Reduction Technique for Efficient Time Series Similarity Analysis

Wang Q Megalooikonomou V 《Information Systems》2008,33(1):115-132

We propose a dimensionality reduction technique for time series analysis that significantly improves the efficiency and accuracy of similarity searches. In contrast to piecewise constant approximation (PCA) techniques that approximate each time series with constant value segments, the proposed method--Piecewise Vector Quantized Approximation--uses the closest (based on a distance measure) codeword from a codebook of key-sequences to represent each segment. The new representation is symbolic and it allows for the application of text-based retrieval techniques into time series similarity analysis. Experiments on real and simulated datasets show that the proposed technique generally outperforms PCA techniques in clustering and similarity searches.
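The encoding step of Piecewise Vector Quantized Approximation can be sketched as follows. This assumes the codebook of key-sequences has already been built (how it is learned is not shown), that every codeword has length `seg_len`, that Euclidean distance picks the closest codeword, and that the series length is a multiple of `seg_len`; the function names are hypothetical.

```python
import math


def nearest_codeword(segment, codebook):
    """Index of the codeword closest (Euclidean) to the segment."""
    return min(range(len(codebook)), key=lambda i: math.dist(segment, codebook[i]))


def pvqa_encode(series, codebook, seg_len):
    """Encode a series as a list of codeword indices, one per segment."""
    if len(series) % seg_len:
        raise ValueError("series length must be a multiple of seg_len")
    return [nearest_codeword(series[i:i + seg_len], codebook)
            for i in range(0, len(series), seg_len)]


codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(pvqa_encode([0.1, 0.0, 0.9, 1.1, 0.0, 0.8], codebook, 2))  # prints [0, 1, 2]
```

The output is a short symbol string, which is what lets text-retrieval techniques be applied to the encoded series.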

12.

Knowledge Discovery in Stock Time Series Databases Based on Regularized Feedforward Neural Networks

Wang Xiaoye, Li Dongmei, Wang Zheng'ou 《Computer Engineering》2003,29(12):98-100

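The regularized least-squares learning algorithm for feedforward networks is applied to knowledge discovery in time series. The regularized least-squares algorithm combines regularization networks with a node-pruning algorithm, greatly improving the generalization performance of feedforward networks. It is applied to discovering transient rules in stock time series databases. The discovery process consists of two parts: preprocessing the time series database and data mining (rule discovery). Experimental results show good prediction performance.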

13.

An Effective Clustering Algorithm for Hot-Topic Time Series

Han Zhongming, Chen Ni, Le Jiajin, Duan Dagao, Sun Jianzhi 《Chinese Journal of Computers》2012,35(11):2337-2347

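Clustering popularity time series is an important step in revealing and modeling how online hot topics form and develop. In 2010, Leskovec et al. proposed the K-SC clustering algorithm for topic time series, which is highly accurate and captures the intrinsic development trends of topics well. However, K-SC is highly sensitive to the initial cluster centers and has high time complexity, which makes it hard to apply to real high-dimensional large data sets. Combining wavelet transforms, this paper proposes a new iterative clustering algorithm, WKSC, with two main innovations: (1) the Haar wavelet transform compresses the original time series, reducing their dimensionality and therefore the time complexity of the algorithm; (2) during the inverse Haar wavelet transform, the cluster centers obtained by clustering at a lower dimension are used as the initial centers for clustering at a higher dimension, which mitigates the sensitivity to initial centers during iterative clustering and improves clustering quality. Extensive experiments on three data sets from China and abroad show that WKSC significantly reduces the time complexity of clustering while improving clustering quality. WKSC is well suited to pattern analysis of large numbers of high-dimensional hot-topic time series.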
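K-SC clusters series under a distance that ignores differences in scale and time shift. As it is commonly stated, the distance is d(x, y) = min over α and q of ||x − α·y_(q)|| / ||x||, where y_(q) is y shifted by q steps and the best α for a fixed shift has the least-squares closed form α = (x·y_(q)) / ||y_(q)||². The sketch below assumes zero-padded shifts bounded by a hypothetical `max_shift` parameter; it shows only this distance, not the WKSC wavelet initialization.

```python
import math


def shift(y, q):
    """Shift y by q steps (positive = later in time), padding with zeros."""
    n = len(y)
    if q >= 0:
        return [0.0] * q + list(y[:n - q])
    return list(y[-q:]) + [0.0] * (-q)


def ksc_distance(x, y, max_shift):
    """min over shifts q and scales alpha of ||x - alpha * y_q|| / ||x||."""
    if max_shift >= len(y):
        raise ValueError("max_shift must be smaller than the series length")
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    best = math.inf
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        denom = sum(v * v for v in yq)
        # Closed-form least-squares scale; 0 if the shifted y is all zeros.
        alpha = sum(a * b for a, b in zip(x, yq)) / denom if denom else 0.0
        resid = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, yq)))
        best = min(best, resid / norm_x)
    return best
```

For example, `y = [0, 0, 0, 2, 4, 2]` is a delayed, doubled copy of `x = [0, 1, 2, 1, 0, 0]`, so `ksc_distance(x, y, 2)` is 0 while `ksc_distance(x, y, 0)` is not.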

14.

A Time Series Similarity Search Algorithm Based on Shape Features

Mao Yunjian, Du Xiuhua 《Computer Simulation》2008,25(1):80-83

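Time series similarity search is a fundamental research topic in data mining. Its notion of similarity is mostly based on Euclidean distance, and algorithms of this kind have a drawback: if a time series is shifted, they produce wrong results. The proposed fast similarity search algorithm based on shape features uses landmarks as split points: it extracts the features of a time series through its landmarks, divides the series into several subsequences, linearizes each subsequence, and preprocesses the linearized subsequences. The query sequence is segmented with the same landmark-based algorithm, and an improved fast similarity search algorithm then quickly finds the sequences similar to the query. Worked examples demonstrate the effectiveness of the algorithm.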
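The abstract does not define its landmarks. A minimal sketch, assuming landmarks are the two endpoints plus every strict local extremum, splits the series at those points and describes each piece by its start index, end index, and slope; the function names are hypothetical.

```python
def landmarks(series):
    """Indices of the two endpoints plus every strict local extremum."""
    if len(series) < 2:
        raise ValueError("need at least two points")
    idx = [0]
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            idx.append(i)
    idx.append(len(series) - 1)
    return idx


def linear_segments(series):
    """Split at landmarks and describe each piece as (start, end, slope)."""
    marks = landmarks(series)
    return [(i, j, (series[j] - series[i]) / (j - i)) for i, j in zip(marks, marks[1:])]


print(linear_segments([0, 2, 4, 3, 1, 5]))
# prints [(0, 2, 2.0), (2, 4, -1.5), (4, 5, 4.0)]
```

Because slopes do not change when a constant is added to every value, a vertically offset copy of a series produces the same segment description.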

15.

Research on Transition Rule Mining Algorithms for Temporal Databases

Huang Jianshe 《Computer Simulation》2008,25(6)

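Transition rule mining algorithms are an important complement to existing data mining algorithms such as association mining. However, because current transition rule mining algorithms choose their mining targets poorly, the resulting transition rules are often unrepresentative and therefore of little reference value. After analyzing the shortcomings of existing transition rule mining methods, this paper proposes two improved methods: transition rule discovery based on association mining, and transition rule mining based on probabilistic relational data patterns. It then combines these two methods with existing transition rule mining algorithms to build a new, more effective and practical transition rule mining algorithm for temporal databases.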

16.

Discovery of Periodic Patterns in Spatiotemporal Sequences

Huiping Cao Mamoulis N. Cheung D.W. 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(4):453-467

In many applications that track and analyze spatiotemporal data, movements obey periodic patterns; the objects follow the same routes (approximately) over regular time intervals. For example, people wake up at the same time and follow more or less the same route to work every day. The discovery of hidden periodic patterns in spatiotemporal data could unveil important information to the data analyst. Existing approaches for discovering periodic patterns focus on symbol sequences. However, these methods cannot be directly applied to a spatiotemporal sequence because of the fuzziness of spatial locations in the sequence. In this paper, we define the problem of mining periodic patterns in spatiotemporal data and propose an effective and efficient algorithm for retrieving maximal periodic patterns. In addition, we study two interesting variants of the problem. The first is the retrieval of periodic patterns that are frequent only during a continuous subinterval of the whole history. The second is the discovery of periodic patterns whose instances may be shifted or distorted. We demonstrate how our mining technique can be adapted for these variants. Finally, we present a comprehensive experimental evaluation, where we show the effectiveness and efficiency of the proposed techniques.

17.

Knowledge Discovery from Series of Interval Events

Roy Villafane Kien A. Hua Duc Tran Basab Maulik 《Journal of Intelligent Information Systems》2000,15(1):71-89

Knowledge discovery from data sets can be extensively automated by using data mining software tools. Techniques for mining series of interval events, however, have not been considered. Such time series are common in many applications. In this paper, we propose mining techniques to discover temporal containment relationships in such series. Specifically, an item A is said to contain an item B if an event of type B occurs during the time span of an event of type A, and this is a frequent relationship in the data set. Mining such relationships provides insight about temporal relationships among various items. We implement the technique and analyze trace data collected from a real database application. Experimental results indicate that the proposed mining technique can discover interesting results. We also introduce a quantization technique as a preprocessing step to generalize the method to all time series.
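The containment definition above can be illustrated with a brute-force counter. This is a minimal sketch, not the authors' algorithm: it assumes events are `(type, start, end)` tuples, treats "during" as inclusive (`start_A <= start_B` and `end_B <= end_A`), ignores same-type pairs, and uses a plain occurrence count with a hypothetical `min_count` threshold as the notion of "frequent".

```python
from collections import Counter
from itertools import permutations


def containment_counts(events):
    """Count (A-event, B-event) instance pairs where the A-event contains the B-event.

    events: list of (type, start, end); containment is taken as inclusive.
    """
    counts = Counter()
    for (ta, sa, ea), (tb, sb, eb) in permutations(events, 2):
        if ta != tb and sa <= sb and eb <= ea:
            counts[(ta, tb)] += 1
    return counts


def frequent_containments(events, min_count):
    """Type pairs (A, B) whose containment count reaches min_count."""
    return {pair for pair, c in containment_counts(events).items() if c >= min_count}


events = [("A", 0, 10), ("B", 2, 5), ("B", 6, 9), ("C", 12, 15), ("A", 20, 30), ("B", 21, 22)]
print(frequent_containments(events, 2))  # prints {('A', 'B')}
```

The pairwise scan is quadratic in the number of events; it only serves to make the definition concrete.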

18.

Nearest-Neighbours for Time Series

Juan Manuel Gimeno Illa Javier Béjar Alonso Miquel Sànchez Marré 《Applied Intelligence》2004,20(1):21-35

This paper presents an application of lazy learning algorithms in the domain of industrial processes. These processes are described by a set of variables, each corresponding to a time series. Each variable plays a different role in the process, and some mutual influences can be discovered. A methodology to study the different variables and their roles in the process is described; this methodology allows the study of the time series to be structured. The prediction methodology is based on a k-nearest neighbour algorithm. A complete study of the different parameters of this kind of algorithm is done, including data preprocessing, neighbour distance, and weighting strategies. An alternative to Euclidean distance called shape distance is presented; this distance is insensitive to scaling and translation. Alternative weighting strategies based on time series autocorrelation and partial autocorrelation are also presented. Experiments using autoregressive models, simulated data, and real data obtained from an industrial process (wastewater treatment plants) are presented to show the feasibility of our approach.
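A k-nearest-neighbour forecaster along these lines can be sketched as follows. The paper's exact shape distance is not given in the abstract, so this sketch assumes one simple scale- and translation-insensitive choice: z-normalize both windows, then take the Euclidean distance. Each neighbour's successor is mapped back into the query window's scale, and the neighbours are weighted uniformly; the autocorrelation-based weighting strategies are not modeled, and the function names are hypothetical.

```python
import math
import statistics


def znorm(window):
    """Return (z-normalized window, mean, population std dev)."""
    mu = statistics.fmean(window)
    sd = statistics.pstdev(window)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in window], mu, sd


def shape_distance(a, b):
    """Assumed shape distance: Euclidean distance after z-normalization."""
    return math.dist(znorm(a)[0], znorm(b)[0])


def knn_forecast(series, m, k):
    """Predict the next value from the k past windows most similar to the last m values."""
    if len(series) <= m:
        raise ValueError("series must be longer than the window length m")
    query = series[-m:]
    _, mu_q, sd_q = znorm(query)
    # Only windows with a known successor value are candidates.
    candidates = [(shape_distance(series[i:i + m], query), series[i:i + m], series[i + m])
                  for i in range(len(series) - m)]
    candidates.sort(key=lambda c: c[0])
    preds = []
    for _, window, nxt in candidates[:k]:
        _, mu_w, sd_w = znorm(window)
        z_next = (nxt - mu_w) / sd_w if sd_w > 0 else 0.0
        preds.append(z_next * sd_q + mu_q)  # rescale into the query's level and spread
    return statistics.fmean(preds)
```

On a straight line such as `list(range(10))` with `m=3`, every past window has the same shape, so the forecast is 10 (up to rounding).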

19.

A Time-Decay-Model-Based Method for Mining Closed Patterns over Data Streams

Han Meng, Wang Zhihai, Yuan Jidong 《Chinese Journal of Computers》2015,(7)

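Data streams are continuous and change rapidly over time, so concept drift occurs when frequent patterns are mined from them. In some data stream applications, the most recent data are usually considered the most valuable. Data stream mining produces a large number of useless patterns; to reduce them while guaranteeing lossless compression, closed patterns need to be mined. This paper therefore proposes TDMCS (Time-Decay-Model-based Closed frequent pattern mining on data Stream), a method for mining closed patterns over data streams based on a time-decay model and a closure operator. The algorithm uses the time-decay model to weight historical and recent transactions in the sliding window differently, uses the closure operator to make closed pattern mining more efficient, adopts a three-level framework of minimum support, maximum error rate, and decay factor to avoid concept drift, and uses a mean decay factor to balance high recall against high precision. Experimental analysis shows that the algorithm is suited to mining dense data streams with long patterns, is highly efficient, performs stably under sliding windows of different sizes, and outperforms comparable algorithms.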
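The two ingredients named above, time-decayed support and a closure operator, can be sketched independently of TDMCS itself. This sketch assumes each transaction in the window is a `(timestamp, set_of_items)` pair weighted by `decay ** (now - timestamp)` with `0 < decay < 1`, and uses the textbook closure (the intersection of all transactions containing the itemset); the three-level framework and the mean decay factor are not modeled.

```python
def decayed_support(window, itemset, decay, now):
    """Weighted fraction of transactions containing itemset; weight = decay ** (now - t)."""
    total = matched = 0.0
    for t, items in window:
        w = decay ** (now - t)
        total += w
        if itemset <= items:
            matched += w
    return matched / total if total else 0.0


def closure(window, itemset):
    """Intersection of all transactions that contain itemset."""
    containing = [items for _, items in window if itemset <= items]
    if not containing:
        return set(itemset)
    return set.intersection(*(set(items) for items in containing))


def is_closed(window, itemset):
    """An itemset is closed if its closure adds no items."""
    return closure(window, itemset) == set(itemset)


window = [(1, {"a", "b"}), (2, {"a", "b", "c"}), (3, {"a"})]
print(decayed_support(window, {"b"}, 0.5, 3))  # weights 0.25, 0.5, 1 -> 0.75 / 1.75
print(is_closed(window, {"b"}))                # prints False: {"b"} always occurs with "a"
```

Because all weights are positive, an itemset and its closure occur in exactly the same transactions, so they have the same decayed support; keeping only closed itemsets therefore loses no support information.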

20.

A Distributed E-Commerce Service Discovery Model Based on Mature Nodes

Yang Zhe 《Computer Engineering and Applications》2006,42(3):9-11,56

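The SophiNode model proposed in this paper builds on existing P2P service discovery techniques and simulates the way humans communicate: each node is given "experience" and "intuition", so that in a distributed e-commerce environment mature nodes can be used to discover business services quickly and correctly. A node selects suitable mature nodes by evaluating its own service information base and a maturity index for each of its neighbouring nodes. Experimental results show that the model is effective: it considerably improves service discovery efficiency while also addressing problems of scalability and complexity.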