20 similar documents retrieved
1.
Although many algorithms exist for measuring the distance between time series, none is suited to irregular time series. Based on the idea of finding the minimum distance path between the edge sets formed by all sequence points, a distance measure for irregular time series is proposed, together with an event-sequence generation algorithm and an implementation of the distance measure. Finally, the algorithm is tested on time series test data from the UCI KDD archive. The results show that the proposed measure can effectively capture the similarity of irregular time series.
2.
Clustering heteroskedastic time series by model-based procedures
Edoardo Otranto 《Computational statistics & data analysis》2008,52(10):4685-4698
Financial time series are often characterized by similar volatility structures. The detection of clusters of series displaying similar behavior could be important in understanding the differences in the estimated processes, without having to study and compare the estimated parameters across all the series. This is particularly relevant when dealing with many series, as in financial applications. The volatility of a time series can be characterized in terms of the underlying GARCH process. Using Wald tests and the autoregressive metric to measure the distance between GARCH processes, it is shown that a clustering algorithm can be developed which provides three classifications (with increasing degrees of depth) based on the heteroskedastic patterns of the time series. The number of clusters is detected automatically rather than being fixed a priori or a posteriori. The procedure is evaluated by simulations and applied to the sector indices of the Italian market.
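As a rough illustration of the AR-metric idea (not the paper's exact procedure): for a GARCH(1,1) process with ARCH parameter alpha and GARCH parameter beta, the squared innovations admit an AR representation with coefficients pi_j = alpha * beta^(j-1), so the squared distance between two fitted processes sums in closed form. The Python sketch below, with made-up parameter values, computes this distance and feeds the resulting matrix to an off-the-shelf hierarchical clustering step.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def ar_metric_garch11(alpha1, beta1, alpha2, beta2):
    # AR(inf) coefficients of the squared-innovation representation of a
    # GARCH(1,1) process: pi_j = alpha * beta**(j-1); the squared AR-metric
    # distance sum_j (pi1_j - pi2_j)**2 then has a closed form.
    d2 = (alpha1**2 / (1 - beta1**2)
          + alpha2**2 / (1 - beta2**2)
          - 2 * alpha1 * alpha2 / (1 - beta1 * beta2))
    return np.sqrt(d2)

params = [(0.05, 0.90), (0.07, 0.88), (0.20, 0.70)]   # hypothetical GARCH fits
n = len(params)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = ar_metric_garch11(*params[i], *params[j])
labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")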
3.
Affinity propagation is a fast and effective clustering method, but the instability of its clustering results degrades performance. To address this, a nearest-neighbor-based affinity propagation algorithm (AP-NN) is proposed: initial clusters are generated by affinity propagation, representative clusters are selected from them, and the samples of the non-representative clusters are then assigned by nearest-neighbor clustering. Experimental results on time series datasets show that AP-NN produces good clustering results and is well suited to cluster analysis.
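A minimal Python sketch of the AP-NN idea described above, using scikit-learn's affinity propagation on toy data; the rule for choosing representative clusters (here simply the largest ones) and the reassignment by nearest exemplar are assumptions about the general scheme, not the paper's exact procedure.

import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))               # 60 toy series of length 16

ap = AffinityPropagation(random_state=0).fit(X)
labels = ap.labels_.copy()
sizes = np.bincount(labels)
rep = np.argsort(sizes)[-3:]                # treat the 3 largest clusters as representative

rep_mask = np.isin(labels, rep)
exemplars = ap.cluster_centers_[rep]        # exemplar vectors of the kept clusters
for i in np.where(~rep_mask)[0]:
    d = np.linalg.norm(exemplars - X[i], axis=1)
    labels[i] = rep[np.argmin(d)]           # nearest-exemplar reassignment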
4.
A mixed Mel-frequency cepstral coefficient feature extraction method based on the Fisher ratio
To address the low recognition accuracy of Mel-frequency cepstral coefficients (MFCC) for mid- and high-frequency signals in speech recognition, and the fact that the contribution of each feature dimension to the recognition result is usually ignored, a feature extraction method is proposed that combines MFCC, inverse MFCC (IMFCC), and mid-frequency MFCC (MidMFCC) with the Fisher criterion. First, the MFCC, IMFCC, and MidMFCC features are extracted from the speech signal; the Fisher ratio of each dimension of the three feature sets is then computed, and dimensions are selected by Fisher ratio to form a mixed feature vector, improving the recognition accuracy for mid- and high-frequency speech information. Experimental results show that, under the same conditions, the new features achieve a measurable improvement in recognition rate over plain MFCC.
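A small Python sketch of per-dimension Fisher-ratio scoring and selection, assuming frame-level feature matrices and class labels are already available; the exact Fisher-ratio formula and the number of retained dimensions are illustrative choices rather than the paper's.

import numpy as np

def fisher_ratio(X, y):
    # Per-dimension Fisher ratio: between-class scatter of the class means
    # divided by the pooled within-class variance (one common definition).
    classes = np.unique(y)
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - overall) ** 2
        within += Xc.shape[0] * Xc.var(axis=0)
    return between / np.maximum(within, 1e-12)

# Hypothetical usage: stack MFCC / IMFCC / MidMFCC frames column-wise,
# then keep the dimensions with the largest Fisher ratios.
# X_all = np.hstack([mfcc, imfcc, midmfcc]); y = frame_labels
# keep = np.argsort(fisher_ratio(X_all, y))[::-1][:24]
# X_mixed = X_all[:, keep]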
5.
Rohit J. Kate 《Data mining and knowledge discovery》2016,30(2):283-312
Dynamic time warping (DTW) has proven itself to be an exceptionally strong distance measure for time series. DTW in combination with one-nearest neighbor, one of the simplest machine learning methods, has been difficult to convincingly outperform on the time series classification task. In this paper, we present a simple technique for time series classification that exploits DTW's strength on this task. However, instead of directly using DTW as a distance measure to find nearest neighbors, the technique uses DTW to create new features which are then given to a standard machine learning method. We experimentally show that our technique improves over one-nearest-neighbor DTW on 31 of 47 UCR time series benchmark datasets. In addition, the method can easily be extended to work in combination with other methods; in particular, we show that when combined with the symbolic aggregate approximation (SAX) method, it improves over SAX on 37 of 47 UCR datasets. The proposed method thus also provides a mechanism to combine distance-based methods like DTW with feature-based methods like SAX. We also show that combining the proposed classifiers through ensembles further improves time series classification performance.
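A hedged sketch of the "DTW distances as features" idea: each series is represented by its DTW distances to the training series, and those vectors are handed to a standard classifier. The classifier choice (an SVM) and the helper names are illustrative.

import numpy as np
from sklearn.svm import SVC

def dtw(a, b):
    # Classic dynamic-programming DTW distance between two 1-D series.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_features(series_list, train_list):
    # Feature vector of a series = its DTW distances to every training series.
    return np.array([[dtw(s, t) for t in train_list] for s in series_list])

# X_train, X_test are lists of 1-D arrays and y_train a label array (assumed).
# F_train = dtw_features(X_train, X_train)
# F_test  = dtw_features(X_test, X_train)
# y_pred  = SVC().fit(F_train, y_train).predict(F_test)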
6.
T. Warren Liao 《Pattern recognition》2005,38(11):1857-1874
Time series clustering has been shown effective in providing useful information in various domains. There seems to be an increased interest in time series clustering as part of the effort in temporal data mining research. To provide an overview, this paper surveys and summarizes previous works that investigated the clustering of time series data in various application domains. The basics of time series clustering are presented, including general-purpose clustering algorithms commonly used in time series clustering studies, the criteria for evaluating the performance of the clustering results, and the measures used to determine the similarity/dissimilarity between two time series being compared, whether in the form of raw data, extracted features, or model parameters. Past research is organized into three groups depending on whether the work operates directly on the raw data, in either the time or the frequency domain, indirectly on features extracted from the raw data, or indirectly on models built from the raw data. The uniqueness and limitations of previous research are discussed and several possible topics for future research are identified. Moreover, the areas to which time series clustering has been applied are also summarized, including the sources of data used. It is hoped that this review will serve as a stepping stone for those interested in advancing this area of research.
7.
To address the low quality of the u-shapelet set in u-shapelet-based time series clustering, a time series clustering algorithm based on optimal u-shapelets, DivUshapCluster, is proposed. First, the effect of different subsequence quality evaluation measures on u-shapelet-based clustering results is examined, and the best-performing measure is used to score the u-shapelet candidate set. Next, a multivariate top-k query technique is introduced to remove redundant candidates and search for the optimal u-shapelet set. Finally, the optimal u-shapelet set is used to transform the original dataset, improving time series clustering accuracy. Experimental results show that DivUshapCluster not only outperforms classical time series clustering algorithms in clustering accuracy, but also improves the average clustering accuracy on 22 datasets by 18.80% and 19.38% over the BruteForce and SUSh algorithms, respectively. The proposed algorithm effectively improves time series clustering accuracy while maintaining overall efficiency.
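One widely used u-shapelet quality measure, and a plausible stand-in for the evaluation step above, is the "gap" score: the distances from every series to the candidate subsequence are sorted and split into two groups so that the separation between the groups is maximized. A rough sketch (z-normalization omitted, names hypothetical):

import numpy as np

def subsequence_dist(series, cand):
    # Minimum length-normalized Euclidean distance between the candidate
    # and any window of the series.
    L = len(cand)
    best = np.inf
    for start in range(len(series) - L + 1):
        d = np.linalg.norm(series[start:start + L] - cand) / np.sqrt(L)
        best = min(best, d)
    return best

def gap_score(dataset, cand, lower=0.2, upper=0.8):
    # Sort all series-to-candidate distances, try every split point whose
    # group-size ratio stays within [lower, upper], and return the best
    # gap = mean(B) - std(B) - (mean(A) + std(A)).
    d = np.sort([subsequence_dist(s, cand) for s in dataset])
    n, best = len(d), -np.inf
    for k in range(1, n):
        if lower <= k / n <= upper:
            A, B = d[:k], d[k:]
            best = max(best, B.mean() - B.std() - (A.mean() + A.std()))
    return best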
8.
Optimal representation of acoustic features is an ongoing challenge in automatic speech recognition research. As an initial step toward this goal, some approaches propose optimizing the filterbanks used for cepstral coefficients with evolutionary optimization methods. However, the large number of optimization parameters required by a filterbank makes it difficult to guarantee that a single optimized filterbank can provide the best representation for phoneme classification. Moreover, in many cases a number of potential solutions are obtained, each of which discriminates well between specific groups of phonemes; in other words, each filterbank has its own particular advantage. Therefore, aggregating the discriminative information provided by different filterbanks is a challenging task. In this study, a number of complementary filterbanks are optimized to provide different representations of the speech signal for phoneme classification with hidden Markov models (HMMs). Fuzzy information fusion is used to aggregate the decisions provided by the HMMs, since fuzzy theory can effectively handle the uncertainties of classifiers trained with different representations of the speech data. The outputs of the HMM classifiers of each expert are fused using a fuzzy decision fusion scheme that employs global and local confidence measurements to formulate the reliability of each classifier, based on both global and local context, when making the overall decision. Experiments were conducted on clean and noisy phonetic samples. The proposed method outperformed conventional Mel frequency cepstral coefficients under both conditions in terms of overall phoneme classification accuracy, and the fuzzy fusion scheme was shown to be capable of aggregating the complementary information provided by each filterbank.
9.
In this paper, a novel Automatic Speaker Recognition (ASR) system is presented. The new ASR system includes novel feature extraction and vector classification steps utilizing distributed Discrete Cosine Transform (DCT-II) based Mel Frequency Cepstral Coefficients (MFCC) and Fuzzy Vector Quantization (FVQ). The ASR algorithm utilizes an approach based on MFCC to identify dynamic features that are used for Speaker Recognition (SR). A series of experiments were performed utilizing three different feature extraction methods: (1) conventional MFCC; (2) Delta-Delta MFCC (DDMFCC); and (3) DCT-II based DDMFCC. The experiments were then expanded to include four classifiers: (1) FVQ; (2) K-means Vector Quantization (VQ); (3) Linde, Buzo and Gray VQ; and (4) Gaussian Mixture Model (GMM). The combination of DCT-II based MFCC, DMFCC and DDMFCC with FVQ was found to have the lowest Equal Error Rate among the VQ based classifiers. The results were an improvement over previously reported non-GMM methods and approached the results achieved by the computationally expensive GMM based method. Speaker verification tests highlighted the overall performance improvement of the new ASR system. The National Institute of Standards and Technology Speaker Recognition Evaluation corpora were used to provide speaker source data for the experiments.
10.
Pritpal Singh, Bhogeswar Borah 《Engineering Applications of Artificial Intelligence》2013,26(10):2443-2457
In this paper, we present a new model to handle four major issues of fuzzy time series forecasting: determination of the effective length of intervals, handling of fuzzy logical relationships (FLRs), determination of a weight for each FLR, and defuzzification of fuzzified time series values. To resolve the problem of determining the length of intervals, this study suggests a new time series data discretization technique. After generating the intervals, the historical time series data set is fuzzified based on fuzzy time series theory, and the fuzzified values are then used to create the FLRs. Most existing fuzzy time series models simply ignore repeated FLRs without proper justification. Since FLRs represent the patterns of historical events and reflect the possibility that such patterns will appear again in the future, simply discarding repeated FLRs risks losing information. Therefore, this model considers repeated FLRs during forecasting. It also assigns weights to the FLRs based on their severity rather than their patterns of occurrence; for this purpose, a new technique is incorporated that determines the weight of each FLR from the index of the fuzzy set associated with the current state of the FLR. To handle these weighted FLRs and obtain the forecast results, this study proposes a new defuzzification technique. The proposed model is verified and validated on three different time series data sets. Empirical analyses indicate that the proposed model handles one-factor time series data sets more efficiently and robustly than conventional fuzzy time series models, and experimental results show that it also outperforms conventional statistical models.
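A minimal fuzzy-time-series sketch in the spirit of the model above, not the paper's exact scheme: equal-width intervals, first-order FLRs that keep repeats, and defuzzification as a count-weighted mean of interval midpoints.

import numpy as np

def fts_forecast(series, n_intervals=7):
    # Partition the universe of discourse into equal intervals and fuzzify
    # each value to the index of its interval.
    lo, hi = series.min(), series.max()
    edges = np.linspace(lo, hi, n_intervals + 1)
    mids = (edges[:-1] + edges[1:]) / 2
    states = np.clip(np.digitize(series, edges) - 1, 0, n_intervals - 1)

    # First-order FLRs A_i -> A_j; repeated FLRs accumulate weight.
    counts = np.zeros((n_intervals, n_intervals))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1

    # One-step-ahead forecast: weighted mean of the midpoints reachable from
    # the current state (fall back to the current midpoint if unseen).
    preds = []
    for t in range(1, len(series)):
        w = counts[states[t - 1]]
        preds.append(mids[states[t - 1]] if w.sum() == 0 else np.dot(w, mids) / w.sum())
    return np.array(preds)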
11.
Time series classification is a supervised learning problem aimed at labeling temporally structured multivariate sequences of variable length. The most common approach reduces time series classification to a static problem by suitably transforming the set of multivariate input sequences into a rectangular table composed of a fixed number of columns. Then, one of the available efficient classification methods is applied to predict the class of new temporal sequences. In this paper, we propose a new classification method, based on a temporal extension of discrete support vector machines, which benefits from the notions of warping distance and softened variable margin. Furthermore, in order to transform a temporal dataset into a rectangular shape, we also develop a new method based on fixed cardinality warping distances. Computational tests performed on both benchmark and real marketing temporal datasets indicate the effectiveness of the proposed method in comparison with other techniques.
12.
In recent years, dynamic time warping (DTW) has become the most widely used technique for comparing time series data when extensive a priori knowledge is not available. However, a multivariate comparison method is often expected to take the correlation between variables into account, since this correlation carries the real information in many cases. Thus, principal component analysis (PCA) based similarity measures, such as the PCA similarity factor (SPCA), are used in many industrial applications. In this paper, we present a novel algorithm called correlation based dynamic time warping (CBDTW), which combines DTW and PCA based similarity measures. To preserve correlation, the multivariate time series are segmented and the local dissimilarity function of DTW is derived from SPCA. The segments are obtained by bottom-up segmentation using special, PCA related costs. Our technique was evaluated on two databases: the database of the Signature Verification Competition 2004 and the commonly used AUSLAN dataset. We show that CBDTW outperforms standard SPCA and the most commonly used, Euclidean distance based multivariate DTW on datasets with complex correlation structure.
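For reference, one common definition of the PCA similarity factor between two multivariate segments is the mean squared cosine between their leading principal directions; this is a hedged sketch of that quantity, and the paper's SPCA-based local dissimilarity may differ in detail.

import numpy as np

def spca_similarity(X, Y, k=3):
    # X and Y are segments with rows = time points and columns = variables.
    def top_k_directions(Z):
        Zc = Z - Z.mean(axis=0)
        _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
        return Vt[:k]                      # k x n_vars matrix of principal directions
    L, M = top_k_directions(X), top_k_directions(Y)
    return np.sum((L @ M.T) ** 2) / k      # trace(L M^T M L^T) / k, in [0, 1]

# A DTW-style local dissimilarity could then be taken as 1 - spca_similarity(X, Y).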
13.
14.
Shihn-Yuarn Chen, Tzu-Ting Tseng, Hao-Ren Ke, Chuen-Tsai Sun 《Expert systems with applications》2011,38(10):12807-12817
Social tagging is widely practiced in the Web 2.0 era. Users can annotate useful or interesting Web resources with keywords for future reference, and social tagging also facilitates the sharing of Web resources. This study reviews the chronological variation of social tagging data and tracks social trends by clustering tag time series. The data corpus is collected from Hemidemi.com. A tag is represented as a time series according to the Web pages it annotates. Time series clustering is then applied to group tag time series with similar patterns and trends in the same time period. Finally, the similarities between clusters in different time periods are calculated to determine which clusters have similar themes, and the trend variation of a specific tag across time periods is analyzed. The evaluation shows that the recommendation accuracy of the proposed approach is about 75%, and a case discussion shows that the approach can track social trends.
15.
Time series subsequence matching, as a foundation for mining tasks such as time series retrieval, clustering, classification, and anomaly detection, has been studied widely. However, traditional subsequence matching targets patterns that are exactly or approximately identical. This paper therefore defines a new kind of sequence pattern with similar development trends, the isomorphic relation between time series, derives rules for deciding whether two time series are isomorphic, and on this basis proposes an algorithm for discovering isomorphic time series segments. The algorithm first preprocesses the original time series, then performs piecewise fitting and tests each pair of segments for the isomorphic relation. Because real-world data rarely satisfy the theoretical constraints exactly, a tolerance parameter for the isomorphic relation is introduced, making it possible to mine isomorphic relations from real time series data. Experimental results show that the algorithm can effectively discover time series segments satisfying the isomorphic relation.
16.
When pattern mining is applied within intelligent methods, and in order to improve the accuracy and usability of data-change patterns, the domain impact factors specified by experts are logically transformed on the basis of the FC closure model; the raw data are processed along the time series with a distance mean-square-error algorithm, and a decision function is used to discard invalid elements and reduce the data dimensionality, completing the data preparation. A suitable and feasible mathematical model is then selected to fit the temporal data and, drawing on ideas from classification analysis, a CCM-ECM model is introduced to express the final mining results, completing the design of a temporal pattern mining model (TODM). A set of computational methods for the confidence calculation and adaptive adjustment of this model is also proposed, so as to mine the latent regularities inside the data and describe data-change patterns with greater precision. Finally, taking the oil-well operation process as an example, a post-operation pattern mining system for oil wells is implemented with the TODM model.
17.
To address the inability of existing fuzzy time series forecasting algorithms to adapt when new relationships appear during forecasting, an interval-similarity-based fuzzy time series forecasting algorithm (ISFTS) is proposed. First, on the basis of fuzzy theory, the universe of discourse is partitioned into intervals twice using a mean-based method, fuzzy sets are defined on these intervals, and the historical data are fuzzified. Third-order fuzzy logical relationships are then established, a formula for the similarity between logical relationships is introduced, and the trend value of the future data is computed to obtain the forecast fuzzy value. Finally, the forecast fuzzy value is defuzzified to obtain the crisp forecast. Because ISFTS forecasts the trend of the data, it overcomes the logical-relationship limitation of current forecasting algorithms. Simulation results show that ISFTS yields smaller forecasting errors than comparable algorithms and outperforms them on mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE); it is therefore better suited to time series forecasting, especially with large data volumes.
18.
To avoid problems that traditional algorithms cannot escape, such as the curse of dimensionality and overfitting, a kernel-function modification and selection method based on the temporal correlation of time series data is proposed for support vector machines (SVM), and its effectiveness is verified on real sulfur dioxide (SO2) measurements. Experimental results show that the temporal kernel function fits the test data set better and improves the generalization ability of the model to some extent.
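A hedged sketch of one way to inject temporal correlation into an SVM through the kernel: build lagged-window embeddings of the series and feed a precomputed Gaussian kernel on those windows to scikit-learn's SVR. This reflects the general idea only; the paper's specific kernel modification is not reproduced.

import numpy as np
from sklearn.svm import SVR

def lag_embed(x, window=6):
    # Each row is a window of the last `window` observations.
    return np.array([x[i:i + window] for i in range(len(x) - window)])

def rbf_kernel(A, B, gamma=0.1):
    # Gaussian kernel between all pairs of windows in A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# x is a 1-D SO2 series (assumed); predict the next value from each window.
# E = lag_embed(x); y = x[6:]
# model = SVR(kernel="precomputed").fit(rbf_kernel(E, E), y)
# y_hat = model.predict(rbf_kernel(E, E))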
19.
A DTW-based clustering algorithm for symbolized time series is proposed, which clusters the unequal-length symbolic time series obtained after dimensionality reduction. The algorithm first reduces the dimensionality of the time series by extracting key points and symbolizing them, then uses DTW to compute the similarities, and finally performs cluster analysis with a Normal matrix and FCM. Experimental results show that applying DTW to the symbolized time series obtained after key-point extraction considerably improves clustering accuracy.
20.
Querying time series data based on similarity
We study similarity queries for time series data where similarity is defined, in a fairly general way, in terms of a distance function and a set of affine transformations on the Fourier series representation of a sequence. We identify a safe set of transformations supporting a wide variety of comparisons and show that this set is rich enough to formulate operations such as moving average and time scaling. We also show that queries expressed using safe transformations can be computed efficiently without prior knowledge of the transformations. We present a query processing algorithm that uses the underlying multidimensional index built over the data set to efficiently answer similarity queries. Our experiments show that the performance of this algorithm is competitive with that of processing ordinary (exact match) queries using the index, and much faster than sequential scanning. We propose a generalization of this algorithm for handling multiple transformations simultaneously, and give experimental results on the performance of the generalized algorithm.
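A rough sketch of the Fourier-representation idea: keep the first few DFT coefficients of each normalized sequence as its index key and rank candidates by the distance between keys; the coefficient count and helper names are illustrative, and the paper's transformation machinery and multidimensional index are not reproduced.

import numpy as np

def dft_key(x, n_coeff=4):
    x = (x - x.mean()) / (x.std() + 1e-12)
    return np.fft.rfft(x)[:n_coeff] / len(x)   # first few Fourier coefficients

def key_distance(x, y, n_coeff=4):
    return np.linalg.norm(dft_key(x, n_coeff) - dft_key(y, n_coeff))

# query is a 1-D array and candidates a list of 1-D arrays (assumed).
# ranked = sorted(range(len(candidates)),
#                 key=lambda i: key_distance(query, candidates[i]))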