Similar literature
1.
Traditional clustering methods assume that there is no measurement error, or uncertainty, associated with data. Often, however, real-world applications require treatment of data that have such errors. In the presence of measurement errors, well-known clustering methods like k-means and hierarchical clustering may not produce satisfactory results. In this article, we develop a statistical model and algorithms for clustering data in the presence of errors. We assume that the errors associated with data follow a multivariate Gaussian distribution and are independent between data points. The model uses the maximum likelihood principle and provides us with a new metric for clustering. This metric is used to develop two algorithms for error-based clustering, hError and kError, that are generalizations of Ward's hierarchical and k-means clustering algorithms, respectively. We discuss types of clustering problems where error information associated with the data to be clustered is readily available and where error-based clustering is likely to be superior to clustering methods that ignore error. We focus on clustering derived data (typically parameter estimates) obtained by fitting statistical models to the observed data. We show that, for Gaussian distributed observed data, the optimal error-based clusters of derived data are the same as the maximum likelihood clusters of the observed data. We also report briefly on two applications with real-world data and a series of simulation studies using four statistical models: (1) sample averaging, (2) multiple linear regression, (3) ARIMA models for time series, and (4) Markov chains, where error-based clustering performed significantly better than traditional clustering methods.
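
The idea of a k-means variant that accounts for per-point measurement error can be illustrated with a minimal sketch. This is not the authors' kError implementation: it assumes each point comes with a diagonal error covariance (one variance per coordinate), uses an inverse-variance weighted mean as the cluster center, and scales squared distances by the point's own variances. The function names and the initial centers are hypothetical.

```python
# Sketch of an error-aware k-means in the spirit of the abstract above.
# Assumption: every point has known per-coordinate error variances
# (a diagonal error covariance); names here are illustrative only.

def weighted_center(points, variances):
    """Inverse-variance weighted mean of a cluster, per coordinate."""
    dim = len(points[0])
    center = []
    for d in range(dim):
        weights = [1.0 / v[d] for v in variances]
        total = sum(w * p[d] for w, p in zip(weights, points))
        center.append(total / sum(weights))
    return center


def error_distance(point, variance, center):
    """Squared distance to a center, scaled by the point's error variances."""
    return sum((p - c) ** 2 / v for p, c, v in zip(point, center, variance))


def k_error(points, variances, initial_centers, n_iter=20):
    """Alternate assignment and center updates, as in ordinary k-means."""
    centers = [list(c) for c in initial_centers]  # do not mutate the input
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center under the error-scaled distance.
        labels = [
            min(range(len(centers)),
                key=lambda k: error_distance(p, v, centers[k]))
            for p, v in zip(points, variances)
        ]
        # Update step: precise points pull the center harder than noisy ones.
        for k in range(len(centers)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = weighted_center(
                    [points[i] for i in members],
                    [variances[i] for i in members],
                )
    return labels, centers
```

With points `[[0.0], [0.2], [10.0], [10.4]]`, variances `[[1.0], [1.0], [1.0], [4.0]]` and initial centers `[[0.0], [10.0]]`, the second center settles at 10.08 rather than the plain mean 10.2, because the noisier point 10.4 receives a quarter of the weight.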

2.
We investigate the potential of the analysis of noisy non-stationary time series by quantising it into streams of discrete symbols and applying finite-memory symbolic predictors. Careful quantisation can reduce the noise in the time series to make model estimation more amenable. We apply the quantisation strategy in a realistic setting involving financial forecasting and trading. In particular, using historical data, we simulate the trading of straddles on the financial indexes DAX and FTSE 100 on a daily basis, based on predictions of the daily volatility differences in the underlying indexes. We propose a parametric, data-driven quantisation scheme which transforms temporal patterns in the series of daily volatility changes into grammatical and statistical patterns in the corresponding symbolic streams. As symbolic predictors operating on the quantised streams, we use the classical fixed-order Markov models, variable memory length Markov models and a novel variation of fractal-based predictors, introduced in its original form in Tiňo and Dorffner [1]. The fractal-based predictors are designed to efficiently use deep memory. We compare the symbolic models with continuous techniques such as time-delay neural networks with continuous and categorical outputs, and GARCH models. Our experiments strongly suggest that the robust information reduction achieved by quantising the real-valued time series is highly beneficial. To deal with non-stationarity in financial daily time series, we propose two techniques that combine ‘sophisticated’ models fitted on the training data with a fixed set of simple-minded symbolic predictors not using older (and potentially misleading) data in the training set. Experimental results show that by quantising the volatility differences and then using symbolic predictive models, market makers can sometimes generate a statistically significant excess profit. We also mention some interesting observations regarding the memory structure in the series of daily volatility differences studied.
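
As a rough illustration of the quantise-then-predict pipeline described above, the sketch below maps real-valued changes to three symbols and fits a fixed-order Markov model on the symbol stream. The single symmetric `threshold` and the symbol names are assumptions made for the example; the paper's parametric, data-driven quantisation scheme is not reproduced here.

```python
from collections import Counter, defaultdict


def quantise(diffs, threshold):
    """Map each change to 'U' (up), 'D' (down) or 'N' (near zero).

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    symbols = []
    for d in diffs:
        if d > threshold:
            symbols.append("U")
        elif d < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return symbols


def fit_markov(symbols, order=1):
    """Count which symbol follows each context of length `order`."""
    counts = defaultdict(Counter)
    for i in range(order, len(symbols)):
        context = tuple(symbols[i - order:i])
        counts[context][symbols[i]] += 1
    return counts


def predict_next(counts, context):
    """Most frequent successor of `context`, or None for an unseen context."""
    seen = counts.get(tuple(context))
    if not seen:
        return None
    return seen.most_common(1)[0][0]
```

For example, `quantise([0.5, -0.6, 0.7, -0.4, 0.05, 0.9, -0.8], 0.1)` gives the stream `U D U D N U D`, and a first-order model fitted on it predicts `D` after `U`.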

3.
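Multi-step prediction of chaotic time series based on LSSVM   Total citations: 18, self-citations: 1, citations by others: 17
Jiang Tianhan (江田汉), Shu Jiong (束炯) 《控制与决策》 (Control and Decision) 2006,21(1):77-80
Combining phase-space reconstruction theory with statistical learning theory, multi-step prediction of chaotic time series is realized. The optimal embedding dimension and time delay are obtained by the differential entropy rate method and used to reconstruct the system's phase space. A multi-step prediction model for chaotic time series is then built with least squares support vector machines (LSSVM) and compared with a radial basis function (RBF) network prediction model. The results show that the LSSVM model captures the dynamical characteristics of the original chaotic system. Its normalized root mean square prediction error is much smaller than that of the RBF network model, its generalization ability is stronger, and its prediction performance is better.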
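
Phase-space reconstruction for multi-step prediction amounts to turning a scalar series into delay vectors paired with a target several steps ahead. The sketch below shows only that data-preparation step, under the assumption that the embedding dimension `dim` and the delay `delay` have already been chosen by some other means; a regressor such as the LSSVM named in the abstract would then be trained on the resulting pairs. The function name is hypothetical.

```python
def delay_embed(series, dim, delay, horizon):
    """Build (state vector, target) pairs from a scalar series.

    Each state is [x(t - (dim-1)*delay), ..., x(t - delay), x(t)],
    and the target is x(t + horizon).
    """
    start = (dim - 1) * delay  # first index with a full history behind it
    pairs = []
    for t in range(start, len(series) - horizon):
        state = [series[t - k * delay] for k in range(dim - 1, -1, -1)]
        pairs.append((state, series[t + horizon]))
    return pairs
```

With `series = list(range(10))`, `dim=3`, `delay=2` and `horizon=1`, the first pair is `([0, 2, 4], 5)` and the last is `([4, 6, 8], 9)`.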

4.
One of the main problems in modelling multivariate conditional covariance time series is the parameterization of the correlation structure. If no constraints are imposed, it implies a large number of unknown coefficients. The most popular models propose parsimonious representations, imposing similar correlation structures on all the series or on groups of time series, but the choice of these groups is quite subjective. A statistical approach is proposed to detect groups of homogeneous time series in terms of correlation dynamics for one of the widely used models: the Dynamic Conditional Correlation model. The approach is based on a clustering algorithm, which uses the idea of distance between dynamic conditional correlations, and the classical Wald test, to compare the coefficients of two groups of dynamic conditional correlations. The proposed approach is evaluated through simulation experiments and applied to a set of financial time series.

5.
The availability of influent wastewater time series is crucial when using models to assess the performance of a wastewater treatment plant (WWTP) under dynamic flow and loading conditions. Given the difficulty of collecting sufficient data, synthetic generation could be the only option. In this paper, a hybrid of statistical (a Markov chain-gamma model for stochastic generation of rainfall and two different multivariate autoregressive models for stochastic generation of air temperature and influent time series in dry conditions) and conceptual modeling techniques is proposed for synthetic generation of influent time series. The time series of rainfall and influent in dry weather conditions are generated using two types of statistical models. These two time series serve as inputs to a conceptual sewer model for generation of influent time series. The application of the proposed influent generator to the Eindhoven WWTP shows that it is a powerful tool for realistic generation of influent time series and is well suited for probabilistic design of WWTPs, as it considers both the effect of input variability and total model uncertainty.

6.
To improve the prediction accuracy of complex multivariate chaotic time series, a novel scheme based on multivariate local polynomial fitting with the optimal kernel function is proposed. According to Takens' theorem, a chaotic time series is reconstructed into vector data, and multivariate local polynomial regression is used to fit the complex chaotic system to be predicted. The regression model parameters are then estimated by the least squares method based on the embedding dimensions, and the predicted value is calculated. To evaluate the results, the proposed multivariate chaotic time series predictor based on the multivariate local polynomial model is compared with a univariate predictor on the same numerical data. The simulation results obtained with the Lorenz system show that the prediction mean square error of the multivariate predictor is much smaller than that of the univariate one, and that the multivariate predictor performs much better than three existing methods. Even if only the last half of the training data is used in the multivariate predictor, its prediction mean square error is smaller than that of the univariate predictor.

7.
We evaluate two approaches for time series classification based on reservoir computing. In the first, classical approach, time series are represented by reservoir activations. In the second approach, on top of the reservoir activations, a predictive model in the form of a readout for one-step-ahead prediction is trained for each time series. This learning step lifts the reservoir features to a more sophisticated model space. Classification is then based on the predictive model parameters describing each time series. We provide an in-depth analysis of time series classification in reservoir space and model space. The approaches are evaluated on 43 univariate and 18 multivariate time series. The results show that representing multivariate time series in the model space leads to lower classification errors compared to using the reservoir activations directly as features. The classification accuracy on the univariate datasets can be improved by combining reservoir space and model space.

8.
We extend the full-factor multivariate GARCH model of Vrontos et al. (Econom J 6:312–334, 2003a) to account for fat tails in the conditional distribution of financial returns, using a multivariate Student-t error distribution. For the new class of Student-t full-factor multivariate GARCH models, we derive analytical expressions for the score, the Hessian matrix and the information matrix. These expressions can be used within classical inferential procedures in order to obtain maximum likelihood estimates for the model parameters. This fact, combined with the parsimonious parameterization of the covariance matrix under the full-factor multivariate GARCH models, enables us to apply the models in high-dimensional problems. We provide implementation details and illustrations using financial time series on eight stocks of the US market.

10.
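When handling multivariate time series, traditional prediction models often fail to capture the complex dynamics of the underlying nonlinear system, which leads to low prediction accuracy. To address this problem, this paper examines and validates a prediction method based on a combined PCC-BiLSTM-GRU-Attention model. The method first uses the Pearson correlation coefficient (PCC) to test for correlation and remove irrelevant features, which reduces the dimensionality of the multivariate data and keeps the most useful features. It then uses a bidirectional long short-term memory network (BiLSTM) to extract temporal features in both directions. Finally, a GRU network combined with an attention mechanism (Attention) further learns how the bidirectional temporal features evolve and captures the information at key time steps. To verify the feasibility of the method on multivariate time series, stock price prediction is used as the experimental scenario, and the method is compared with the BP, LSTM, GRU, BiLSTM-GRU, and BiLSTM-GRU-Attention models. The results show that the PCC-BiLSTM-GRU-Attention combined model achieves higher prediction accuracy than the other models, with a mean absolute percentage error (MAPE) of 2.484% and a coefficient of determination of 0.966.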
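
The first stage of the pipeline above, filtering features by their Pearson correlation with the target, can be sketched in a few lines. This is an illustrative sketch only: the cut-off `min_abs_corr`, the feature names, and the choice to treat constant features as uncorrelated are assumptions, not details taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a constant series counts as uncorrelated
    return cov / (norm_x * norm_y)


def select_features(features, target, min_abs_corr=0.3):
    """Keep the names of features whose |correlation| with the target is large enough.

    `features` maps a feature name to its list of values;
    `min_abs_corr` is a hypothetical threshold.
    """
    return [
        name for name, values in features.items()
        if abs(pearson(values, target)) >= min_abs_corr
    ]
```

The surviving columns would then be passed on to the sequence model; the threshold is a tuning choice and should be validated on held-out data.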

11.
Time series analysis is a common tool in environmental and ecological studies to construct models to explain and forecast serially correlated data. There are several statistical techniques that are used to deal with univariate and multivariate (more than one series) chronological patterns of fisheries data. In this paper, an additive stochastic model is proposed with explanatory and predictive features to capture the main seasonal patterns and trends of a fisheries system in the Amazon. The model is constructed on the assumption that the multivariate response variable (a vector containing the fishery yield of eight periodic species and the total fishery yield) can be decomposed into three terms: an autoregression of the response vector, an exogenous environmental variable (river level), and a seasonal component (significant frequencies obtained by using spectral analysis and the periodogram, indicating the regularity of periodic cycles in the natural and fisheries system). The estimation procedure is carried out via maximum likelihood estimation. The model explained, on average, 78% of the variability in yield of the study species. The model represents the optimal solution (minimum mean square error) among the class of all multivariate autoregressive processes with exogenous and seasonal variables. Predictions for one period ahead are provided to illustrate how the model works in practice.

12.
Classification models for multivariate time series have drawn the interest of many researchers, with the objective of developing accurate and efficient models. However, limited research has been conducted on generating adversarial samples for multivariate time series classification models. Adversarial samples could become a security concern in systems with complex sets of sensors. This study proposes extending the existing gradient adversarial transformation network (GATN) in combination with adversarial autoencoders to attack multivariate time series classification models. The proposed model attacks classification models by utilizing a distilled model to imitate the output of the multivariate time series classification model. In addition, the adversarial generator function is replaced with a variational autoencoder to enhance the adversarial samples. The developed methodology is tested on two multivariate time series classification models: 1-nearest neighbor dynamic time warping (1-NN DTW) and a fully convolutional network (FCN). This study utilizes 30 multivariate time series benchmarks provided by the University of East Anglia (UEA) and University of California Riverside (UCR). The use of adversarial autoencoders shows an increase in the fraction of successful adversaries generated on multivariate time series. To the best of our knowledge, this is the first study to explore adversarial attacks on multivariate time series. Additionally, we recommend future research utilizing the generated latent space from the variational autoencoders.

13.
This paper considers a statistical calibration model with errors in both standard and nonstandard measurements. Under the assumption that replicated observations are available, two estimation techniques (i.e., ordinary and grouping least squares estimation) and two prediction methods (i.e., classical and inverse prediction) are compared in terms of the average squared error of prediction by Monte Carlo simulation. For the range of parameter values considered, it is found that ordinary least squares with inverse prediction gives the smallest average squared error of prediction in interpolation. For extrapolation, grouping least squares with inverse prediction is generally preferred. As for the design effect, it is found that the optimal design for the classical model is not necessarily optimal when the standard measurement is also subject to error.
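
The difference between classical and inverse prediction in calibration can be made concrete with a small sketch. It uses ordinary least squares only and ignores the errors-in-both-variables setting and the grouping estimator studied in the paper; the function names are hypothetical. Here `x` is the standard measurement and `y` the nonstandard one.

```python
def fit_line(u, v):
    """Ordinary least squares fit of v = intercept + slope * u."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = (
        sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
        / sum((a - mean_u) ** 2 for a in u)
    )
    return mean_v - slope * mean_u, slope


def classical_predict(x, y, y_new):
    """Regress y on x, then invert the fitted line at y_new."""
    intercept, slope = fit_line(x, y)
    return (y_new - intercept) / slope


def inverse_predict(x, y, y_new):
    """Regress x on y directly and evaluate the fit at y_new."""
    intercept, slope = fit_line(y, x)
    return intercept + slope * y_new
```

On exactly linear data the two predictors agree. On noisy data such as `x = [1, 2, 3, 4]`, `y = [1, 3, 2, 4]`, the classical prediction at `y_new = 4` is 4.375 while the inverse prediction is 3.7, which illustrates how inverse prediction pulls estimates toward the mean of the calibration points.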

14.
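Because real-world time series usually exhibit both linear and nonlinear characteristics, the traditional ARIMA model often shows certain limitations in time series modeling. To address this, a hybrid ARIMA and LSTM model is proposed for time series prediction. A linear ARIMA model is applied to predict the time series, a support vector regression (SVR) model is used to predict the error series, a deep LSTM model is used to combine the predictions of the ARIMA and SVR models, and ...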

15.
This research proposes three schemes for estimating and adding mid-terms to multivariate time series. Back propagation is adopted as the approach to multivariate time series prediction. It is traditionally designed for the task with two models: a separated model and a combined model. In the proposed version of the time series prediction system, a mid-term estimator is added as an additional module to the traditional version. It is validated empirically that the three VTG (Virtual Term Generation) schemes are effective when using back propagation for multivariate time series prediction on four test data sets: three artificial ones and one real one.

16.
In this study, an artificial neural network (ANN) structure is proposed for seasonal time series forecasting. The proposed structure considers the seasonal period in the time series in order to determine the number of input and output neurons. The model was tested on four real-world time series. The results obtained by the proposed ANN were compared with the results of traditional statistical models and other ANN architectures. This comparison shows that the proposed model yields lower prediction error than the other methods. It is shown that the proposed model is especially convenient when the seasonality in the time series is strong; however, if the seasonality is weak, different network structures may be more suitable.
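
One plausible reading of "the seasonal period determines the number of input and output neurons" is that the network maps one full season of observations to the next full season. The sketch below builds training pairs under that assumption; the abstract does not spell out the exact layout, so the window sizes and the function name are hypothetical.

```python
def seasonal_windows(series, period):
    """Pair each window of `period` values with the `period` values that follow it.

    Assumption: both the input and output layer sizes equal the seasonal period.
    """
    pairs = []
    for start in range(0, len(series) - 2 * period + 1):
        inputs = series[start:start + period]
        outputs = series[start + period:start + 2 * period]
        pairs.append((inputs, outputs))
    return pairs
```

For quarterly data (`period=4`) and the series `[1, 2, 3, 4, 5, 6, 7, 8]`, this yields the single pair `([1, 2, 3, 4], [5, 6, 7, 8])`; each pair would feed a network with four input and four output neurons.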

17.

This study compares time series and machine learning models for inflation forecasting. Empirical evidence from the USA between 1984 and 2014 suggests that, out of sixteen conditions (four different inflation indicators and four different horizons), machine learning models provide more accurate forecasts in seven conditions and time series models are better in nine conditions. Moreover, multivariate models give better results in fourteen conditions, and univariate models are better in only two conditions. This study shows that the machine learning model prevails over time series models for core personal consumption expenditure (core-PCE) inflation forecasting, and that the time series model (ARDL) is better for core consumer price index (core-CPI) inflation forecasting at all horizons.

18.
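To improve the accuracy of network traffic prediction, and taking into account how network traffic varies and the limitations of current prediction models, a network traffic prediction model based on the wavelet transform and the extreme learning machine is designed. First, the current state of research on network traffic prediction in China and abroad is analyzed to identify the causes of poor prediction accuracy. Then, the wavelet transform is applied to denoise the original network traffic time series, yielding a noise-free series. Finally, an extreme learning machine is used to model the network traffic time series and produce the corresponding predictions. In comparative tests against current classical network traffic prediction models under the same environment, the wavelet transform and extreme learning machine model reached a prediction accuracy above 95%, kept the prediction error effectively under control, and improved prediction efficiency; its results were far better than those of the classical models.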

19.
Many current technological challenges require the capacity to forecast future measurements of a phenomenon. In most cases, this leads directly to solving a time series prediction problem. Statistical models are the classical approaches for tackling this problem. More recently, neural approaches such as Backpropagation, Radial Basis Functions and recurrent networks have been proposed as an alternative. Most neural-based predictors have chosen a global modelling approach, which tries to approximate a goal function by adjusting a single model. This design philosophy could present problems when data is extracted from a phenomenon that continuously changes its operational regime or represents distinct operational regimes in an unbalanced manner. In this paper, two alternative neural-based local modelling approaches are proposed. Both follow the divide and conquer principle, splitting the original prediction problem into several subproblems and adjusting a local model for each one. In order to check their adequacy, these methods are compared with other classical global and local modelling approaches using three benchmark time series and different sizes (medium and high) of training data sets. As shown, both models prove to be useful pragmatic paradigms for improving forecasting accuracy, with the advantages of a relatively low computational time and scalability to data set size.

20.
Wu Xinfang, Xiang Yong, Mao Gang, Du Mingqian, Yang Xiuqing, Zhou Xinzhi 《The Journal of Supercomputing》2021,77(5):4221-4243

Future airports are heading toward highly intelligent operation, such as unmanned check-in services, while the scale and resource allocation of ground services are tightly related to air passenger flow. Therefore, forecasting passenger flow accurately will significantly affect the development of future airports and the optimization of civil airline services. As a kind of time series, air passenger flow is influenced by multiple factors, in particular the stochastic parts of seasonality, trend and volatility. These ultimately affect the accuracy of the prediction. Therefore, this paper introduces a prediction model based on a two-phase learning framework. In phase one, various predictors cope with different features of the time series in parallel, and the prediction results are integrated in phase two. Furthermore, this paper compares the principal error indicators against actual data, and the results show that the two-phase learning model performs better than current fusion models and has stable performance.
