Similar documents
1.
Time series analysis has long been an important and interesting research field because time series appear in so many applications. In the past, many approaches based on regression, neural networks and other mathematical models were proposed to analyze time series. In this paper, we attempt to use data mining techniques to analyze time series. Many previous studies on data mining have focused on handling binary-valued data. Time series data, however, are usually quantitative values. We thus extend our previous fuzzy mining approach to handle time-series data and find linguistic association rules. The proposed approach first uses a sliding window to generate continuous subsequences from a given time series and then analyzes the fuzzy itemsets from these subsequences. Appropriate post-processing is then performed to remove redundant patterns. Experiments are also conducted to show the performance of the proposed mining algorithm. Since the final results are represented by linguistic rules, they are friendlier to humans than quantitative representations.
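The first two steps named in this abstract (sliding-window subsequence generation and fuzzification) can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the triangular membership functions, the `FUZZY_SETS` regions and all function names are assumptions, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical linguistic terms for values in [0, 10]: (left, peak, right).
FUZZY_SETS: Dict[str, Tuple[float, float, float]] = {
    "Low": (-5.0, 0.0, 5.0),
    "Middle": (0.0, 5.0, 10.0),
    "High": (5.0, 10.0, 15.0),
}


def sliding_windows(series: Sequence[float], width: int) -> List[List[float]]:
    """Return every contiguous subsequence of the given width."""
    if width <= 0 or width > len(series):
        return []
    return [list(series[i:i + width]) for i in range(len(series) - width + 1)]


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Membership of x in the triangular fuzzy set (left, peak, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_support(windows: List[List[float]], position: int, label: str) -> float:
    """Average membership of the value at `position` in the set `label`."""
    left, peak, right = FUZZY_SETS[label]
    total = sum(triangular(w[position], left, peak, right) for w in windows)
    return total / len(windows)


windows = sliding_windows([0, 5, 10, 5, 0], width=3)
print(fuzzy_support(windows, position=0, label="High"))
```

A fuzzy item here is a (window position, linguistic term) pair; items whose support exceeds a chosen minimum would be the candidates for building linguistic rules.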

2.
The recent trend of collecting huge and diverse datasets has created a great challenge in data analysis. One characteristic of these gigantic datasets is that they often contain significant amounts of redundancy. Using very large multi-dimensional data results in more noise, more redundant data, and a greater chance of unconnected data entities. To efficiently manipulate data represented in a high-dimensional space and to address the impact of redundant dimensions on the final results, we propose a new technique for dimensionality reduction using Copulas and the LU-decomposition (Forward Substitution) method. The proposed method compares favorably with existing approaches on real-world datasets taken from a machine learning repository (Diabetes, Waveform, two versions of Human Activity Recognition based on Smartphone, and Thyroid), in terms of dimensionality reduction and efficiency, as evaluated with statistical and classification measures.

3.
This paper explores dimensionality reduction (DR) approaches for visualizing high dimensional data in chemical processes. Visualization provides powerful insight and process understanding in the industrial context, and accelerates process troubleshooting. A diverse array of existing, easy-to-use DR methods is evaluated in three case studies on large-scale industrial manufacturing plants. Supervised and unsupervised cases are presented with the objective of solving typical industrial problems related to unplanned events, plant performance improvement, and quality underperformance troubleshooting. For the unsupervised case, the evaluation aims to identify approaches that provide insight beyond that of PCA (Principal Component Analysis), and also examines quality metrics of the reduced (latent) space which characterize the degree of trust in the DR. UMAP (Uniform Manifold Approximation and Projection) outperforms the other techniques and brings new insights compared with them. For the supervised case, UMAP is combined with traditional variable selection methods, such as VIP (Variable Influence on Projection) weights from PLS-DA (Partial Least Squares Discriminant Analysis), in order to improve latent space visualization by increasing separation between classes.

4.
Dimensionality reduction of clustered data sets   Cited by: 1 (self-citations: 0, citations by others: 1)
We present a novel probabilistic latent variable model to perform linear dimensionality reduction on data sets which contain clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets.

5.
Binary encoding is an approach that aims at summarizing the information contained in various spectral bands into a single image that stores the meaningful information of the bands. In this paper, a feature extraction approach is introduced that reduces the dimensionality of hyperspectral data with binary encoding for classification purposes. Different options to reduce the radiometric information of the pixels are introduced, such as using a single threshold or multiple thresholds. After the dimensionality reduction, the separation of the spectral classes was analysed and the thematic classification of the reduced data was performed. In order to evaluate the performance of the proposed approach, experiments on an AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) image, a ROSIS (Reflection Optics System Imaging Spectrometer) hyperspectral image and a HYDICE (Hyperspectral Digital Imagery Collection Experiment) hyperspectral image are presented. In the experiments, neighbouring spectral bands are grouped and coded and the results of the classification are compared. The results show that binary encoding based on three thresholds per spectral region is more efficient than encoding with one threshold. The thematic mapping of the hyperspectral data with reduced dimension confirms the competitiveness of the binary encoding method compared with other dimension reduction methods, such as Principal Component Analysis (PCA), Principal Component Analysis – Fisher’s Linear Discriminant Analysis (PCA-LDA), Discriminant Analysis Feature Extraction (DAFE) and Non-parametric Weighted Feature Extraction (NWFE). In this context, the present methodology appears promising, because it reduces the computational complexity and improves performance.
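To make the idea of grouping neighbouring bands and coding them against one or several thresholds concrete, here is a minimal Python sketch. It is an assumption-laden illustration rather than the paper's procedure: the use of the group mean, the one-bit-per-threshold coding and the names `group_means` and `binary_encode` are all hypothetical choices.

```python
from typing import List, Sequence


def group_means(spectrum: Sequence[float], group_size: int) -> List[float]:
    """Mean reflectance of each run of `group_size` neighbouring bands."""
    means = []
    for start in range(0, len(spectrum), group_size):
        group = spectrum[start:start + group_size]
        means.append(sum(group) / len(group))
    return means


def binary_encode(spectrum: Sequence[float], group_size: int,
                  thresholds: Sequence[float]) -> List[int]:
    """Emit one bit per (band group, threshold): 1 if the group mean exceeds it."""
    bits: List[int] = []
    for mean in group_means(spectrum, group_size):
        bits.extend(1 if mean > t else 0 for t in thresholds)
    return bits


pixel = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]          # six hypothetical band values
print(binary_encode(pixel, 2, [0.5]))           # single threshold
print(binary_encode(pixel, 2, [0.25, 0.5, 0.75]))  # three thresholds
```

With three thresholds each band group keeps a coarse four-level description instead of a single on/off bit, which is one plausible reading of why multiple thresholds retain more radiometric information.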

6.
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A relevant case of FDA is when the observed functions are density functions. Among the particular characteristics of density functions, particular use is made of the fact that they are an example of infinite-dimensional compositional data (parts of some whole which only carry relative information). Several dimensionality reduction methods for this particular type of data are compared: functional principal components analysis with or without a previous data transformation, and multidimensional scaling for different inter-density distances, one of them taking into account the compositional nature of density functions. The emphasis is on the steps before and after the application of a particular dimensionality reduction method: care must be taken in choosing the right density function transformation and/or the appropriate distance between densities before performing dimensionality reduction; subsequently, the graphical representation of dimensionality reduction results must take into account that the observed objects are density functions. The different methods are applied to artificial and real data (population pyramids for 223 countries in the year 2000). As a global conclusion, the use of multidimensional scaling based on compositional distance is recommended.

7.
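To address the problem that privacy-preserving classification mining algorithms based on randomized response apply only when the attribute values of the original data are binary, a privacy-preserving classification mining algorithm suitable for original data with multi-valued attributes is designed. The algorithm has two parts: (a) it compares a preset parameter value with a randomly generated number to decide whether to change the order of the original data, thereby transforming the original data and protecting data privacy; (b) it computes a probability estimate of the information gain ratio and constructs a decision tree on the disguised data.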

9.
Cotton is the most important fibre crop in the world. In Brazil, cotton cultivation is concentrated in the Cerrado biome, the Brazilian savanna, and cotton is one of the most important commodities in the country. Because cotton is an annual crop, frequent updating of the spatial distribution data of cotton fields is extremely important for crop monitoring systems. In order to provide fast and accurate information for crop monitoring, time series of remote-sensing data have been used in the development of several applications in agriculture, since the high temporal resolution of some orbital sensors allows monitoring targets with high spectral-temporal variations in the land surface. However, there are still some challenges in systematizing the processing of the large amount of data made available by long time series of remote-sensing imagery. Thus, this study contributes to the construction of models to identify and separate specific crop types whose spectral behaviour is similar to that of other crops grown in the same period. The objective of this study was to develop a systematic methodology based on data mining of time series of vegetation indices (VI) to map cotton fields at the regional scale. Field reference data and time series of NDVI and EVI images, obtained from MODIS sensor products during four cropping seasons (from 2012–2013 to 2015–2016), were used to construct mapping models based on decision tree algorithms. Phenological metrics were calculated from the VI time series and used to build classification rules for mapping cotton fields. Our results demonstrate that the proposed method to map cotton fields achieves high accuracy when field data and visual interpretation of NDVI temporal profiles were used for validation (accuracy higher than 95% and 93%, respectively). Comparisons with the official statistics indicated an optimal fit, with linear correlation (r) and coefficient of determination (R²) above 0.93.
Therefore, the proposed method was effective at distinguishing cotton fields from other crop types with similar spectral behaviour. In addition, this method can also be applied to other cotton-producing regions and other production seasons, by reusing the models generated through machine learning approaches.
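The pipeline outlined in this abstract (vegetation index, phenological metrics, rule-based classification) can be illustrated with a small Python sketch. The NDVI formula is the standard (NIR − Red) / (NIR + Red); everything else is hypothetical: the chosen metrics, the thresholds in `looks_like_target_crop`, and the function names are assumptions and are not taken from the study, whose rules were learned by decision tree algorithms.

```python
from typing import Dict, Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def phenological_metrics(profile: Sequence[float]) -> Dict[str, float]:
    """A few simple metrics of one season's VI profile (one value per date)."""
    values = list(profile)
    peak = max(values)
    return {
        "peak": peak,
        "peak_index": values.index(peak),   # position of the seasonal maximum
        "amplitude": peak - min(values),
    }


def looks_like_target_crop(metrics: Dict[str, float], min_peak: float = 0.8,
                           earliest_peak: int = 4, latest_peak: int = 8) -> bool:
    """Hand-written stand-in for a learned rule; thresholds are hypothetical."""
    return (metrics["peak"] >= min_peak
            and earliest_peak <= metrics["peak_index"] <= latest_peak)


season = [0.2, 0.3, 0.5, 0.7, 0.85, 0.9, 0.6, 0.3]  # made-up NDVI profile
print(looks_like_target_crop(phenological_metrics(season)))
```

In practice the rule would be replaced by a decision tree trained on field reference data, with the metrics serving as its input features.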

10.
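To reduce the dimensionality of sparse data effectively, a local dimensionality reduction method for sparse data based on tangent space discrimination is proposed. The idea is to expand local neighborhoods and increase the overlapping information between sample points, so that enough information is available during sparse dimensionality reduction to obtain an accurate low-dimensional embedding. Tangent space discrimination is then used to select which sample points in the expanded local region to retain, discarding points whose tangent direction changes substantially, which yields a better dimensionality reduction result. Experimental results show that the new method obtains good embedding results on artificially generated data sets and produces the expected visual classification results in face recognition and image retrieval.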

11.
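Application of fuzzy time-series mining to fuzzy modeling of complex systems   Cited by: 5 (self-citations: 0, citations by others: 5)
To address fuzzy modeling in the domain of complex industrial processes, a time-series-based fuzzy quantitative data mining method is proposed, and its application to structure identification of fuzzy logic inference models for complex systems is discussed. The method is built on a database of historical data collected from the system and effectively solves the problem of establishing an initial model when fuzzy modeling multi-input multi-output (MIMO) nonlinear complex industrial processes. Finally, the application of the method to modeling an alumina clinker sintering rotary kiln is discussed; it achieved good results in on-site operation.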

12.
13.
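When a sensor network is deployed in a water supply network to acquire time-series data for multiple water quality parameters in real time, detecting water quality anomalies efficiently and accurately once the network is contaminated is an important problem. A multivariate water quality parameter temporal anomaly event detection algorithm (M-TAEDA) is proposed. It uses a BP model to analyze the time-series data of the multivariate water quality parameters and identify possible outliers; it combines Bayesian sequential analysis to update the event probability of each parameter independently and predict the anomaly probability detected at a single sensor node; and it fuses the univariate event probabilities into a unified multivariate event probability, from which anomaly events are judged. Experimental results show that prediction with the BP model simulating the multivariate water quality parameters reaches 90% accuracy; compared with the univariate parameter temporal anomaly event detection algorithm (S-TAEDA), M-TAEDA improves the anomaly detection rate by about 40% and reduces the false alarm rate by about 45%.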
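The sequential update and fusion steps in this abstract can be sketched as below. This is only an illustration under stated assumptions: the update is a plain Bayes-rule step with hypothetical true-positive and false-positive rates for the outlier detector, and the fusion uses a noisy-OR combination. The abstract does not give M-TAEDA's actual update or fusion formulas, so neither should be read as the paper's method.

```python
from typing import Iterable


def update_event_probability(prior: float, is_outlier: bool,
                             true_positive_rate: float = 0.9,
                             false_positive_rate: float = 0.1) -> float:
    """One Bayes-rule update of P(event) after an outlier/normal observation."""
    if is_outlier:
        like_event, like_normal = true_positive_rate, false_positive_rate
    else:
        like_event, like_normal = 1 - true_positive_rate, 1 - false_positive_rate
    numerator = like_event * prior
    denominator = numerator + like_normal * (1 - prior)
    return prior if denominator == 0 else numerator / denominator


def fuse_probabilities(probabilities: Iterable[float]) -> float:
    """Noisy-OR fusion (an assumption): event if any parameter signals it."""
    none_signal = 1.0
    for p in probabilities:
        none_signal *= 1.0 - p
    return 1.0 - none_signal


# Three consecutive outliers on one hypothetical parameter, e.g. chlorine.
p = 0.01
for flagged in (True, True, True):
    p = update_event_probability(p, flagged)
print(p, fuse_probabilities([p, 0.05]))
```

Updating each parameter independently keeps a single noisy sensor from triggering an alarm on one reading, while the fused value lets agreement between parameters raise the overall event probability.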

14.
This paper describes the use of dimensionality reduction techniques for computer facial animation. Techniques such as Principal Components Analysis (PCA), the Expectation-Maximization (EM) algorithm for PCA, Multidimensional Scaling (MDS), and Locally Linear Embedding (LLE) are compared for the purpose of facial animation of different emotions. The experimental results on our facial animation data demonstrate the usefulness of dimensionality reduction techniques for both space and time reduction. In particular, the EMPCA algorithm performed especially well on our dataset, with negligible error of only 1-2%.

15.
Research on pattern mining in multivariate time series   Cited by: 4 (self-citations: 0, citations by others: 4)
张军  吴绍春  王炜 《计算机工程与设计》2006,27(18):3364-3366,3384
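Multivariate time-series data sets exist in many fields, and because their observed variables are interrelated, they often need to be analyzed jointly. A multivariate time-series pattern mining method based on time-series similarity is used to find similar multivariate time series in historical data. The multivariate data set is averaged piecewise into consecutive matrices, and a method based on principal component analysis and singular value decomposition is used to compare the matrices for similarity; finally, adjacent segments are merged to form higher-level time-series segments, which widens the range of pattern matching. The method was implemented on earthquake precursor data.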

16.
李冬睿  许统德 《计算机应用》2012,32(8):2253-2257
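Existing manifold-learning-based dimensionality reduction methods are sensitive to the choice of local neighborhood size, and the data they produce in the low-dimensional space are not well separable. To address this, a dimensionality reduction method with adaptive neighborhood selection that preserves data separability is proposed. The method adaptively selects the neighborhood size of each sample point by estimating the intrinsic dimension and local tangent direction of the data; meanwhile, it uses clustering information during data mapping to bring similar sample points together, ensuring that the reduced data are well separable and yielding a better dimensionality reduction result. Experimental results show that the new method obtains good embedding results on artificially generated data sets and produces the expected results in visual classification of faces and in image retrieval.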

17.
Fuzzy time-series models have been widely applied due to their ability to handle nonlinear data directly and because no rigid assumptions for the data are needed. In addition, many such models have been shown to provide better forecasting results than their conventional counterparts. However, since most of these models require complicated matrix computations, this paper proposes the adoption of a multivariate heuristic function that can be integrated with univariate fuzzy time-series models into multivariate models. Such a multivariate heuristic function can easily be extended and integrated with various univariate models. Furthermore, the integrated model can handle multiple variables to improve forecasting results and, at the same time, avoid complicated computations due to the inclusion of multiple variables.

18.
Automatic acoustic-based vehicle detection is a common task in security and surveillance systems. Usually, a recording device is placed in a designated area and a hardware/software system processes the sounds that are intercepted by this recording device to identify vehicles only as they pass by. An algorithm suitable for online automatic detection of vehicles based on their acoustic recordings is proposed. The scheme uses dimensionality reduction methodologies such as random projections instead of traditional signal processing methods to extract features. It uncovers characteristic features of the recorded sounds without any assumptions about the structure of the signal. The set of features is classified by the application of PCA. The microphone is open at all times, and the algorithm filters out many background noises such as wind, steps, speech, airplanes, etc. The introduced algorithm is generic and can be applied to various signal types for solving different detection and classification problems.
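As a rough picture of the random-projection step mentioned in this abstract, the sketch below maps a high-dimensional feature vector (for example, one frame of a spectrum) to a few dimensions with a fixed random matrix. The Gaussian entries, the 1/sqrt(k) scaling, the seed handling and the function names are assumptions for illustration; the abstract does not describe the projection the authors actually use.

```python
import math
import random
from typing import List, Sequence


def random_projection_matrix(input_dim: int, output_dim: int,
                             seed: int = 0) -> List[List[float]]:
    """Gaussian random matrix with rows scaled by 1/sqrt(output_dim)."""
    rng = random.Random(seed)           # fixed seed gives a reproducible matrix
    scale = 1.0 / math.sqrt(output_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(input_dim)]
            for _ in range(output_dim)]


def project(vector: Sequence[float], matrix: List[List[float]]) -> List[float]:
    """Multiply the projection matrix by the input vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


matrix = random_projection_matrix(input_dim=8, output_dim=3, seed=1)
frame = [0.0, 0.1, 0.4, 0.9, 0.7, 0.2, 0.1, 0.0]   # made-up spectral frame
print(project(frame, matrix))
```

The same matrix must be reused for every frame, at training and at detection time, so that projected features remain comparable.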

19.
We define the problem of bounded similarity querying in time-series databases, which generalizes earlier notions of similarity querying. Given a (sub)sequence S, a query sequence Q, lower and upper bounds on shifting and scaling parameters, and a tolerance ε, S is considered boundedly similar to Q if S can be shifted and scaled within the specified bounds to produce a modified sequence S′ whose distance from Q is within ε. We use similarity transformation to formalize the notion of bounded similarity. We then describe a framework that supports the resulting set of queries; it is based on a fingerprint method that normalizes the data and saves the normalization parameters. For off-line data, we provide an indexing method with a single index structure and search technique for handling all the special cases of bounded similarity querying. Experimental investigations find the performance of our method to be competitive with earlier, less general approaches.
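The definition of bounded similarity can be made concrete with a small Python sketch. It is a simplification under stated assumptions, not the paper's framework: it stores each sequence's mean and standard deviation as its normalization parameters, derives the single scale and shift that align those parameters, and accepts only if both lie within the bounds and the Euclidean distance is within the tolerance. It does not search over all allowed transformations and includes no index structure; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_form(seq: Sequence[float]) -> Tuple[List[float], float, float]:
    """Return the normalized sequence plus the saved parameters (mean, std)."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    normalized = [0.0] * len(seq) if std == 0 else [(x - mean) / std for x in seq]
    return normalized, mean, std


def boundedly_similar(s: Sequence[float], q: Sequence[float],
                      scale_bounds: Tuple[float, float],
                      shift_bounds: Tuple[float, float],
                      eps: float) -> bool:
    """Check one candidate transformation S' = scale * S + shift against Q."""
    if len(s) != len(q):
        return False
    _, mean_s, std_s = normal_form(s)
    _, mean_q, std_q = normal_form(q)
    scale = std_q / std_s if std_s > 0 else 1.0
    shift = mean_q - scale * mean_s
    if not scale_bounds[0] <= scale <= scale_bounds[1]:
        return False
    if not shift_bounds[0] <= shift <= shift_bounds[1]:
        return False
    transformed = [scale * x + shift for x in s]
    return math.dist(transformed, q) <= eps


s = [1.0, 2.0, 3.0, 4.0]
q = [3.0, 5.0, 7.0, 9.0]                     # q = 2 * s + 1
print(boundedly_similar(s, q, (0.5, 3.0), (-2.0, 2.0), eps=1e-6))
```

Because only one candidate transformation is tested, this check is conservative: it can reject a pair for which some other scale and shift inside the bounds would still bring S′ within ε of Q.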

20.
高阳  王雪松  程玉虎  汪婵 《控制与决策》2013,28(8):1219-1225
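To make full use of hyperspectral information while reducing the loss of classification accuracy caused by data redundancy, a block non-negative sparse reconstruction embedding dimensionality reduction algorithm is proposed. First, the traditional overcomplete dictionary is converted into an overcomplete block dictionary; then, the block non-negative sparse reconstruction weight matrix is obtained by computing the minimum reconstruction error of the samples corresponding to each overcomplete block dictionary; finally, during low-dimensional embedding, the globally optimal low-dimensional subspace of the hyperspectral data is obtained by simultaneously minimizing the local and maximizing the non-local non-negative sparse information of the hyperspectral data. Experimental results on three hyperspectral data sets verify the feasibility and effectiveness of the proposed method.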
