Similar literature (20 records found)
1.
Exploring process data with the use of robust outlier detection algorithms
To implement on-line process monitoring techniques such as principal component analysis (PCA) or partial least squares (PLS), it is necessary to extract data associated with the normal operating conditions from the plant historical database for calibrating the models. One way to do this is to use robust outlier detection algorithms such as resampling by half-means (RHM), smallest half volume (SHV), or ellipsoidal multivariate trimming (MVT) in the off-line model building phase. While RHM and SHV are conceptually clear and statistically sound, their computational requirements are heavy. Closest distance to center (CDC) is proposed in this paper as an alternative for outlier detection. The use of Mahalanobis distance in the initial step of MVT for detecting outliers is known to be ineffective; to improve MVT, CDC is therefore incorporated into it. Performance was evaluated relative to the goal of finding the best half of a data set, using data sets derived from the Tennessee Eastman process (TEP) simulator. Comparable results were obtained for RHM, SHV, and CDC, and better performance was obtained when CDC was incorporated with MVT than when CDC or MVT was used alone. All robust outlier detection algorithms outperformed the standard PCA algorithm. The effects of auto scaling, robust scaling, and a new scaling approach called modified scaling were also investigated. In the presence of multiple outliers, auto scaling was found to degrade the performance of all the robust techniques, whereas reasonable results were obtained with robust scaling and modified scaling.
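A minimal Python sketch of the closest-distance-to-center idea summarized above, assuming the simplest variant (column medians as the initial center and plain Euclidean distance); the published CDC algorithm may differ in these choices:

```python
import numpy as np

def cdc_clean_half(X):
    """Closest-distance-to-center (CDC) style selection: keep the half of the
    rows nearest to a robust center estimate (column medians here)."""
    center = np.median(X, axis=0)              # robust initial center
    d = np.linalg.norm(X - center, axis=1)     # distance of each row to center
    half = np.argsort(d)[: X.shape[0] // 2]    # indices of the closest half
    return X[half]

# Demo: 95 normal points plus 5 gross outliers
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (95, 4)), rng.normal(8, 1, (5, 4))])
clean = cdc_clean_half(X)
print(clean.shape)  # (50, 4) -- the retained "best half"
```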

2.
Exploratory factor analysis is commonly used in IS research to detect multivariate data structures. Frequently, the method is blindly applied without checking whether the data fulfill its requirements. We investigated the influence of sample size, data transformation, factor extraction method, rotation, and number of factors on the outcome. We compared classical exploratory factor analysis with a robust counterpart that is less influenced by data outliers and data heterogeneities. Our analyses revealed that robust exploratory factor analysis is more stable than the classical method.
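The abstract does not name the robust estimator underlying the robust counterpart; the hedged sketch below contrasts classical factor analysis with a principal-factor decomposition of a minimum covariance determinant (MCD) covariance as one plausible robustification:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.covariance import MinCovDet

def loadings_from_cov(cov, n_factors):
    """Principal-factor loadings from a (possibly robust) covariance matrix."""
    vals, vecs = np.linalg.eigh(cov)
    idx = np.argsort(vals)[::-1][:n_factors]   # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(vals[idx])

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[:10] += 15                                   # a block of outlying cases

classical = FactorAnalysis(n_components=2).fit(X).components_.T
robust = loadings_from_cov(MinCovDet(random_state=1).fit(X).covariance_, 2)
print(classical.round(2))                      # distorted by the outliers
print(robust.round(2))                         # far less affected
```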

3.
In the context of randomization tests, this paper discusses the roles of exploratory data analysis (EDA) and confirmatory data analysis (CDA) in geoscience research. It shows: (1) how the classical methods of statistical inference can be used in EDA with nonrandom samples of data, and (2) how much of the knowledge in the geosciences is derived from EDA. The paper gives a FORTRAN IV computer program, CLASSTEST, that performs a randomization test for a multivariate analysis of variance (MANOVA) design. CLASSTEST will be useful in geoscience research apart from its use in illustrating EDA and CDA.
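CLASSTEST itself is FORTRAN IV; the sketch below illustrates only the core randomization-test logic in Python, using a simple two-group multivariate statistic (squared distance between group mean vectors) rather than a full MANOVA design:

```python
import numpy as np

def randomization_test(A, B, n_perm=9999, seed=0):
    """Randomization test for a multivariate two-group comparison:
    shuffle group labels and recompute the statistic under each shuffle."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([A, B])
    n_a = len(A)
    stat = np.sum((A.mean(0) - B.mean(0)) ** 2)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        pa, pb = pooled[perm[:n_a]], pooled[perm[n_a:]]
        if np.sum((pa.mean(0) - pb.mean(0)) ** 2) >= stat:
            count += 1
    return (count + 1) / (n_perm + 1)          # randomization p-value

rng = np.random.default_rng(2)
print(randomization_test(rng.normal(0, 1, (20, 3)), rng.normal(1, 1, (20, 3))))
```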

4.
A new definition of depth for functional observations is introduced based on the notion of the “half-region” determined by a curve. The half-region depth provides a simple and natural criterion for measuring the centrality of a function within a sample of curves. It has computational advantages relative to other concepts of depth previously proposed in the literature, which makes it applicable to the analysis of high-dimensional data. Based on this depth, a sample of curves can be ordered from the center outward and order statistics can be defined. The properties of the half-region depth, such as consistency and uniform convergence, are established. A simulation study shows the robustness of this new definition of depth when the curves are contaminated. Finally, real data examples are analyzed.
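A direct implementation of the hypograph/epigraph reading of half-region depth, assuming all curves are observed on a common grid (a simplification relative to the paper's general setting):

```python
import numpy as np

def half_region_depth(curves):
    """Half-region depth for functional data on a common grid.
    curves: (n_curves, n_points). Depth of curve i is the smaller of the
    fraction of curves lying entirely at/below it (its hypograph) and the
    fraction lying entirely at/above it (its epigraph)."""
    n = len(curves)
    depths = np.empty(n)
    for i, x in enumerate(curves):
        below = np.mean(np.all(curves <= x, axis=1))
        above = np.mean(np.all(curves >= x, axis=1))
        depths[i] = min(below, above)
    return depths

t = np.linspace(0, 1, 50)
rng = np.random.default_rng(3)
sample = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, (25, 1))  # shifted sines
d = half_region_depth(sample)
print(np.argmax(d), d.max())   # index and depth of the most central curve
```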

5.
Two exploratory data analysis techniques, the comap and the quad plot, are shown to have both strengths and shortcomings when analysing spatial multivariate datasets. A hybrid of these two techniques is proposed: the quad map, which is shown to overcome the outlined shortcomings when applied to a dataset containing weather information for disaggregate incidents of urban fires. Like the quad plot, the quad map uses Polya models to articulate the underlying assumptions behind histograms. The Polya model formalises the situation in which past fire incident counts are computed and displayed in (multidimensional) histograms as appropriate assessments of conditional probability, providing valuable diagnostics such as posterior variance, i.e. sensitivity to new information. Finally, we discuss how new technology, in particular Online Analytical Processing (OLAP) and Geographical Information Systems (GISs), offers potential for automating exploratory spatial data analysis techniques such as the quad map.

6.
The Box Car process-data compression algorithm is widely used in fieldbus control systems, and its compression performance depends on the recording limit and the compression interval. Based on extensive computations on typical simulated data, this paper analyses how the recording limit and compression interval of the Box Car algorithm affect the compression ratio, computation time, and compression coefficient for process data with a stationary trend. It also analyses how the trend and fluctuation characteristics of process data affect the algorithm's compression ratio and approximation coefficient. The results provide guidance for tuning the Box Car parameters to the trend and noise characteristics of the process data at hand so as to achieve the desired compression performance in practical applications.
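A hedged sketch of the basic Box Car recording rule as commonly described (store a sample only when it leaves the recording-limit band around the last stored value, or when the compression interval expires); the parameter names are illustrative, not the paper's:

```python
import random

def boxcar_compress(values, record_limit, max_interval):
    """Box Car compression sketch: keep a sample only when it moves outside
    the recording limit around the last stored value, or when max_interval
    samples have passed since the last stored point."""
    stored = [(0, values[0])]                  # always keep the first point
    for i, v in enumerate(values[1:], start=1):
        last_i, last_v = stored[-1]
        if abs(v - last_v) > record_limit or i - last_i >= max_interval:
            stored.append((i, v))
    return stored

random.seed(4)
signal = [20 + 0.01 * i + random.gauss(0, 0.02) for i in range(1000)]
kept = boxcar_compress(signal, record_limit=0.1, max_interval=100)
print(f"compression ratio: {len(signal) / len(kept):.1f}")
```

Raising the recording limit or lengthening the compression interval increases the compression ratio at the cost of reconstruction fidelity, which is the trade-off the paper quantifies.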

7.
The problems arising when there are outliers in a data set that follows a circular distribution are considered. A robust estimation of the unknown parameters is obtained using the methods of weighted likelihood and minimum disparity, each of which is defined for a general parametric family of circular data. The class of power divergences and the related residual adjustment function is investigated in order to improve the performance of the two methods, which are studied for the von Mises (circular normal) and wrapped normal distributions. The techniques are illustrated via two examples based on a real data set and a Monte Carlo study, which also enables the discussion of various computational aspects.
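The weighted-likelihood and minimum-disparity estimators in the paper are more involved than anything shown here; this sketch only illustrates the general idea of robustified von Mises estimation, using trigonometric-moment estimates plus one crude density-based reweighting step to downweight outlying angles:

```python
import numpy as np

def vonmises_trig_estimates(theta, w=None):
    """Trigonometric-moment estimates of the von Mises mean direction and
    concentration, optionally with case weights."""
    w = np.ones_like(theta) if w is None else w / np.sum(w) * len(theta)
    C, S = np.mean(w * np.cos(theta)), np.mean(w * np.sin(theta))
    mu, R = np.arctan2(S, C), np.hypot(C, S)
    if R < 0.53:                      # Fisher's approximation for kappa
        kappa = 2 * R + R**3 + 5 * R**5 / 6
    elif R < 0.85:
        kappa = -0.4 + 1.39 * R + 0.43 / (1 - R)
    else:
        kappa = 1 / (R**3 - 4 * R**2 + 3 * R)
    return mu, kappa

rng = np.random.default_rng(5)
theta = np.concatenate([rng.vonmises(0.0, 4.0, 95), np.full(5, np.pi)])  # 5 outliers
mu0, k0 = vonmises_trig_estimates(theta)
w = np.exp(k0 * np.cos(theta - mu0))   # crude downweighting of low-density angles
mu1, k1 = vonmises_trig_estimates(theta, w)
print(mu0, k0, "->", mu1, k1)          # kappa recovers once outliers are downweighted
```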

8.
The main aim of data analysis in biochemical metrology is the extraction of relevant information from biochemical data measurements. A system of extended exploratory data analysis (EDA) based on graphical tools for sample data summarization and exploration is proposed, and the original EDA algorithm in S-Plus is available on the Internet at http://www.trilobyte.cz/EDA. Checking the basic assumptions about biochemical and medical data means examining the independence of sample elements, sample normality, and sample homogeneity. An exact assessment of the mean value and the variance of steroid levels in controls is necessary for the correct assessment of the samples from patients. Data examination procedures are illustrated by a determination of the mean value of 17-hydroxypregnenolone in the umbilical blood of newborns. For an asymmetric, strongly skewed sample distribution corrupted with outliers, the best estimate of location seems to be the median, and the Box–Cox transformation improves sample symmetry. The proposed procedure gives reliable estimates of the mean value for the asymmetric distribution of 17-hydroxypregnenolone when the arithmetic mean cannot be used.
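A small sketch of the location-estimation issue described: for skewed, outlier-corrupted data the mean is unreliable, the median is safer, and a Box–Cox transformation restores approximate symmetry (synthetic data, not the 17-hydroxypregnenolone sample):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.lognormal(mean=1.0, sigma=0.6, size=200)       # strongly right-skewed
x = np.append(x, [60.0, 75.0])                          # gross outliers

print("mean:", np.mean(x), "median:", np.median(x))     # mean is pulled upward

y, lam = stats.boxcox(x)                                # symmetrizing transform
print("lambda:", lam, "skewness after:", stats.skew(y)) # roughly symmetric now
```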

9.
A FORTRAN program is presented which generates a statistical model of broadscale, spatially coherent data and uses that model to identify and remove outlying data values. The algorithm also interpolates missing data values by making use of this model, together with the assumption of broadscale coherence. Examples of the application of this technique to geomagnetic data are presented. A significant improvement in the statistical efficiency and consistency of subsequent estimators results from preprocessing data with this method.

10.
Traditional hypothesis-driven research domains such as molecular biology are undergoing a paradigm shift in becoming progressively data-driven, enabling rapid acquisition of new knowledge. The purpose of this article is to promote an analogous development in business research. Specifically, we focus on network analysis: given the key constructs in a business research domain, we introduce a data-driven protocol applicable to business survey data to (a) discover the web of influence directionalities among the key constructs and thereby identify the critical constructs, and (b) determine the relative contributions of the constructs in predicting the levels of the critical constructs. In (a), we build a directed connectivity graph by (i) using a state-of-the-art statistical technique to perform variable selection, (ii) integrating the variable selection results to form the directed connectivity graph, and (iii) employing graph-theoretical concepts and a graph clustering technique to interpret the resulting network topology in a multi-resolution manner. In (b), based on the directed connectivity graph, multiple linear regression is performed to quantify relations between the critical and other constructs. As a case study, the protocol is applied to analyze opinion-leading and opinion-seeking behaviors in online market communications environments. The directed connectivity relations revealed provide new ways of visualizing the web of influence directionalities among the constructs of interest, suggest new research directions to pursue, and aid decision making in marketing management. The proposed method provides a data-driven alternative to traditional confirmatory methods for analyzing relations among given constructs, and its flexibility broadens the scope of research the business researcher can fruitfully engage in.
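A hedged sketch of step (a), assuming the lasso as the variable-selection technique (the abstract says only "a state-of-the-art statistical technique"): each construct is regressed on all others, and surviving coefficients become directed edges into it. The paper's integration step for resolving edge directions and its graph clustering are omitted here:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def directed_connectivity(X, names, threshold=1e-3):
    """Regress each construct on all others with an L1 penalty; coefficients
    that survive selection become directed edges j -> i."""
    edges = []
    for i in range(X.shape[1]):
        others = np.delete(np.arange(X.shape[1]), i)
        model = LassoCV(cv=5).fit(X[:, others], X[:, i])
        for j, coef in zip(others, model.coef_):
            if abs(coef) > threshold:
                edges.append((names[j], names[i], round(coef, 3)))
    return edges

rng = np.random.default_rng(7)
a = rng.normal(size=300)
b = 0.8 * a + rng.normal(size=300)          # A influences B
c = 0.5 * b + rng.normal(size=300)          # B influences C
print(directed_connectivity(np.column_stack([a, b, c]), ["A", "B", "C"]))
```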

11.
Crisp input and output data are fundamentally indispensable in traditional data envelopment analysis (DEA). However, the input and output data in real-world problems are often imprecise or ambiguous. Some researchers have proposed interval DEA (IDEA) and fuzzy DEA (FDEA) to deal with imprecise and ambiguous data in DEA. Nevertheless, many real-life problems use linguistic data that cannot be expressed as interval data, and a large number of input variables in fuzzy logic can result in a significant number of rules being needed to specify a dynamic model. In this paper, we propose an adaptation of standard DEA to conditions of uncertainty. The proposed approach is based on a robust optimization model in which the input and output parameters are constrained to lie within an uncertainty set, with additional constraints based on the worst-case solution with respect to the uncertainty set. Our robust DEA (RDEA) model seeks to maximize efficiency (as in standard DEA), but under the assumption of a worst-case efficiency defined by the uncertainty set and its supporting constraints. A Monte Carlo simulation is used to compute the conformity of the rankings in the RDEA model. The contribution of this paper is fourfold: (1) we consider ambiguous, uncertain, and imprecise input and output data in DEA; (2) we address the gap in the imprecise-DEA literature for problems not suitable or difficult to model with interval or fuzzy representations; (3) we propose a robust optimization model in which the input and output parameters are constrained to lie within an uncertainty set, with additional constraints based on the worst-case solution with respect to the uncertainty set; and (4) we use Monte Carlo simulation to specify a range of Gamma in which the rankings of the DMUs occur with high probability.
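For orientation, a sketch of the standard input-oriented CCR DEA model that RDEA perturbs, solved as a linear program; the paper's uncertainty-set and worst-case machinery is omitted:

```python
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, o):
    """Input-oriented CCR DEA efficiency of DMU `o`.
    X: (n_dmu, n_in) inputs, Y: (n_dmu, n_out) outputs.
    Variables: theta followed by lambda_1..lambda_n."""
    n = X.shape[0]
    c = np.r_[1.0, np.zeros(n)]                       # minimize theta
    # inputs:  sum_j lam_j * x_j - theta * x_o <= 0
    A_in = np.hstack([-X[o][:, None], X.T])
    # outputs: -sum_j lam_j * y_j <= -y_o  (i.e., outputs at least y_o)
    A_out = np.hstack([np.zeros((Y.shape[1], 1)), -Y.T])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(X.shape[1]), -Y[o]]
    bounds = [(None, None)] + [(0, None)] * n         # theta free, lambda >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.fun                                    # efficiency score in (0, 1]

X = np.array([[2.0, 3.0], [4.0, 2.0], [6.0, 7.0]])   # two inputs per DMU
Y = np.array([[1.0], [1.0], [1.0]])                  # one output per DMU
print([round(dea_efficiency(X, Y, o), 3) for o in range(3)])
```

In the robust variant, the data X and Y would be replaced by their worst-case values over the uncertainty set before solving, which is what makes the resulting scores conservative.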

12.
Current computational power and some recently developed algorithms allow a new automatic spectral analysis method for data with randomly missing observations. Accurate spectra and autocorrelation functions are computed from the estimated parameters of time series models, without user interaction. If only a few observations are missing, the accuracy is almost the same as when all observations are available. For larger missing fractions, low-order time series models can still be estimated with good accuracy if the total observation time is long enough. Autoregressive models are best estimated with the maximum likelihood method when data are missing. Maximum likelihood estimates of moving average and of autoregressive moving average models are not very useful with missing data; those models are found most accurately when derived from the estimated parameters of an intermediate autoregressive model. With statistical criteria for the selection of model order and model type, a completely automatic and numerically reliable algorithm is developed that estimates the spectrum and the autocorrelation function in randomly missing data problems. The accuracy was better than that obtainable with other methods, including the well-known expectation–maximization (EM) algorithm.
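A sketch of the parametric-spectrum idea on complete data (the paper's contribution, maximum-likelihood estimation with randomly missing observations and automatic order selection, is considerably more elaborate): fit an autoregressive model and compute the spectrum from its parameters rather than from the data directly:

```python
import numpy as np

def ar_spectrum(x, order, n_freq=256):
    """Estimate an AR(order) model by least squares and return its power
    spectrum, computed from the model parameters rather than the data."""
    # Regression x[t] = sum_k a_k x[t-k] + e[t]
    A = np.column_stack([x[order - k - 1: len(x) - k - 1] for k in range(order)])
    a, *_ = np.linalg.lstsq(A, x[order:], rcond=None)
    sigma2 = np.mean((x[order:] - A @ a) ** 2)        # innovation variance
    freqs = np.linspace(0, 0.5, n_freq)
    z = np.exp(-2j * np.pi * np.outer(freqs, np.arange(1, order + 1)))
    spectrum = sigma2 / np.abs(1 - z @ a) ** 2        # AR spectral density
    return freqs, spectrum

rng = np.random.default_rng(8)
e = rng.normal(size=2000)
x = np.zeros(2000)
for t in range(2, 2000):                              # AR(2) with a spectral peak
    x[t] = 1.4 * x[t - 1] - 0.8 * x[t - 2] + e[t]
freqs, psd = ar_spectrum(x, order=2)
print(freqs[np.argmax(psd)])                          # near the true peak frequency
```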

13.
Data warehouse technology has emerged and developed rapidly in recent years; it makes full use of the information already stored in a data warehouse to help decision makers make decisions. After introducing the definition of a data warehouse and the principles of data-warehouse-based decision support systems, this paper discusses in detail the data extraction and transformation methods used in a data warehouse, explores the data warehouse implementation process and online analysis through a worked example, and finally presents a practical application of data mining methods. A commercial data warehouse is designed using Microsoft SQL Server 2000 data warehouse technology, so that the reader can clearly understand how a data-warehouse-based decision support system is implemented.

14.
The threat of cyber attacks motivates the need to monitor Internet traffic data for potentially abnormal behavior. Due to the enormous volumes of such data, statistical process monitoring tools, such as those traditionally used on data in the product manufacturing arena, are inadequate. “Exotic” data may indicate a potential attack, and detecting such data requires a characterization of “typical” data. We devise some new graphical displays, including a “skyline plot,” that permit ready visual identification of unusual Internet traffic patterns in “streaming” data, and use appropriate statistical measures to help identify potential cyber attacks. These methods are illustrated on a moderate-sized data set (135,605 records) collected at George Mason University.

15.
To address the safety analysis of chemical processes, a new safety analysis solution is proposed that draws on data dependence techniques from computer science. Taking a two-tank liquid-level control system as an example, the process flow and the relationships among variables are analyzed; nine states, ten transitions, and the associated conditions, events, and execution procedures are extracted to build an extended finite state machine (EFSM) model of the system. By examining the variable L2 in transition T8, analyzing its data dependence paths, and determining the positive and negative influences along those dependences, a new data-dependence-based safety analysis method for chemical processes is realized; its effectiveness is further verified through the analysis of the variable L2 in transition T4. EFSM data dependence analysis thus becomes a new and effective way of carrying out safety analysis of chemical processes through automated computer reasoning.

16.
Two approaches are presented for performing principal component analysis (PCA) on data which contain both outlying cases and missing elements. First, an eigendecomposition of a covariance matrix which can deal with such data is proposed, but this approach is not suited to data where the number of variables exceeds the number of cases. Alternatively, an expectation robust (ER) algorithm is proposed to adapt the existing methodology for robust PCA to data containing missing elements. According to an extensive simulation study, the ER approach performs well for all data sizes considered. Using simulations and an example, it is shown that, by virtue of the ER algorithm, the properties of the existing methods for robust PCA carry through to data with missing elements.
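A toy sketch in the spirit of an expectation-robust iteration, alternating model-based refilling of missing cells with downweighting of outlying cases; this is an illustrative simplification, not the published ER algorithm:

```python
import numpy as np

def er_pca_sketch(X, n_comp=2, n_iter=30):
    """Simplified ER-style PCA: alternately impute missing cells from the
    current low-rank model and downweight cases with large residuals."""
    miss = np.isnan(X)
    Xf = np.where(miss, np.nanmedian(X, axis=0), X)    # robust initial fill
    w = np.ones(len(X))
    for _ in range(n_iter):
        mu = np.average(Xf, axis=0, weights=w)
        Xc = (Xf - mu) * np.sqrt(w)[:, None]           # case-weighted centering
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        V = Vt[:n_comp].T
        recon = mu + (Xf - mu) @ V @ V.T               # rank-n_comp fit
        Xf[miss] = recon[miss]                         # refill missing cells
        r = np.linalg.norm(Xf - recon, axis=1)         # case residual distances
        w = 1.0 / (1.0 + (r / (np.median(r) + 1e-12)) ** 2)  # downweight outliers
    return V, Xf

rng = np.random.default_rng(9)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
X[rng.random(X.shape) < 0.1] = np.nan                  # 10% missing elements
X[:3] += 20                                            # three outlying cases
V, X_completed = er_pca_sketch(X)
print(V.shape, np.isnan(X_completed).sum())            # (5, 2) 0
```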

17.
In aluminum electrolysis production, the computer control system collects and stores large volumes of production data in real time. These data contain rich and valuable information, not normally discoverable by inspection, about how the various factors in the actual production process influence and interact with one another; extracting and using this information is important for deepening the understanding of the production process and for improving control and management. This paper builds a data warehouse for storing aluminum electrolysis production data and applies OLAP to analyze and process the data. Pivot charts are used to present analysis results for the anode effect coefficient, current efficiency, cell voltage, aluminum tapping quantity, and related quantities. Taking the effect coefficient as the main example, it is shown how OLAP enables fast, intuitive querying and browsing of historical production data and provides a basis for production management decisions.

18.
The fundamental goal of the GeoVISTA Studio project is to improve geoscientific analysis by providing an environment that operationally integrates a wide range of analysis activities, both computational and visual. Improving the infrastructure used in analysis has far-reaching potential to better integrate human-based and computationally based expertise, and so ultimately improve scientific outcomes. To address these challenges, some difficult system design and software engineering problems must be overcome. This paper illustrates the design of a component-oriented system, GeoVISTA Studio, as a means to overcome such difficulties by using state-of-the-art component-based software engineering techniques. The advantages described include ease of program construction (visual programming), an open (non-proprietary) architecture, simple component-based integration, and advanced deployment methods. This versatility has the potential to change the nature of systems development for the geosciences, providing better mechanisms to coordinate complex functionality and, as a consequence, improving analysis through closer integration of software tools and better engagement of the human expert. Two example applications are presented to illustrate the potential of the Studio environment for exploring and better understanding large, complex geographical datasets and for supporting complex visual and computational analysis.

19.
Design of a multi-channel data analysis system based on virtual instruments
To address the wiring difficulties and heavy interference encountered when weapon system parameters are tested with lead-wire methods, a storage-based testing method built on NAND flash technology is proposed, and a multi-channel data analysis system for post-test data analysis and processing is designed and developed on the LabVIEW platform. The system consists of modules for multi-channel waveform display, waveform parameter measurement, filtering and spectrum analysis, and waveform printing. Experimental results show that the system can efficiently and with low error complete the test…

20.
The problem of strict convexity preserving interpolation by a piecewise quadratic smooth function is solved. A constructive procedure is provided.
