20 related records retrieved (search time: 0 ms)
1.
Exploratory data analysis is a widely used technique to determine which factors have the most influence on data values in a multi-way table, or which cells in the table can be considered anomalous with respect to the other cells. In particular, median polish is a simple yet robust method to perform exploratory data analysis. Median polish is resistant to holes in the table (cells that have no values), but it may require many iterations through the data. This factor makes it difficult to apply median polish to large multidimensional tables, since the I/O requirements may be prohibitive. This paper describes a technique that uses median polish over an approximation of a datacube, easing the burden of I/O. The cube approximation is achieved by fitting log-linear models to the data. The results obtained are tested for quality, using a variety of measures. The technique scales to large datacubes and proves to give a good approximation of the results that would have been obtained by median polish in the original data.
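As a rough illustration of the median polish step this abstract builds on (the plain two-way case only, not the paper's datacube approximation or log-linear fitting), here is a minimal Python sketch; the table values, iteration limit and centring details are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def median_polish(table, max_iter=10, tol=1e-6):
    """Iteratively sweep row and column medians out of a two-way table.

    NaN entries ("holes") are skipped via nanmedian, which is what makes
    the method resistant to missing cells; the residuals that remain
    highlight anomalous cells.
    """
    resid = np.array(table, dtype=float)
    overall = 0.0
    row_eff = np.zeros(resid.shape[0])
    col_eff = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        row_med = np.nanmedian(resid, axis=1)   # sweep rows
        resid -= row_med[:, None]
        row_eff += row_med
        col_med = np.nanmedian(resid, axis=0)   # sweep columns
        resid -= col_med[None, :]
        col_eff += col_med
        # keep row/column effects centred around a common overall value
        for eff in (row_eff, col_eff):
            shift = np.median(eff)
            overall += shift
            eff -= shift
        if max(np.max(np.abs(row_med)), np.max(np.abs(col_med))) < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical 3x4 table with one hole (NaN)
data = [[14.0, 15.0, 14.0, 19.0],
        [ 7.0,  4.0, float("nan"), 10.0],
        [ 8.0,  2.0, 10.0, 12.0]]
print(median_polish(np.array(data)))
```

The residual table returned by the final sweep is what would be inspected for anomalous cells.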
2.
New methods for self-organising map visual analysis
Self-organising maps (SOMs) have been used effectively in the visualisation and analysis of multidimensional data, with applications in exploratory data analysis (EDA) and data mining. We present three new techniques for performing visual analysis of SOMs. The first is a computationally light contraction method, closely related to the SOM training algorithm, designed to facilitate cluster and trajectory analysis. The second is an enhanced geometric interpolation method, related to multidimensional scaling, which forms a mapping from the input space onto the map. Finally, we propose the explicit representation of graphs, such as the SOM's induced Delaunay triangulation, for topology preservation and cluster analysis. The new methods provide an enhanced interpretation of the information contained in an SOM, leading to a better understanding of the data distributions with which they are trained, as well as providing insight into the map's formation.
3.
Land change modelers often create future maps from a reference land use map. However, future land use maps may mislead decision-makers, who are often unaware of the sensitivity and uncertainty in land use maps due to error in the data. Since most metrics that communicate uncertainty require reference land use data to calculate accuracy, assessing uncertainty becomes challenging when no reference land use map is available for the future. This study develops a new conceptual framework for sensitivity analysis and uncertainty assessment (FSAUA) which compares multiple maps under various data error scenarios. FSAUA performs sensitivity analyses in land use maps using a reference map and assesses uncertainty in predicted maps. FSAUA was applied using three well-known land change models (ANN, CART and MARS) in Delhi, India, and was found to be a practical tool for communicating uncertainty to end-users who make reliable planning decisions.
4.
Extending a global sensitivity analysis technique to models with correlated parameters
C. Xu, Computational Statistics & Data Analysis, 2007, 51(12): 5579-5590
The identification and representation of uncertainty is recognized as an essential component in model applications. One important approach to identifying uncertainty is sensitivity analysis, which evaluates how variations in the model output can be apportioned to variations in model parameters. One of the most popular sensitivity analysis techniques is the Fourier amplitude sensitivity test (FAST). The main mechanism of FAST is to assign each parameter a distinct integer frequency (characteristic frequency) through a periodic sampling function. Then, for a specific parameter, its variance contribution can be singled out of the model output at the characteristic frequency via a Fourier transformation. One limitation of FAST is that it can only be applied to models with independent parameters; in many cases, however, the parameters are correlated with one another. In this study, we propose to extend FAST to models with correlated parameters. The extension is based on reordering the independent sample of the traditional FAST. We apply the improved FAST to linear, nonlinear, nonmonotonic and real application models. The results show that the sensitivity indices derived by FAST are in good agreement with those from the correlation ratio sensitivity method, a nonparametric method for models with correlated parameters.
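For orientation, below is a minimal sketch of the classical FAST mechanism described above, for independent parameters only (the paper's reordering extension for correlated parameters is not reproduced). The search-curve form, test model and frequency choices are illustrative assumptions.

```python
import numpy as np

def fast_first_order(model, freqs, n_samples=1025, harmonics=4):
    """Classical FAST: sample parameters along a periodic search curve and
    read each parameter's variance contribution off the Fourier spectrum
    at its characteristic frequency (and a few harmonics)."""
    s = np.linspace(-np.pi, np.pi, n_samples, endpoint=False)
    # Search-curve sampling: maps s to uniform samples in [0, 1] per parameter
    x = 0.5 + np.arcsin(np.sin(np.outer(freqs, s))) / np.pi   # (n_params, n_samples)
    y = model(x)
    coeffs = np.fft.rfft(y) / n_samples
    spectrum = 2.0 * np.abs(coeffs) ** 2      # variance carried at each integer frequency
    total_var = np.sum(spectrum[1:])
    indices = []
    for w in freqs:
        partial = sum(spectrum[p * w] for p in range(1, harmonics + 1))
        indices.append(partial / total_var)
    return np.array(indices)

# Hypothetical additive test model: the first parameter should dominate
model = lambda x: 5.0 * x[0] + 2.0 * x[1] + 0.5 * x[2]
print(fast_first_order(model, freqs=[11, 35, 67]))
```

In practice the characteristic frequencies are chosen to avoid low-order interferences; the values used here are only placeholders.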
5.
Data Mining has evolved as a new discipline at the intersection of several existing areas, including Database Systems, Machine Learning, Optimization, and Statistics. An important question is whether the field has matured to the point where it has originated substantial new problems and techniques that distinguish it from its parent disciplines. In this paper, we discuss a class of new problems and techniques that show great promise for exploratory mining, while synthesizing and generalizing ideas from the parent disciplines. While the class of problems we discuss is broad, there is a common underlying objective: to look beyond a single data-mining step (e.g., data summarization or model construction) and address the combined process of data selection and transformation, parameter and algorithm selection, and model construction. The fundamental difficulty lies in the large space of alternative choices at each step, and good solutions must provide a natural framework for managing this complexity. We regard this as a grand challenge for Data Mining, and see the ideas discussed here as promising initial steps towards a rigorous exploratory framework that supports the entire process.
Bee-Chung Chen is supported by a Microsoft Research graduate fellowship.
6.
William A. Gale, Annals of Mathematics and Artificial Intelligence, 1990, 2(1-4): 149-163
Classical expert systems are rule based, depending on predicates expressed over attributes and their values. In the process of building expert systems, the attributes and the constants used to interpret their values need to be specified. Standard techniques for doing this are drawn from psychology, for instance, interviewing and protocol analysis. This paper describes a statistical approach to deriving interpreting constants for given attributes; it can also suggest the need for attributes beyond those given. The approach for selecting an interpreting constant is demonstrated by an example. The data to be fitted are first generated by selecting a representative collection of instances of the narrow decision addressed by a rule, then making a judgement for each instance, and defining an initial set of potentially explanatory attributes. A decision rule graph plots the judgements made against pairs of attributes. It reveals rules and key instances directly. It also shows when no rule is possible, thus suggesting the need for additional attributes. A study of a collection of seven rule-based models shows that the attributes defined during the fitting process improved the fit of the final models to the judgements by twenty percent over models built with only the initial attributes.
7.
H. Charles Romesburg, Computers & Geosciences, 1985, 11(1): 19-37
In the context of randomization tests, this paper discusses the roles of exploratory data analysis (EDA) and confirmatory data analysis (CDA) in geoscience research. It shows: (1) how the classical methods of statistical inference can be used in EDA with nonrandom samples of data, and (2) how much of the knowledge in the geosciences is derived from EDA. The paper gives a FORTRAN IV computer program, CLASSTEST, that performs a randomization test for a multivariate analysis of variance (MANOVA) design. CLASSTEST will be useful in geoscience research apart from its use in illustrating EDA and CDA.
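CLASSTEST itself is a FORTRAN IV MANOVA program and is not reproduced here; as a toy analogue of the randomization-test idea, a univariate permutation test sketch in Python with made-up data:

```python
import numpy as np

def randomization_test(group_a, group_b, n_perm=9999, rng=None):
    """Randomization test for a difference in group means: repeatedly shuffle
    the group labels and count how often the permuted statistic is at least
    as extreme as the observed one."""
    rng = np.random.default_rng(rng)
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    observed = abs(np.mean(group_a) - np.mean(group_b))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(np.mean(pooled[:n_a]) - np.mean(pooled[n_a:]))
        if stat >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)   # randomization p-value

# Hypothetical measurements from two rock classes
a = np.array([2.3, 2.9, 3.1, 2.7, 3.4])
b = np.array([3.6, 3.9, 3.3, 4.1, 3.8])
print(randomization_test(a, b, rng=0))
```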
8.
The flow characteristics in open channel junctions are of great interest in hydraulic and environmental engineering. This study investigates the capacity of artificial neural network (ANN) models for representing and modelling the velocity distributions of combined open channel flows. ANN models are constructed and tested using data derived from computational fluid dynamics models. The orthogonal sampling method is used to select representative data, and the ANN models trained and validated on representative data generally outperform those trained on random data. Sobol' sensitivity analysis is performed to investigate the contributions of different uncertainty sources to model performance. Results indicate that the major uncertainty source is ANN model parameter initialization. Hence an ANN model training strategy is designed to reduce this main uncertainty source: models are trained for many runs with random parameter initializations and the model with the best performance is adopted.
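A minimal sketch of the multi-restart training strategy described above, using scikit-learn's MLPRegressor as a stand-in for the paper's ANN and synthetic data in place of the CFD-derived velocity distributions; the architecture, number of restarts and validation split are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the paper uses CFD-derived velocity distributions
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Train many runs with different random parameter initialisations and keep the
# best, which damps the uncertainty coming from initialisation alone
best_model, best_score = None, -np.inf
for seed in range(20):
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed)
    model.fit(X_train, y_train)
    score = model.score(X_val, y_val)          # R^2 on held-out data
    if score > best_score:
        best_model, best_score = model, score

print(f"best validation R^2 over 20 initialisations: {best_score:.3f}")
```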
9.
H.-R. Bae, R. V. Grandhi, R. A. Canfield, Structural and Multidisciplinary Optimization, 2006, 31(4): 270-279
Sensitivity analysis for the quantified uncertainty in evidence theory is developed. In reliability quantification, classical probabilistic analysis has been a popular approach in many engineering disciplines. However, when we cannot obtain sufficient data to construct probability distributions in a large, complex system, the classical probability methodology may not be appropriate to quantify the uncertainty. Evidence theory, also called Dempster–Shafer theory, has the potential to quantify aleatory (random) and epistemic (subjective) uncertainties because it can directly handle insufficient data and incomplete knowledge situations. In this paper, interval information is assumed to be the best representation of imprecise information, and the sensitivity analysis of plausibility in evidence theory is analytically derived with respect to expert opinions and structural parameters. The results from the sensitivity analysis are expected to be very useful in finding the major contributors to the quantified uncertainty and also in redesigning the structural system for risk minimization.
10.
Linear regression is often used to analyse and summarize data, and to uncover, clarify, and simplify a data structure. The outcome of these activities depends both on the analyst's domain-specific knowledge and on the data. Analysing the data also affects the analyst's understanding of the data; hence, the act of analysing data is inherently a recursive activity, with each new iteration potentially providing additional insights. This process calls for a strategy of exploratory data analysis that consists of techniques for flexibly analysing, summarizing, and re-expressing the data.
11.
Design of an exploratory simulation analysis framework based on data farming
Based on the theory of data farming, an exploratory simulation analysis framework is designed that can serve as a supporting system for exploratory simulation analysis. The framework consists of three modules: exploration space definition, multi-scenario execution, and simulation result analysis. The exploration space definition module configures the parameter space to be explored, the multi-scenario execution module carries out the computational runs over the exploration space, and the simulation result analysis module performs a comprehensive analysis of the exploration results to provide decision support for analysts.
12.
Two exploratory data analysis techniques, the comap and the quad plot, are shown to have both strengths and shortcomings when analysing spatial multivariate datasets. A hybrid of the two is proposed: the quad map, which is shown to overcome the outlined shortcomings when applied to a dataset containing weather information for disaggregate incidents of urban fires. In common with the quad plot, the quad map uses Polya models in order to articulate the underlying assumptions behind histograms. The Polya model formalises the situation in which past fire incident counts are computed and displayed in (multidimensional) histograms as appropriate assessments of conditional probability, providing valuable diagnostics such as posterior variance, i.e. sensitivity to new information. Finally, we discuss how new technology, in particular Online Analytical Processing (OLAP) and Geographical Information Systems (GIS), offers potential for automating exploratory spatial data analysis techniques such as the quad map.
13.
A model validation method is introduced for addressing the credibility of complex system simulations. The method first decomposes the complex system into relatively simple subsystems, benchmark systems, and components, yielding a hierarchical model tree. The models in the tree are then ranked and validation experiments are planned. Next, an information-gap method is used to perform single-level validation on the lower-level models for which experimental data are available. Finally, sensitivity analysis propagates the validation results of the lower-level models up to their parent models, producing a validation result for the full-system model. The proposed method is suitable for complex systems that can be decomposed hierarchically and for which experimental data are scarce.
14.
Process-based models of fluid flow and heat transport in fluvial systems can be used to quantify unknown spatial and temporal patterns of hydrologic fluxes and to predict system response to change. In this study, a deterministic stream heat budget model, the HFLUX Stream Temperature Solver (HFLUX), is developed and evaluated using field studies. Field studies are conducted across two sites with different streamflow rates (0.07 vs 1.4 m³/s), and point sources versus diffuse sources of groundwater discharge, to demonstrate model transferability. A winter versus summer comparison at one site suggests latent heat flux should be derived using energy-based methods in summer and mass transfer approaches during winter. For each field study, HFLUX successfully modeled stream temperatures through space and time with normalized root mean square errors of 3.0–6.2%. Model calibration to observed temperature data, in order to quantify groundwater contributions, and a sensitivity analysis are demonstrated using HFLUX.
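For reference, the normalised root mean square error quoted above can be computed as in the sketch below; the abstract does not state whether HFLUX normalises by the observed range or the mean, so the range-normalised form is shown as an assumption, with made-up temperatures.

```python
import numpy as np

def nrmse(observed, modelled):
    """Root mean square error normalised by the range of the observations,
    expressed as a percentage (one common convention; the mean is another)."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    rmse = np.sqrt(np.mean((observed - modelled) ** 2))
    return 100.0 * rmse / (np.max(observed) - np.min(observed))

# Hypothetical stream temperatures (deg C) at a few times/locations
obs = [12.1, 13.4, 15.0, 16.2, 14.8]
mod = [12.4, 13.1, 15.3, 15.9, 14.5]
print(f"NRMSE = {nrmse(obs, mod):.1f}%")
```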
15.
The widespread deployment of technologies with tracking capabilities, like GPS, GSM, RFID and on-line social networks, allows mass collection of spatio-temporal data about their users. As a consequence, several methods aimed at anonymizing spatio-temporal data before their publication have been proposed in recent years. Such methods are based on a number of underlying privacy models. Among these models, (k,δ)-anonymity claims to extend the widely used k-anonymity concept by exploiting the spatial uncertainty δ≥0 in the trajectory recording process. In this paper, we prove that, for any δ>0 (that is, whenever there is actual uncertainty), (k,δ)-anonymity does not offer trajectory k-anonymity, that is, it does not hide an original trajectory in a set of k indistinguishable anonymized trajectories. Hence, the methods based on (k,δ)-anonymity, like Never Walk Alone (NWA) and Wait For Me (W4M), can offer trajectory k-anonymity only when δ=0 (no uncertainty). Thus, the idea of exploiting the recording uncertainty δ to achieve trajectory k-anonymity with information loss inversely proportional to δ turns out to be flawed.
16.
Chonggang Xu, Computational Statistics & Data Analysis, 2011, 55(1): 184-198
The Fourier Amplitude Sensitivity Test (FAST) is one of the most popular uncertainty and sensitivity analysis techniques. It uses a periodic sampling approach and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. Until now, FAST analysis has mainly been confined to estimating partial variances contributed by the main effects of model parameters, and has not allowed for those contributed by specific interactions among parameters. In this paper, we theoretically show that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters using different sampling approaches (i.e., traditional search-curve based sampling, simple random sampling and random balance design sampling). We also analytically calculate the potential errors and biases in the estimation of partial variances. Hypothesis tests are constructed to reduce the effect of sampling errors on the estimation of partial variances. Our results show that, compared to simple random sampling and random balance design sampling, sensitivity indices (ratios of partial variances to the variance of a specific model output) estimated by search-curve based sampling generally have higher precision but larger underestimation. Compared to simple random sampling, random balance design sampling generally provides higher estimation precision for partial variances contributed by the main effects of parameters. The theoretical derivation of partial variances contributed by higher-order interactions, and the calculation of their corresponding estimation errors under different sampling schemes, can help us better understand the FAST method and provide a fundamental basis for FAST applications and further improvements.
17.
Set pair analysis theory and its applications: research progress
Set pair analysis is a relatively new soft computing method that can effectively analyse and process uncertain information. In recent years the theory has received increasing attention in academia and has been applied with some success in decision making, forecasting, data fusion, uncertainty reasoning, product design, network planning, and comprehensive evaluation. This paper briefly introduces the basic concepts and theoretical foundations of set pair analysis, discusses its latest research results and application progress in some detail, and finally points out possible development trends and research directions.
18.
P. Baraldi, M. Librizzi, E. Zio, L. Podofillini, V.N. Dang, Expert Systems with Applications, 2009, 36(10): 12461-12471
Problems characterized by qualitative uncertainty described by expert judgments can be addressed by the fuzzy logic modeling paradigm, structured within a so-called fuzzy expert system (FES) to handle and propagate the qualitative, linguistic assessments of the experts. Once constructed, the FES model should be verified to make sure that it correctly represents the experts' knowledge. For FES verification, there are typically not enough data to directly compare the expert-inferred and FES-inferred solutions. Thus, indirect methods are needed for determining whether the expert system model provides a proper representation of the expert knowledge. A possible way to proceed is to examine the importance of the different input factors in determining the output of the FES model and to verify whether it agrees with the experts' conceptualization of the model. In this view, two sensitivity and uncertainty analysis techniques applicable to generic FES models are proposed in this paper, with the objective of providing appropriate verification tools to support the experts in the FES design phase. To analyze the insights gained by using the proposed techniques, a case study concerning an FES developed in the field of human reliability analysis is considered.
19.
Based on the current state of data mining research, clustering algorithms for uncertain data are analysed. Addressing the problems that exist in clustering uncertain data, a new line of thought is proposed: either improve traditional data mining algorithms so that they suit the clustering of uncertain data, or devise new clustering algorithms, in order to solve the problem of clustering uncertain data.
20.
Application of fuzzy numerical techniques for product performance analysis in the conceptual and preliminary design stage
Uncertainty and variability modelling tools greatly enhance the value of virtual prototypes at the different design stages of a CAE process. The fuzzy analysis technique is suited to dealing with models containing subjective non-deterministic parameters, and is finding its way into different disciplines of mechanical engineering. The objective of this paper is to increase the value of this technique in early stages of mechanical design procedures. For this purpose, new numerical procedures are proposed. First, the degree of influence is introduced. This new concept measures the relative effect of highly uncertain design properties on the performance of a design. Next, this paper proposes a new reduced optimisation scheme in order to improve the computational efficiency of the interval analysis, which is at the core of the implementation of the fuzzy technique. The practical applicability of the newly developed procedures is demonstrated on two numerical applications from the automotive industry. The analysed models represent the design at the conceptual stage, and contain parameters with a high and subjective level of uncertainty. The parametrised models are used to demonstrate the value and efficiency of the developed numerical procedures: significant parameters are identified using the degree of influence analysis, the optimal configuration is identified through an interval analysis based on the reduced optimisation scheme, and finally the fuzzy technique is applied as a design space exploration tool.
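The paper's degree-of-influence measure and reduced optimisation scheme are not reproduced here; as a rough illustration of the interval analysis said to be at the core of the fuzzy technique, the sketch below bounds a model output over interval-valued parameters by multi-start local optimisation. The response function, parameter intervals and optimiser choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def response(x):
    """Hypothetical performance function of two uncertain design parameters."""
    return (x[0] - 1.0) ** 2 + np.sin(3.0 * x[1]) + 0.5 * x[0] * x[1]

def output_interval(func, bounds, n_starts=10, rng=None):
    """Bound the output over a box of parameter intervals by running local
    minimisations (for the lower bound) and maximisations (for the upper
    bound) from several random start points."""
    rng = np.random.default_rng(rng)
    lo, hi = np.inf, -np.inf
    for _ in range(n_starts):
        x0 = [rng.uniform(a, b) for a, b in bounds]
        lo = min(lo, minimize(func, x0, bounds=bounds).fun)
        hi = max(hi, -minimize(lambda x: -func(x), x0, bounds=bounds).fun)
    return lo, hi

bounds = [(0.5, 2.0), (-1.0, 1.0)]       # interval-valued design parameters
print(output_interval(response, bounds, rng=0))
```

Sweeping such an interval evaluation over nested α-levels of fuzzy parameter descriptions is, broadly, how a fuzzy output is assembled from interval analyses; the paper's reduced optimisation scheme for doing this efficiently is not shown here.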