1.
With the era of data explosion upon us, multidimensional visualization, as one of the most helpful data analysis technologies, is increasingly applied to multidimensional data analysis tasks. Correlation analysis is an efficient technique for revealing the complex relationships among the dimensions of multidimensional data. However, for multidimensional data with complex dimension features, traditional correlation analysis methods are inaccurate and limited. In this paper, we introduce an improved Pearson correlation coefficient and mutual information correlation analysis to detect the dimensions' linear and non-linear correlations, respectively. For the linear case, all dimensions are classified into three groups according to their distributions, and we select appropriate parameters for each group to calculate their correlations. For the non-linear case, we cluster the data within each dimension, then calculate their probability distributions to analyze the dimensions' correlations and dependencies based on mutual information correlation analysis. Finally, we use the relationships between dimensions as the criteria for interactively ordering the axes in parallel coordinate displays.
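As an illustration of this pairing of linear and non-linear dependence measures, here is a minimal Python sketch (not the authors' implementation; the function names, bin count, and greedy ordering strategy are all assumptions):

```python
# A minimal sketch: score pairwise dimension relationships with |Pearson r|
# for linear dependence and a histogram-based mutual information estimate
# for non-linear dependence, then order parallel-coordinate axes greedily
# by relationship strength.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mutual_info_score

def dimension_relationship(x, y, bins=16):
    """Combined linear/non-linear dependence score for two dimensions."""
    linear = abs(pearsonr(x, y)[0])
    # Discretise each dimension (a crude stand-in for the per-dimension
    # clustering step described in the abstract).
    xd = np.digitize(x, np.histogram_bin_edges(x, bins))
    yd = np.digitize(y, np.histogram_bin_edges(y, bins))
    nonlinear = mutual_info_score(xd, yd)
    return max(linear, nonlinear)

def order_axes(data):
    """Greedy ordering: always append the dimension most related to the last axis."""
    remaining, order = set(range(1, data.shape[1])), [0]
    while remaining:
        last = data[:, order[-1]]
        best = max(remaining, key=lambda j: dimension_relationship(last, data[:, j]))
        order.append(best)
        remaining.remove(best)
    return order
```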
2.
A new multidimensional data structure, the multidimensional tree (MD-tree), is proposed. The MD-tree extends the concept of the B-tree to multidimensional data, so that the MD-tree is a height-balanced tree similar to the B-tree. The theoretical worst-case storage utilization is guaranteed to exceed 66.7% (2/3) of full capacity. The structure of the MD-tree and the algorithms for insertion, deletion, and spatial searching are described. Through a series of simulation tests, the performance of the MD-tree is compared with that of conventional methods. The results indicate that storage utilization exceeds 80% in practice, and that retrieval performance and dynamic characteristics are superior to those of conventional methods.
3.
Innovations in Systems and Software Engineering - Materialized views are heavily used to speed up the query response time of any data-centric application. In the literature, the construction and...
4.
Correlation analysis is regarded as a significant challenge in the mining of multidimensional data streams. Existing correlation analysis methods for mining data streams place great emphasis on one-dimensional streams; consequently, the identification of underlying correlations among multivariate arrays (e.g., sensor data) has long been ignored, and the technique of canonical correlation analysis (CCA) has rarely been applied to multidimensional data streams. In this study, a novel correlation analysis algorithm based on CCA, called ApproxCCA, is proposed to explore the correlations between two multidimensional data streams in environments with limited resources. By introducing unequal probability sampling and low-rank approximation to reduce the dimensionality of the product matrix composed of the sample covariance and sample variance matrices, ApproxCCA improves computational efficiency while preserving analytical precision. Experimental results on synthetic and real data sets indicate that ApproxCCA overcomes the computational bottleneck of traditional CCA and accurately detects the correlations between two multidimensional data streams.
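For concreteness, here is a toy version of the classical CCA computation that ApproxCCA approximates. It forms the full covariance matrices directly, without the paper's sampling and low-rank steps, and the regularisation term is an assumption:

```python
# Classical CCA on two data windows X (n x p) and Y (n x q).
import numpy as np

def _inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_correlations(X, Y, reg=1e-6):
    """Canonical correlations between two multidimensional samples."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])  # regularised variances
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)                              # cross-covariance
    # Canonical correlations are the singular values of Sxx^{-1/2} Sxy Syy^{-1/2}.
    M = _inv_sqrt(Sxx) @ Sxy @ _inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)
```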
5.
The most effective technique for enhancing the performance of multidimensional databases is to materialize redundant aggregates called views. In the classical approach to materialization, each view includes all and only the measures of the cube it aggregates. In this paper we investigate the benefits of materializing views in vertical fragments, with the aim of minimizing workload response time. We formalize the fragmentation problem as a 0–1 integer linear programming problem, which is then solved by a standard integer programming solver to determine the optimal fragmentation for a given workload. Finally, we demonstrate the usefulness of fragmentation by presenting a large set of experimental results based on the TPC-H benchmark.
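The 0–1 formulation can be made concrete with a toy enumeration (illustrative only; the paper hands a model of this shape to a standard integer programming solver, and all fragment names, sizes, and costs below are invented):

```python
# x[f] = 1 means fragment f is materialised; minimise total workload cost
# under a storage budget by enumerating all 0-1 assignments.
from itertools import product

fragments = ["f1", "f2", "f3"]
size = {"f1": 40, "f2": 25, "f3": 60}          # storage units (hypothetical)
cost_hit = {"f1": 10, "f2": 30, "f3": 5}       # response time if materialised
cost_miss = {"f1": 100, "f2": 90, "f3": 70}    # response time otherwise
budget = 80

best, best_cost = None, float("inf")
for x in product([0, 1], repeat=len(fragments)):
    if sum(size[f] for f, xi in zip(fragments, x) if xi) > budget:
        continue  # violates the storage constraint
    cost = sum(cost_hit[f] if xi else cost_miss[f]
               for f, xi in zip(fragments, x))
    if cost < best_cost:
        best, best_cost = x, cost

print(dict(zip(fragments, best)), best_cost)
```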
6.
Using mobile brokerage services as an example, we propose and test a multidimensional and hierarchical model of mobile service (m-service) quality using a sample of 338 respondents from the two largest m-service providers in China: China Mobile and China Unicom. Through a three-stage validation, we confirm all three levels of the proposed hierarchical structure, in which a customer's perceived m-service quality comprises the primary dimensions of interaction, outcome, and environment quality, each with its own sub-dimensions. Our empirical results also show that corporate image moderates the effects of environment and outcome quality on overall service quality. The proposed model provides implications for future research on mobile commerce.
7.
A new method for multidimensional distribution analysis applies a data compression technique to knowledge-based mean-force potentials between residues for the analysis of protein sequence-structure compatibility. It performs much better than conventional 1D distance-based potentials derived from binned distributions.
8.
By introducing a form of reordering for multidimensional data, we propose a unified fast algorithm that jointly employs the one-dimensional W transform and the multidimensional discrete polynomial transform to compute eleven types of multidimensional discrete orthogonal transforms: three types of m-dimensional discrete cosine transforms (m-D DCTs), four types of m-dimensional discrete W transforms (m-D DWTs, with the m-dimensional Hartley transform as a special case), and four types of generalized discrete Fourier transforms (m-D GDFTs). For real input, the number of multiplications needed by the proposed algorithm for all eleven types of m-D discrete orthogonal transforms is only 1/m times that of the commonly used row-column methods; for complex input, it is further reduced to 1/(2m) times. The number of additions required is also reduced considerably. Furthermore, the proposed algorithm has a simple computational structure and is easy to implement on a computer, and th…
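As a point of reference, here is a sketch of the conventional row-column baseline the paper improves on: a 2-D DCT computed by applying a 1-D DCT along each axis in turn (SciPy supplies the 1-D transform; the paper's unified algorithm instead uses a data reordering plus 1-D W transforms to cut the multiplication count by roughly a factor of m):

```python
# Row-column method: m passes of a 1-D transform, one per axis.
import numpy as np
from scipy.fft import dct

def dct2_row_column(a):
    """2-D DCT-II via two passes of the 1-D DCT (the row-column method)."""
    return dct(dct(a, axis=0, norm="ortho"), axis=1, norm="ortho")

coeffs = dct2_row_column(np.random.rand(8, 8))
```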
9.
An algorithm that fits a continuous function to sparse multidimensional data is presented. The algorithm uses a representation in terms of lower-dimensional component functions of coordinates defined in an automated way and also permits dimensionality reduction. Neural networks are used to construct the component functions.
Program summary
Program title: RS_HDMR_NN
Catalogue identifier: AEEI_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEI_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 19 566
No. of bytes in distributed program, including test data, etc.: 327 856
Distribution format: tar.gz
Programming language: MatLab R2007b
Computer: any computer running MatLab
Operating system: Windows XP, Windows Vista, UNIX, Linux
Classification: 4.9
External routines: Neural Network Toolbox Version 5.1 (R2007b)
Nature of problem: Fitting a smooth, easily integrable and differentiable function to a very sparse (~2–3 points per dimension), multidimensional (D ≥ 6), large (~10^4–10^5 data) dataset.
Solution method: A multivariate function is represented as a sum of a small number of terms, each of which is a low-dimensional function of optimised coordinates. The optimal coordinates reduce both the dimensionality and the number of the terms. Neural networks (including exponential neurons) are used to obtain a general and robust method and a functional form which is easily differentiated and integrated (in the case of exponential neurons).
Running time: Depends strongly on the dataset to be modelled and the chosen structure of the approximating function; ranges from about a minute for ~10^3 data in 3-D to about a day for ~10^5 data in 15-D.
10.
Random indexing (RI) is a lightweight dimension reduction method, used, for example, to approximate vector semantic relationships in online natural language processing systems. Here we generalise RI to multidimensional arrays and thereby enable approximation of higher-order statistical relationships in data. The generalised method is a sparse implementation of random projections, which is also the theoretical basis for ordinary RI and other randomisation approaches to dimensionality reduction and data representation. We present numerical experiments demonstrating that a multidimensional generalisation of RI is feasible, including comparisons with ordinary RI and principal component analysis. The RI method is well suited to online processing of data streams because relationship weights can be updated incrementally in a fixed-size distributed representation, and inner products can be approximated on the fly at low computational cost. An open source implementation of generalised RI is provided.
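A minimal sketch of ordinary (vector) RI, the base case the paper generalises; the dimensionality, sparsity, and context names are assumptions. Each context is assigned a sparse ternary index vector, item vectors accumulate the index vectors of the contexts they occur in, and inner products then approximate co-occurrence relationships:

```python
import numpy as np

def index_vector(rng, dim=1000, nonzeros=10):
    """Sparse ternary random vector: a few +/-1 entries, zeros elsewhere."""
    v = np.zeros(dim)
    pos = rng.choice(dim, size=nonzeros, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=nonzeros)
    return v

rng = np.random.default_rng(0)
contexts = {c: index_vector(rng) for c in ["ctx_a", "ctx_b", "ctx_c"]}

item = np.zeros(1000)
for c in ["ctx_a", "ctx_a", "ctx_b"]:      # incremental, fixed-size updates
    item += contexts[c]

# Inner products with index vectors approximate co-occurrence counts.
print(item @ contexts["ctx_a"], item @ contexts["ctx_c"])
```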
11.
Spatial database management involves two main categories of data: vector and raster data. The former has received extensive in-depth investigation; the latter still lacks a sound framework. Current DBMSs either regard raster data as pure byte sequences whose underlying semantics the DBMS knows nothing about, or do not complement array structures with storage mechanisms suitable for huge arrays, or are designed as specialized systems with sophisticated imaging functionality but no general database capabilities (e.g., a query language). Many types of array data will require database support in the future, notably 2-D images, audio data and general signal-time series (1-D), animations (3-D), static or time-variant voxel fields (3-D and 4-D), and the ISO/IEC PIKS (Programmer's Imaging Kernel System) BasicImage type (5-D). In this article, we propose comprehensive support for multidimensional discrete data (MDD) in databases, including operations on arrays of arbitrary size over arbitrary data types. A set of requirements is developed, a small set of language constructs is proposed (based on a formal algebraic semantics), and a novel MDD architecture is outlined to provide the basis for efficient MDD query evaluation.
13.
The problem of computing the empirical cumulative distribution function (ECDF) of N points in k-dimensional space has been studied and motivated recently by Bentley [1], whose solution uses recursive multidimensional divide-and-conquer. In this paper, the problem is treated as a generalization of the problem of computing the inversions of a permutation. An algorithm of Knuth [3] is then extended to yield an O(kN(log₂N)^(k−1)) solution to the ECDF problem, which is comparable to Bentley's solution. Neither solution approaches the O(kN log₂N) lower bound, and both are worse than the O(kN²) 'brute force' algorithm for large k. The new algorithm, however, has the advantage of being highly parallel, so that fast solutions exist on parallel processors.
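For reference, the O(kN²) brute-force computation mentioned above is straightforward (a sketch; ties are counted as dominated and the point itself is excluded):

```python
# For each point, count the points that are <= it in every coordinate.
import numpy as np

def ecdf_brute_force(points):
    """points: (N, k) array; returns the dominance count for each point."""
    N = len(points)
    counts = np.empty(N, dtype=int)
    for i in range(N):
        dominated = np.all(points <= points[i], axis=1)
        counts[i] = dominated.sum() - 1   # exclude the point itself
    return counts

print(ecdf_brute_force(np.random.rand(100, 3)))
```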
14.
Existing models for cluster analysis typically consist of a number of attributes that describe the objects to be partitioned and a single latent variable that represents the clusters to be identified. When one analyzes data using such a model, one is looking for a single way to cluster the data that is jointly defined by all the attributes; in other words, one performs unidimensional clustering. This is not always appropriate: for complex data with many attributes, it is more reasonable to consider multidimensional clustering, i.e., to partition the data along multiple dimensions. In this paper, we present a method for performing multidimensional clustering on categorical data and show its superiority over unidimensional clustering.
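The contrast can be illustrated with a small sketch, using k-means on continuous data for brevity rather than the paper's latent-variable method for categorical data: one clustering jointly defined by all attributes versus one partition per attribute facet, giving each object one label per dimension:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 6)

# Unidimensional: a single partition jointly defined by all attributes.
uni_labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)

# Multidimensional: partition the same objects along two attribute facets
# (the facet split is invented for illustration).
facets = [[0, 1, 2], [3, 4, 5]]
multi_labels = [KMeans(n_clusters=3, n_init=10).fit_predict(X[:, cols])
                for cols in facets]
```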
15.
The paper presents a new radiosity algorithm that allows the simultaneous computation of energy exchanges between surface elements, scattering volume distributions, and groups of surfaces, or object clusters. The new technique is based on a hierarchical formulation of the zonal method and efficiently integrates volumes and surfaces. In particular, no initial linking stage is needed, even for inhomogeneous volumes, thanks to the construction of a global spatial hierarchy. An analogy between object clusters and scattering volumes results in a powerful clustering radiosity algorithm, with no initial linking between surfaces and fast computation of average visibility information through a cluster. We show that accurately distributing the energy emitted or received at the cluster level can produce even better results than isotropic clustering at a marginal cost. The resulting algorithm is fast and, more importantly, truly progressive, as it allows the quick calculation of approximate solutions with smooth convergence towards very accurate simulations.
17.
The rapidly increasing volume of surveillance video data has challenged existing video coding standards. Even though knowledge-based video coding schemes have been proposed to remove the redundancy of moving objects across multiple videos, achieving great improvements in coding efficiency, they still have difficulty coping with the complicated visual changes of objects resulting from various factors. In this paper, a novel hierarchical knowledge extraction method is proposed. Common knowledge at three coarse-to-fine levels, namely the category, object, and video levels, is extracted from historical data to model the initial appearance, stable changes, and temporal changes, respectively, for better object representation and redundancy removal. In addition, we apply the extracted hierarchical knowledge to surveillance video coding and establish a hybrid prediction based coding framework. On the one hand, hierarchical knowledge is projected onto the image plane to generate references for I frames, achieving better prediction performance. On the other hand, we develop a transform-based prediction for P/B frames that reduces computational complexity while improving coding efficiency. Experimental results demonstrate the effectiveness of the proposed method.
18.
The task of identifying a hierarchical data structure is considered using the example problem of identifying personalizing reference characteristics. A neural network model based on radial basis functions is proposed as a possible solution. In practical terms, identifying the hierarchical dependence aims to create a classifier that uses a restricted set of input variables compared to a flat-structured classifier. Multilayer perceptrons are used as local classifiers. We also use self-organizing maps to visualize the structure of the data.
19.
In many real-world data mining applications, the distribution of the testing data differs from that of the training data. Moreover, data are often represented by multiple views, which are of importance to learning; however, little work has addressed both issues together. In this paper, we explore leveraging multi-view information across different domains for knowledge transfer. We propose a novel transfer learning model, named DV2S, which integrates domain distance and view consistency into a 2-view support vector machine framework. The objective of DV2S is to find the optimal feature mapping such that, under the projections, the classification margin is maximized while both the domain distance and the disagreement between multiple views are minimized simultaneously. Experiments show that DV2S outperforms a variety of state-of-the-art algorithms.
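A hedged sketch of the two-view intuition only (not the paper's joint optimisation; the data shapes and names are invented): train one SVM per view and measure the cross-view disagreement that DV2S minimises alongside the domain distance:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_view1 = rng.normal(size=(100, 20))   # view 1 features
X_view2 = rng.normal(size=(100, 15))   # view 2 features of the same objects
y = rng.integers(0, 2, size=100)

# One classifier per view, trained on the same labels.
clf1 = LinearSVC().fit(X_view1, y)
clf2 = LinearSVC().fit(X_view2, y)

# Fraction of objects on which the two views disagree.
disagreement = np.mean(clf1.predict(X_view1) != clf2.predict(X_view2))
print("cross-view disagreement:", disagreement)
```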
20.
Data Warehouses (DW), Multidimensional (MD) databases, and On-Line Analytical Processing (OLAP) applications provide companies with many years of historical information for the decision-making process. Owing to the sensitive information managed by these systems, strong security and confidentiality measures should be specified from the early stages of a DW project in the MD modeling, and then enforced. In recent years, there have been some proposals for accomplishing MD modeling at the conceptual level. Nevertheless, none of them considers security measures as an important element of its models, and therefore they do not allow us to specify confidentiality constraints to be enforced by the applications that will use these MD models. In this paper, we present an Access Control and Audit (ACA) model for conceptual MD modeling. We then extend the Unified Modeling Language (UML) with this ACA model, representing the security information (gathered in the ACA model) in the conceptual MD model, thereby allowing us to obtain secure MD models. Moreover, we use the OSCL (Object Security Constraint Language) to specify the constraints of our ACA model, avoiding their arbitrary use. Furthermore, we align our approach with the Model-Driven Architecture, Model-Driven Security, and the Model-Driven Data Warehouse, offering a proposal highly compatible with recent technologies.