Similar Literature
20 similar documents found.
1.
Data cubes capture general trends aggregated from multidimensional data from a categorical relation. When two relations are provided, interesting knowledge can be exhibited by comparing the two underlying data cubes. Trend reversals or particular phenomena that are irrelevant in one data cube may indeed clearly appear in the other. In order to capture such trend reversals, we have proposed the concept of the Emerging Cube. In this article, we focus on two new approaches for computing Emerging Cubes. Both are devised to be integrated within standard OLAP systems, since they do not require any additional or complex data structures. Our first approach is based on SQL. We propose three queries with different aims. The most efficient query uses a particular data structure merging the two input relations to achieve a single data cube computation. This query works well even when voluminous data are processed. Our second approach is algorithmic and aims to improve efficiency and scalability while preserving integration capability. The E-Idea algorithm works à la BUC and takes the specific features of Emerging Cubes into account. E-Idea is automaton-based and adapts its behavior to the current execution context. Our proposals are validated by various experiments in which we measure query response time. Comparative experiments show that E-Idea's response time is proportional to the size of the Emerging Cube. Experiments also demonstrate that Emerging Cubes can be computed in practice, in a time compatible with user expectations.
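As a rough illustration of the idea only (not the article's SQL queries or the E-Idea algorithm), the following Python sketch aggregates two toy relations into full cubes and reports the cells whose measure grows by at least a chosen factor from one cube to the other; the relation contents, the SUM measure and the `min_ratio` threshold are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy relations: tuples of (dim1, dim2, measure).
R1 = [("web", "2023", 10), ("web", "2024", 40), ("shop", "2023", 25)]
R2 = [("web", "2023", 30), ("web", "2024", 10), ("shop", "2023", 26)]

def cube_aggregate(rows, n_dims=2):
    """SUM-aggregate every cuboid of the cube; '*' marks a rolled-up dimension."""
    cube = defaultdict(int)
    for *dims, measure in rows:
        # Each subset of dimensions kept "as is" defines one aggregation level.
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                cell = tuple(dims[i] if i in kept else "*" for i in range(n_dims))
                cube[cell] += measure
    return cube

def emerging_cells(c1, c2, min_ratio=2.0):
    """Cells whose measure grows by at least min_ratio from cube 1 to cube 2."""
    return {cell: (c1.get(cell, 0), c2[cell])
            for cell in c2
            if c1.get(cell, 0) > 0 and c2[cell] / c1.get(cell, 0) >= min_ratio}

print(emerging_cells(cube_aggregate(R1), cube_aggregate(R2)))
```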

2.
Extended Fibonacci Cubes
The Fibonacci Cube is an interconnection network that possesses many desirable properties important in network design and application. The Fibonacci Cube can efficiently emulate many hypercube algorithms and uses fewer links than the comparable hypercube, while its size does not increase as fast as the hypercube's. However, most Fibonacci Cubes (more than 2/3 of them) are not Hamiltonian. In this paper, we propose an Extended Fibonacci Cube (EFC1) with an even number of nodes. It is defined by the same recurrence F(i)=F(i-1)+F(i-2) as the regular Fibonacci sequence, but with different initial conditions. We show that the Extended Fibonacci Cube includes the Fibonacci Cube as a subgraph and maintains its sparsity property. In addition, it is Hamiltonian and is better at emulating other topologies. Specifically, the Extended Fibonacci Cube can embed binary trees more efficiently than the regular Fibonacci Cube and almost as efficiently as the hypercube, even though it is a much sparser network than the hypercube. We also propose a series of Extended Fibonacci Cubes with an even number of nodes. Any Extended Fibonacci Cube (EFCk, with k ≥ 1) in the series contains the node set of any other cube that precedes EFCk in the series. We show that any Extended Fibonacci Cube maintains virtually all the desirable properties of the Fibonacci Cube. The EFCks can be considered flexible versions of incomplete hypercubes that eliminate the restriction on the number of nodes, making it possible to construct parallel machines of arbitrary size.
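Since both cube families are driven by the recurrence F(i) = F(i-1) + F(i-2) and differ only in their initial conditions, a short Python sketch makes the effect of changing the seeds visible; the seed values below are purely illustrative and are not claimed to be the paper's exact definition of EFC1.

```python
def series(a0, a1, n):
    """First n terms of F(i) = F(i-1) + F(i-2) for the given seeds."""
    terms = [a0, a1]
    while len(terms) < n:
        terms.append(terms[-1] + terms[-2])
    return terms[:n]

# Regular Fibonacci numbers drive the node counts of Fibonacci Cubes.
print(series(1, 2, 10))   # 1, 2, 3, 5, 8, 13, ...

# The same recurrence with different seeds (illustrative values only);
# note that every term is now even, matching the even node counts above.
print(series(2, 4, 10))   # 2, 4, 6, 10, 16, 26, ...
```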

3.
In data warehouse design, the selection of materialized views depends on predicting how much storage space a cube will occupy; however, traditional sampling-based estimators that assume uniformly distributed data cannot estimate cube size accurately. This paper introduces a sampling-based algorithm for predicting cube size that adapts to varying degrees of data skew and is particularly suitable for heavily skewed data. Experimental results show that the algorithm clearly improves on traditional sampling-based estimators.
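To make the estimation problem concrete, here is a minimal Python sketch of the kind of naive, uniformity-assuming estimator the paper improves on: scale the number of distinct group-by cells observed in a uniform sample up to the full relation. The data generator and the linear scale-up rule are assumptions for illustration only, not the paper's skew-adaptive algorithm.

```python
import random

def naive_cube_size_estimate(rows, group_cols, sample_frac=0.01):
    """Scale the distinct group-by count seen in a uniform sample linearly --
    a crude rule that implicitly assumes uniform data and degrades under skew."""
    sample = [r for r in rows if random.random() < sample_frac]
    distinct_in_sample = len({tuple(r[c] for c in group_cols) for r in sample})
    return int(distinct_in_sample / sample_frac) if sample else 0

# Skewed toy data: one hot value dominates dimension "a".
rows = [{"a": random.choice([1] * 90 + list(range(2, 12))), "b": random.randrange(100)}
        for _ in range(100_000)]
true_size = len({(r["a"], r["b"]) for r in rows})
print(true_size, naive_cube_size_estimate(rows, ["a", "b"]))
```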

4.
We present a new full cube computation technique and a cube storage representation approach, called the multidimensional cyclic graph (MCG) approach. The data cube relational operator has exponential complexity and therefore its materialization involves both a huge amount of memory and a substantial amount of time. Reducing the size of data cubes, without a loss of generality, thus becomes a fundamental problem. Previous approaches, such as Dwarf, Star and MDAG, have substantially reduced the cube size using graph representations. In general, they eliminate prefix redundancy and some suffix redundancy from a data cube. The MCG differs significantly from previous approaches as it completely eliminates prefix and suffix redundancies from a data cube. A data cube can be viewed as a set of sub-graphs. In general, redundant sub-graphs are quite common in a data cube, but eliminating them is a hard problem. The Dwarf, Star and MDAG approaches only eliminate some specific common sub-graphs. The MCG approach efficiently eliminates all common sub-graphs from the entire cube, based on an exact sub-graph matching solution. We propose a matching function to guarantee one-to-one mapping between sub-graphs. The function is computed incrementally, in a top-down fashion, and its computation uses a minimal amount of information to generate unique results. In addition, it is computed for any measure type: distributive, algebraic or holistic. MCG performance analysis demonstrates that MCG is 20-40% faster than the Dwarf, Star and MDAG approaches when computing sparse data cubes. Dense data cubes have a small number of aggregations, so there is not enough room for runtime and memory consumption optimization; the MCG approach is therefore not useful for computing such dense cubes. The compact representation of sparse data cubes enables the MCG approach to reduce memory consumption by 70-90% when compared to the original Star approach, proposed in [33]. In the same scenarios, the improved Star approach, proposed in [34], reduces memory consumption by only 10-30%, Dwarf by 30-50% and MDAG by 40-60%, when compared to the original Star approach. The MCG is the first approach that uses an exact sub-graph matching function to reduce cube size, avoiding unnecessary aggregation, i.e. improving cube computation runtime.
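A generic way to picture "eliminating all common sub-graphs" is hash-consing: give every sub-graph a canonical signature built from its own content plus its children's signatures, and share any node whose signature has already been seen. The Python sketch below shows only that generic idea, not the MCG matching function itself; the node layout is an assumption for illustration.

```python
def dedup(node, registry=None):
    """node = (label, [children]). Returns a shared node so that structurally
    identical sub-graphs are stored exactly once (a DAG instead of a tree)."""
    if registry is None:
        registry = {}
    label, children = node
    shared_children = tuple(dedup(c, registry) for c in children)
    signature = (label, shared_children)          # exact sub-graph match key
    if signature not in registry:
        registry[signature] = (label, shared_children)
    return registry[signature]

# The identical 'b1' and 'b2' suffixes under both branches collapse into shared nodes.
tree = ("all", [("a1", [("b1", []), ("b2", [])]),
                ("a2", [("b1", []), ("b2", [])])])
root = dedup(tree)
print(root[1][0][1][0] is root[1][1][1][0])   # True: the 'b1' sub-graph is stored once
```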

5.
In this article we present two novel enhancements for the cube pruning and cube growing algorithms, two of the most widely applied methods when using the hierarchical approach to statistical machine translation. Cube pruning is the de facto standard search algorithm for the hierarchical model. We propose to adapt concepts of the source cardinality synchronous search organization as used for standard phrase-based translation to the characteristics of cube pruning. In this way we will be able to improve the performance of the generation process and reduce the average translation time per sentence to approximately one quarter. We will also investigate the cube growing algorithm, a reformulation of cube pruning with on-demand computation. This algorithm depends on a heuristic for the language model, but this issue is barely discussed in the original work. We analyze the behaviour of this heuristic and propose a new one which greatly reduces memory consumption without costs in runtime or translation performance. Results are reported on the German–English Europarl corpus.
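For readers unfamiliar with the kernel of cube pruning, the following Python sketch shows its core operation: lazily enumerating the k best combinations of two score-sorted candidate lists with a priority queue, rather than scoring the full cross-product. It is a generic illustration under simplified additive scores, not the authors' cardinality-synchronous variant.

```python
import heapq

def k_best_combinations(xs, ys, k):
    """xs and ys are score-descending lists of (score, item) pairs.
    Yield the k highest-scoring pairwise combinations, exploring the
    'cube' (here a 2-D grid) of index pairs lazily from its best corner."""
    heap = [(-(xs[0][0] + ys[0][0]), 0, 0)]        # max-heap via negated scores
    seen, out = {(0, 0)}, []
    while heap and len(out) < k:
        neg_score, i, j = heapq.heappop(heap)
        out.append((-neg_score, xs[i][1], ys[j][1]))
        for ni, nj in ((i + 1, j), (i, j + 1)):    # push grid neighbours
            if ni < len(xs) and nj < len(ys) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(xs[ni][0] + ys[nj][0]), ni, nj))
    return out

xs = [(3.0, "x1"), (2.0, "x2"), (0.5, "x3")]
ys = [(4.0, "y1"), (1.0, "y2")]
print(k_best_combinations(xs, ys, k=3))
```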

6.
A Genetic Selection Algorithm for OLAP Data Cubes
Multidimensional data analysis, as supported by OLAP (online analytical processing) systems, requires the computation of many aggregate functions over a large volume of historically collected data. To decrease the query time and to provide various viewpoints for the analysts, these data are usually organized as a multidimensional data model, called data cubes. Each cell in a data cube corresponds to a unique set of values for the different dimensions and contains the metric of interest. The data cube selection problem is, given the set of user queries and a storage space constraint, to select a set of materialized cubes from the data cubes to minimize the query cost and/or the maintenance cost. This problem is known to be an NP-hard problem. In this study, we examined the application of genetic algorithms to the cube selection problem. We proposed a greedy-repaired genetic algorithm, called the genetic greedy method. According to our experiments, the solution obtained by our genetic greedy method is superior to that found using the traditional greedy method. That is, within the same storage constraint, the solution can greatly reduce the amount of query cost as well as the cube maintenance cost.
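As a point of reference for the selection problem (not the paper's genetic greedy method), here is a sketch of the classical greedy baseline it is compared against: repeatedly pick the cuboid with the best query-cost benefit per unit of storage until the space budget is exhausted. The cost, size and budget figures are made up for the example, and the per-cuboid benefit is treated as fixed for simplicity.

```python
def greedy_cube_selection(candidates, budget):
    """candidates: dict cuboid -> (size, benefit), where benefit approximates the
    reduction in total query cost if the cuboid is materialized.
    Greedily selects cuboids by benefit per unit of storage within the budget."""
    selected, used = [], 0
    remaining = dict(candidates)
    while remaining:
        best = max(remaining, key=lambda c: remaining[c][1] / remaining[c][0])
        size, benefit = remaining.pop(best)
        if used + size <= budget and benefit > 0:
            selected.append(best)
            used += size
    return selected, used

candidates = {          # hypothetical cuboids: (storage size, query-cost benefit)
    "(city, month)": (100, 900),
    "(city)":        (10, 300),
    "(month)":       (12, 240),
    "(product)":     (50, 100),
}
print(greedy_cube_selection(candidates, budget=120))
```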

7.
A Survey of Data Cube Computation Methods
With the wide application of multidimensional data analysis in many fields, data cube computation methods have attracted the attention of a large number of researchers. This survey analyzes the factors that affect data cube computation, including storage space, query processing efficiency and cube maintenance cost, and describes data cube materialization strategies. Existing computation methods from China and abroad are reviewed from several perspectives, including iceberg cubes, condensed data cubes, high-dimensional data cubes, approximate computation and stream data cubes, and the characteristics and applicable scope of each class of methods are analyzed.

8.
Dimension data in a data warehouse is usually hierarchical. Clustering based on dimension-hierarchy paths can effectively organize related data together in physical storage and reduce the number of disk accesses a query requires. Existing cube storage structures, however, focus on the computation and storage of cube operations and ignore this property. This paper proposes HC (Hierarchically Clustered) Cube, a cube storage structure based on dimension-hierarchy clustering, together with the associated algorithms, to address these problems.

9.
Computation of the cube operator plays an extremely important role in OLAP applications. This paper analyzes the shortcomings of the traditional pipelined method for computing high-dimensional cube operators, proposes a solution that improves OLAP performance by selectively materializing a subset of the nodes in the cube, and gives an algorithm for determining which nodes need to be materialized.

10.
By dividing the dimensions of a data cube into partitioning dimensions and non-partitioning dimensions, the data of a view is split into two parts, stored in a relation and in a multidimensional array, respectively. For this hybrid storage structure we design a data cube construction algorithm that combines the advantages of the pipelined aggregation method and the multidimensional-array aggregation method, greatly reducing the number of pipelines and the required storage space and speeding up computation. Experiments on a real data set show that the algorithm is well suited to computing high-dimensional data cubes.

11.
The Marching Cubes Algorithm may return degenerate, zero-area isosurface triangles, and often returns isosurface triangles with small areas, edges or angles. We show how to avoid both problems using an extended Marching Cubes lookup table. As opposed to the conventional Marching Cubes lookup table, the extended lookup table differentiates scalar values equal to the isovalue from scalar values greater than the isovalue. The lookup table has 3^8 = 6561 entries, based on three possible labels, '−', '=' or '+', of each cube vertex. We present an algorithm based on this lookup table which returns an isosurface close to the Marching Cubes isosurface, but without any degenerate triangles or any small areas, edges or angles.
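To make the size of the extended table concrete, the sketch below encodes the three-way label of each of the eight cube vertices as a base-3 digit, giving 3^8 = 6561 possible configurations versus 2^8 = 256 in the conventional table. The encoding order is an assumption for illustration, not the paper's actual table layout.

```python
def vertex_label(scalar, isovalue):
    """Three-way label used by the extended table: below, equal to, or above the isovalue."""
    return "-" if scalar < isovalue else ("=" if scalar == isovalue else "+")

def extended_table_index(scalars, isovalue):
    """Map the 8 cube-vertex labels to an index in 0 .. 3**8 - 1 (base-3 encoding)."""
    digits = {"-": 0, "=": 1, "+": 2}
    index = 0
    for s in scalars:                      # one base-3 digit per cube vertex
        index = index * 3 + digits[vertex_label(s, isovalue)]
    return index

print(3 ** 8)                                        # 6561 table entries
print(extended_table_index([0.2, 0.5, 0.9, 0.5, 0.1, 0.7, 0.5, 0.3], isovalue=0.5))
```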

12.
Indexing the Prefix Cube
The prefix cube is a recently proposed data cube structure. It exploits prefix sharing and base single tuples to effectively reduce the size of the data cube and, correspondingly, the cube computation time. To improve the query performance of the prefix cube, this paper proposes an index structure for it called Prefix-CuboidTree. Extensive experiments on real and synthetic data sets demonstrate the query performance of the index.

13.
印莹  赵宇海  张斌 《计算机科学》2005,32(11):88-90
Data cube computation is a very expensive operation and has been studied extensively. Due to space constraints, storing a fully materialized data cube is infeasible. Dwarf, a recently proposed semantically compressed data cube, eliminates prefix and suffix redundancy to compress a fully materialized cube into a very small space. However, when the source data changes, its update process is complicated. By studying how Dwarf's aggregate nodes change during updates, this paper proposes a new Dwarf-based incremental update algorithm that keeps the data cube fully materialized without recomputation, greatly improving update efficiency. Experiments further demonstrate the efficiency and effectiveness of the algorithm, which is especially suitable for high-dimensional data sets in data warehouses.

14.
Microeconomic analysis using dominant relationship analysis
The concept of dominance has recently attracted much interest in the context of skyline computation. Given an N-dimensional data set S, a point p is said to dominate q if p is better than q in at least one dimension and equal to or better than it in the remaining dimensions. In this article, we propose extending the concept of dominance for business analysis from a microeconomic perspective. More specifically, we propose a new form of analysis, called Dominant Relationship Analysis (DRA), which aims to provide insight into the dominant relationships between products and potential buyers. By analyzing such relationships, companies can position their products more effectively while remaining profitable. To support DRA, we propose a novel data cube called DADA (Data Cube for Dominant Relationship Analysis), which captures the dominant relationships between products and customers. Three types of queries called Dominant Relationship Queries (DRQs) are consequently proposed for analysis purposes: (1) Linear Optimization Queries (LOQ), (2) Subspace Analysis Queries (SAQ), and (3) Comparative Dominant Queries (CDQ). We designed efficient algorithms for computation, compression and incremental maintenance of DADA as well as for answering the DRQs using DADA. We conducted extensive experiments on various real and synthetic data sets to evaluate the technique of DADA and report results demonstrating the effectiveness and efficiency of DADA and its associated query-processing strategies.
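The dominance relation quoted above translates directly into code; here is a minimal Python version, assuming "better" means larger, which is a convention chosen for the example rather than fixed by the article.

```python
def dominates(p, q):
    """p dominates q if p is better (here: strictly larger) in at least one
    dimension and at least as good in every remaining dimension."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

# Toy product feature vectors (higher is better in every dimension).
print(dominates((3, 5, 2), (3, 4, 2)))   # True: better in one dimension, equal elsewhere
print(dominates((3, 5, 2), (4, 1, 2)))   # False: worse in the first dimension
```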

15.
With the popularity of column-store databases, modern multi-core CPUs, and general-purpose computing on graphics processing units (GPGPUs), there will be radical changes in how processing is done in the online analytical processing (OLAP) and data warehousing fields. Cube computation is a core and time-consuming problem which has been researched extensively. However, most of the algorithms have been proposed without considering the prevalent multi-core architectures and column storage. This paper presents a new parallel cube algorithm that takes advantage of multi-core architectures. We first propose a cache-conscious bottom-up computation (BUC) algorithm called CC-BUC that adopts an integrated bottom-up and breadth-first partitioning order. Each dimension is separately stored and processed. In processing each dimension, breadth-first data scanning and result output reduce memory I/O and enhance cache locality. Cache misses are limited to the scope of a dimension, and translation lookaside buffer (TLB) misses are reduced. Based on CC-BUC, we give a multi-core architecture-based cube algorithm called MC-Cubing. Multiple partitions are processed simultaneously and multiple threads execute in parallel inside each partition. MC-Cubing is well suited to multi-core architectures and highly parallel. The layout and associated algorithms take advantage of single instruction, multiple data (SIMD) instructions and thread-level parallelism (TLP). We implement and demonstrate the effectiveness of MC-Cubing on two multi-core architectures: multi-core CPUs and GPUs. Experimental results show that the MC-Cubing algorithm runs nearly six times faster than BUC on real datasets.
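For context, cube computation in the BUC family proceeds by recursively partitioning the input on one dimension at a time and emitting an aggregate for every partition. The compact Python sketch below shows that bottom-up recursion for a COUNT measure; it is a plain single-threaded BUC, not the cache-conscious CC-BUC or the parallel MC-Cubing described in the paper.

```python
def buc(rows, dims, prefix=None, out=None):
    """Bottom-up cube computation (COUNT measure).
    rows: list of tuples; dims: indices of the dimensions still to expand.
    Emits (cell, count) for every group-by combination; '*' = rolled up."""
    if out is None:
        out = {}
    n_dims = len(rows[0]) if rows else 0
    prefix = prefix if prefix is not None else ("*",) * n_dims
    out[prefix] = len(rows)                        # aggregate for the current cell
    for pos, d in enumerate(dims):
        partitions = {}
        for r in rows:                             # partition on dimension d
            partitions.setdefault(r[d], []).append(r)
        for value, part in partitions.items():
            cell = prefix[:d] + (value,) + prefix[d + 1:]
            # Recurse only on dimensions after d so each cell is produced once.
            buc(part, dims[pos + 1:], cell, out)
    return out

rows = [("web", "2023", "EU"), ("web", "2023", "US"), ("shop", "2024", "EU")]
cube = buc(rows, dims=[0, 1, 2])
print(cube[("web", "*", "*")], cube[("*", "2023", "EU")])
```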

16.
On Embedding Hamiltonian Cycles in Crossed Cubes
We study the embedding of Hamiltonian cycles in the Crossed Cube, a prominent variant of the classical hypercube obtained by crossing some straight links of a hypercube, which has attracted much research interest in the literature since its proposal. We show that, due to the loss of link-topology regularity, generating Hamiltonian cycles in a crossed cube is a more complicated procedure than in its original counterpart. The paper studies how the crossed links affect an otherwise succinct process for generating a host of well-structured Hamiltonian cycles traversing all nodes. The condition for generating these Hamiltonian cycles in a crossed cube is proposed. An algorithm is presented that works out a Hamiltonian cycle for a given link permutation. The properties revealed and the algorithm proposed in this paper can be useful when system designers evaluate a candidate network's competence and suitability, balancing regularity against other performance criteria, in choosing an interconnection network.

17.
Computation of multidimensional data cubes plays an extremely important role in on-line analytical processing. Addressing the shortcomings of traditional multidimensional cube computation, this paper proposes a new dependency-tree-based cube computation scheme together with an optimization algorithm for it. Experiments show that the new algorithm improves efficiency by more than an order of magnitude.

18.
Emerging applications face the need to store and analyze interconnected data. Graph cubes permit multi-dimensional analysis of graph datasets based on attribute values available at the nodes and edges of these graphs. Like the data cube, which contains an exponential number of aggregations, the graph cube results in an exponential number of aggregate graph cuboids. As a result, they are very hard to analyze. In this work, we first propose intuitive measures based on information entropy in order to evaluate the rich information contained in the graph cube. We then introduce an efficient algorithm that suggests portions of a precomputed graph cube based on these measures. The proposed algorithm exploits novel entropy bounds that we derive between different levels of aggregation in the graph cube. Using these bounds we are able to prune large parts of the graph cube, saving costly entropy calculations that would otherwise be required. We experimentally validate our techniques on real and synthetic datasets and demonstrate the pruning power and efficiency of our proposed techniques.
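As a small illustration of the kind of entropy-based measure involved, the sketch below computes a generic Shannon entropy over a cuboid's aggregated measure distribution; the cuboid contents are hypothetical and the formula is not necessarily the authors' exact definition.

```python
from math import log2

def cuboid_entropy(aggregates):
    """Shannon entropy of the normalized measure distribution of one aggregate
    graph cuboid; higher entropy suggests the aggregation retains more variety."""
    total = sum(aggregates.values())
    probs = [v / total for v in aggregates.values() if v > 0]
    return -sum(p * log2(p) for p in probs)

# Hypothetical cuboids: aggregate edge weights grouped by node-attribute pairs.
by_country = {("DE", "FR"): 40, ("DE", "US"): 35, ("FR", "US"): 25}
by_gender = {("M", "M"): 95, ("M", "F"): 5}
print(cuboid_entropy(by_country), cuboid_entropy(by_gender))  # ~1.56 vs ~0.29
```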

19.
High Performance OLAP and Data Mining on Parallel Computers
On-Line Analytical Processing (OLAP) techniques are increasingly being used in decision support systems to provide analysis of data. Queries posed on such systems are quite complex and require different views of data. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. Multidimensional OLAP systems store data in multidimensional arrays on which analytical operations are performed. Knowledge discovery and data mining require complex operations on the underlying data which can be very expensive in terms of computation time. High performance parallel systems can reduce this analysis time. Precomputed aggregate calculations in a Data Cube can provide efficient query processing for OLAP applications. In this article, we present algorithms for construction of data cubes on distributed-memory parallel computers. Data is loaded from a relational database into a multidimensional array. We present two methods, sort-based and hash-based, for loading the base cube and compare their performance. Data cubes are used to perform consolidation queries used in roll-up operations using dimension hierarchies. Finally, we show how data cubes are used for data mining using Attribute Focusing techniques. We present results for these on the IBM-SP2 parallel machine. Results show that our algorithms and techniques for OLAP and data mining on parallel systems are scalable to a large number of processors, providing a high performance platform for such applications.
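The hash-based loading step mentioned above can be pictured as building a dictionary keyed by the dimension coordinates of each relational tuple. The minimal single-process Python sketch below illustrates only that idea; the real algorithms distribute the work across the processors of the parallel machine, and the column names here are made up for the example.

```python
def hash_load_base_cube(relation, dim_cols, measure_col):
    """Hash-based loading of the base cuboid: each distinct combination of
    dimension values becomes one cell holding the summed measure."""
    base = {}
    for row in relation:
        key = tuple(row[c] for c in dim_cols)       # hash on the dimension coordinates
        base[key] = base.get(key, 0.0) + row[measure_col]
    return base

relation = [
    {"product": "p1", "region": "EU", "month": "Jan", "sales": 10.0},
    {"product": "p1", "region": "EU", "month": "Jan", "sales": 5.0},
    {"product": "p2", "region": "US", "month": "Feb", "sales": 7.5},
]
print(hash_load_base_cube(relation, ["product", "region", "month"], "sales"))
```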

20.
骆吉洲  李建中  赵锴 《软件学报》2006,17(8):1743-1752
The Iceberg Cube operation is an important operation in OLAP (on-line analytical processing). Data compression techniques are playing an increasingly visible role in reducing the storage required by a data warehouse and improving data processing performance. How to compute Iceberg Cubes quickly and effectively over a compressed data warehouse is a pressing open problem. This paper briefly introduces data warehouse compression and then presents an algorithm for computing Iceberg Cubes in a compressed data warehouse. Experimental results show that the algorithm outperforms the approach of first computing the cube over the compressed data and then checking the HAVING condition.
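For readers new to the term, an Iceberg Cube keeps only the aggregate cells that pass a HAVING-style threshold. The Python sketch below applies such a condition to already-computed group-by counts; it illustrates the baseline "compute, then filter" strategy that the paper's algorithm is reported to beat, not the paper's algorithm itself.

```python
from collections import Counter

def iceberg_filter(rows, dim_indices, min_count):
    """Group rows by the chosen dimensions and keep only the cells whose
    COUNT(*) meets the threshold -- the HAVING condition of an iceberg query."""
    counts = Counter(tuple(r[i] for i in dim_indices) for r in rows)
    return {cell: c for cell, c in counts.items() if c >= min_count}

rows = [("web", "EU"), ("web", "EU"), ("web", "US"), ("shop", "EU")]
print(iceberg_filter(rows, dim_indices=[0, 1], min_count=2))   # {('web', 'EU'): 2}
```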
