共查询到10条相似文献,搜索用时 62 毫秒
1.
Semi-Closed Cube: An Effective Approach to Trading Off Data Cube Size and Query Response Time 下载免费PDF全文
The results of data cube will occupy huge amount of disk space when the base table is of a large number of attributes. A new type of data cube, compact data cube like condensed cube and quotient cube, was proposed to solve the problem. It compresses data cube dramatically. However, its query cost is so high that it cannot be used in most applications. This paper introduces the semi-closed cube to reduce the size of data cube and achieve almost the same query response time as the data cube does. Semi-closed cube is a generalization of condensed cube and quotient cube and is constructed from a quotient cube. When the query cost of quotient cube is higher than a given threshold, semi-closed cube selects some views and picks a fellow for each of them. All the tuples of those views are materialized except those closed by their fellows. To find a tuple of those views, users only need to scan the view and its fellow. Thus, their query performance is improved. Experiments were conducted using a real-world data set. The results show that semi-closed cube is an effective approach of data cube. 相似文献
2.
Hongjun Lu Jeffrey Xu Yu Ling Feng Zhixian Li 《Distributed and Parallel Databases》2003,13(2):181-202
Parallel data processing is a promising approach for efficiently computing data cube in relational databases, because most aggregate functions used in OLAP (On-Line Analytical Processing) are distributive functions. This paper studies the issues of handling data skew in parallel data cube computation. We present a fully dynamic partitioning approach that can effectively distribute workload among processing nodes without priori knowledge of data distribution. As supplement, a simple and effective dynamic load balancing mechanism is also incorporated into our algorithm, which further improves the overall performance. Our experimental results indicated that the proposed techniques are effective even when high data skew exists. The results of scale-up and speedup tests are also satisfactory. 相似文献
3.
Star Cube--一种高效的数据立方体实现方法 总被引:1,自引:2,他引:1
一个具有n个维的数据立方体有2^n个视图,视图越多,用于维护数据立方体的时间也就越长。通过将维分成划分维和非划分维,数据立方体可以转换成star cube.stal cube由一个综合表和那些仅包含划分维的视图组成。star cube使用前缀共享和元组共享技术不仅减少了所需的存储空间,还大大减少了计算和维护时间。在把一个分片限制在一个I/O单位的条件下,star cube的查询响应时间与数据立方体基本相同。实验结果也表明,star cube是一种在时空两方面均有效的数据立方体实现技术。 相似文献
4.
Cube算子的计算在OLAP应用中起着极为重要的作用。本文分析了在高维Cube算子计算中传统流水线方法的不足之处,提出了通过有选择地实例化Cube中的部分节点以提高OLAP性能的解决方案,并给出了一个获取需要实例化节点的算法。 相似文献
5.
MapReduce环境下的并行Dwarf立方构建 总被引:1,自引:0,他引:1
针对数据密集型应用,提出了一种基于MapReduce框架的并行Dwarf数据立方构建算法.算法将传统Dwarf立方等价分割为多个独立的子Dwarf立方,采用MapReduce架构,实现了Dwarf立方的并行构建、查询和更新.实验证明,并行Dwarf算法一方面结合了MapReduce框架的并行性和高可扩展性,另一方面结合... 相似文献
6.
7.
8.
数据仓库及OLAP技术是当前数据库领域研究的热点,而数据模型又是数据仓库及OLAP核心基础。文章提出了一种应用于OLAP的数据模型,并用于实际应用中。这种数据模型在概念上表达了OLAP特性,支持OLAP操作,而且其数学代数简单明白地表达了OLAP查询。 相似文献
9.
Parallelizing the Data Cube 总被引:1,自引:0,他引:1
Frank Dehne Todd Eavis Susanne Hambrusch Andrew Rau-Chaplin 《Distributed and Parallel Databases》2002,11(2):181-201
This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce inter processor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel. Our partitioning strategies create a small number of coarse tasks. This allows for sharing of prefixes and sort orders between different group-by computations. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting.The bottom-up partitioning strategy balances the number of single attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm specific cost measures like estimated group-by sizes. Both partitioning approaches can be implemented on any shared disk type parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array.We have implemented our parallel top-down data cube construction method in C++ with the MPI message passing library for communication and the LEDA library for the required graph algorithms. We tested our code on an eight processor cluster, using a variety of different data sets with a range of sizes, dimensions, density, and skew. Comparison tests were performed on a SunFire 6800. The tests show that our partitioning strategies generate a close to optimal load balance between processors. The actual run times observed show an optimal speedup of p. 相似文献