Similar Literature
20 similar documents found.
1.
A fully materialized data cube occupies a huge amount of disk space when the base table has a large number of attributes. Compact data cubes, such as the condensed cube and the quotient cube, were proposed to solve this problem. They compress the data cube dramatically, but their query cost is so high that they cannot be used in most applications. This paper introduces the semi-closed cube, which reduces the size of the data cube while achieving almost the same query response time as the full data cube. The semi-closed cube is a generalization of the condensed cube and the quotient cube and is constructed from a quotient cube. When the query cost of the quotient cube exceeds a given threshold, the semi-closed cube selects some views and picks a fellow for each of them. All tuples of those views are materialized except those closed by their fellows. To find a tuple in one of those views, users only need to scan the view and its fellow, which improves query performance. Experiments were conducted on a real-world data set. The results show that the semi-closed cube is an effective approach to implementing the data cube.

2.
Parallel data processing is a promising approach for efficiently computing the data cube in relational databases, because most aggregate functions used in OLAP (On-Line Analytical Processing) are distributive functions. This paper studies how to handle data skew in parallel data cube computation. We present a fully dynamic partitioning approach that can effectively distribute the workload among processing nodes without a priori knowledge of the data distribution. As a supplement, a simple and effective dynamic load balancing mechanism is also incorporated into our algorithm, which further improves overall performance. Our experimental results indicate that the proposed techniques are effective even when high data skew exists. The results of scale-up and speedup tests are also satisfactory.
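
As a rough illustration of why distributive aggregates suit parallel cube computation, the sketch below computes a group-by SUM independently on each partition and then merges the partial results. The row layout, the helper names `partial_sums` and `merge`, and the fixed three-way split are assumptions made for this example; it does not reproduce the dynamic partitioning or load balancing scheme of the paper.

```python
from collections import Counter

def partial_sums(rows):
    """Group-by SUM over one partition; rows are (key, measure) pairs."""
    acc = Counter()
    for key, measure in rows:
        acc[key] += measure
    return acc

def merge(partials):
    """Combine per-partition results; valid because SUM is distributive."""
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds values key by key
    return total

# Hypothetical fact rows: (region, sales)
rows = [("north", 5), ("south", 3), ("north", 2), ("east", 7), ("south", 1)]

# Assumed static split into three partitions, one per worker
partitions = [rows[0:2], rows[2:4], rows[4:]]

merged = merge(partial_sums(p) for p in partitions)
single_pass = partial_sums(rows)
```

Merging the partial sums gives the same answer as a single pass over all rows, so each partition can be aggregated on a different node. A holistic function such as MEDIAN would not combine this way.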

3.
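Star Cube: An Efficient Data Cube Implementation Method   Total citations: 1, self-citations: 2, citations by others: 1
A data cube with n dimensions has 2^n views, and the more views there are, the longer it takes to maintain the data cube. By dividing the dimensions into partitioning dimensions and non-partitioning dimensions, a data cube can be converted into a star cube. A star cube consists of a summary table and the views that contain only partitioning dimensions. By using prefix sharing and tuple sharing, the star cube not only reduces the required storage space but also greatly reduces computation and maintenance time. Provided that each slice is confined to a single I/O unit, the query response time of the star cube is essentially the same as that of the data cube. Experimental results also show that the star cube is a data cube implementation technique that is efficient in both time and space.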
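
To make the 2^n figure concrete, the following minimal sketch enumerates every group-by (view) of a small table and aggregates a measure for each one. The table, the column names, and the `full_cube` function are hypothetical, and the code builds a plain full cube, not the star cube structure described in the abstract.

```python
from collections import defaultdict
from itertools import combinations

def full_cube(rows, dims, measure):
    """Return {group_by_tuple: {cell_key: SUM(measure)}} for every subset of dims."""
    cube = {}
    for k in range(len(dims) + 1):
        for group_by in combinations(dims, k):
            view = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group_by)
                view[key] += row[measure]
            cube[group_by] = dict(view)
    return cube

# Hypothetical base table with n = 3 dimensions and one measure
rows = [
    {"region": "north", "product": "tea", "year": 2001, "sales": 10},
    {"region": "north", "product": "coffee", "year": 2001, "sales": 4},
    {"region": "south", "product": "tea", "year": 2002, "sales": 6},
]

cube = full_cube(rows, ("region", "product", "year"), "sales")
num_views = len(cube)  # 2**3 = 8 views, including the empty group-by
```

Each added dimension doubles the number of views, which is why the maintenance cost noted above grows so quickly.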

4.
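Computing the Cube operator plays an extremely important role in OLAP applications. This paper analyzes the shortcomings of the traditional pipeline method for computing high-dimensional Cube operators, proposes a solution that improves OLAP performance by selectively materializing some of the nodes in the Cube, and gives an algorithm for determining which nodes to materialize.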

5.
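Parallel Dwarf Cube Construction in a MapReduce Environment   Total citations: 1, self-citations: 0, citations by others: 1
For data-intensive applications, a parallel Dwarf data cube construction algorithm based on the MapReduce framework is proposed. The algorithm splits a traditional Dwarf cube into multiple equivalent, independent sub-Dwarf cubes and uses the MapReduce architecture to build, query, and update the Dwarf cube in parallel. Experiments show that the parallel Dwarf algorithm combines, on the one hand, the parallelism and high scalability of the MapReduce framework and, on the other hand, ...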

6.
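This paper studies the supporting technology for spatial online analytical processing (OLAP), a decision analysis tool based on spatial data warehouses. It compares ordinary data cubes with spatial data cubes and proposes a modeling method for the dimensions and measures of spatial data cubes, which solves the problem of integrated modeling of spatial and non-spatial dimensions and of spatial and numerical measures.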

7.
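Online analytical processing and data mining are two important data analysis methods. Using the data cube as the data storage structure integrates the two, so that users can analyze data from different perspectives and at different levels of abstraction. Based on the characteristics of the data cube, this paper proposes an algorithm for mining inter-dimensional association rules. The algorithm was implemented and gave satisfactory results.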

8.
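Xu Rui, Liu Wencai. Computer Engineering and Applications (《计算机工程与应用》), 2002, 38(21): 210-211, 215
Data warehousing and OLAP are current research hotspots in the database field, and the data model is the core foundation of both. This article proposes a data model for OLAP and applies it in practice. The model expresses the characteristics of OLAP at the conceptual level and supports OLAP operations, and its algebra expresses OLAP queries simply and clearly.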

9.
Parallelizing the Data Cube   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce interprocessor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel. Our partitioning strategies create a small number of coarse tasks. This allows for sharing of prefixes and sort orders between different group-by computations. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single-attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm-specific cost measures such as estimated group-by sizes. Both partitioning approaches can be implemented on any shared-disk parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. We have implemented our parallel top-down data cube construction method in C++ with the MPI message passing library for communication and the LEDA library for the required graph algorithms. We tested our code on an eight-processor cluster, using a variety of data sets with a range of sizes, dimensions, density, and skew. Comparison tests were performed on a SunFire 6800. The tests show that our partitioning strategies generate a close to optimal load balance between processors. The actual run times observed show an optimal speedup of p.

10.
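Data Warehouse Security Control Based on Data Cubes   Total citations: 1, self-citations: 0, citations by others: 1
Zhou Haiqing, Chen Qimai, Liu Hai. Computer Engineering (《计算机工程》), 2010, 36(10): 152-154
To address unauthorized access to the data warehouse and indirect inference of sensitive information in data warehouse and online analytical processing (OLAP) systems, a three-tier security control architecture for OLAP is built on the existing security architecture for statistical databases, and a new data-cube-based inference control method is proposed for this architecture. The method first prevents m-dimensional inference and then eliminates one-dimensional inference, which simplifies the detection of m-dimensional inference.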

11.
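The Data Cube, a Core OLAP Technology: Research Status and Prospects   Total citations: 6, self-citations: 0, citations by others: 6
Starting from the basic functions of OLAP, this article surveys the state of research on its core technology, the data cube, focusing on three aspects: data cube modeling, data cube computation, and data cube operations. It concludes with an outlook on research directions in the field.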

12.
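GSFC: A Graph-Structure-Based Storage Method for the Free Cube   Total citations: 2, self-citations: 0, citations by others: 2
The free cube exploits implication rules among the dimension values of the base relation table to remove the inherent redundancy in the data cube, effectively reducing its size. However, some problems deserve further study. First, a direct representation of the free cube is still not compact enough and wastes storage space. Second, only the basic idea of querying has been described, without a concrete query technique. To address these problems, the graph-structure-based storage method GSFC is proposed, which uses prefix compression to further reduce the size of the free cube. The method also combines the storage and index structures, which effectively solves the free cube query problem. Finally, computation and query algorithms are given, and experiments demonstrate their effectiveness.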

13.
The design of an OLAP system that supports real-time queries is a major research issue. One approach is to use data cubes, which are materialized, precomputed multidimensional views of the data in a data warehouse. We can derive a set of data cubes to answer each frequently asked query directly. However, there are two practical problems: (1) the maintenance cost of the data cubes, and (2) the query cost of answering those queries. Maintaining a data cube requires disk storage and CPU computation, so the maintenance cost is related to the total size as well as the total number of data cubes materialized. In most cases, materializing all data cubes is impractical. The maintenance cost may be reduced by merging some data cubes. However, the resulting larger data cubes will increase the cost of answering some queries. If the bounds on the maintenance cost and the query cost are too strict, we help the user decide which queries to sacrifice and leave out of consideration. We define an optimization problem in data cube system design: given a maintenance-cost bound, a query-cost bound, and a set of frequently asked queries, determine a set of data cubes such that the system can answer the largest possible subset of the queries without violating the two bounds. This is an NP-hard problem. We propose the approximate greedy algorithms GR, 2GM and 2GMM, which experiments on a census data set and a forest-cover-type data set show to be both effective and efficient.

14.
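Complex queries over data cubes are a direction of development for cube technology. For the three kinds of aggregation dependency that may arise in complex cube queries, this paper gives three corresponding solutions based on caching reuse. Experimental results on synthetic and real data sets verify the effectiveness and correctness of the methods.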

15.
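Evaluation is an important part of online teaching. A data-cube-based evaluation model for online teaching is proposed that takes students and teachers as the objects of evaluation and builds corresponding data cubes. The student learning behavior data cube and the teacher teaching behavior data cube each consist of six dimensions, with access time as the main measure. The key technologies involved in the model are also discussed.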

16.
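Evaluation is an important part of online teaching. A data-cube-based evaluation model for online teaching is proposed that takes students and teachers as the objects of evaluation and builds corresponding data cubes. The student learning behavior data cube and the teacher teaching behavior data cube each consist of six dimensions, with access time as the main measure. The key technologies involved in the model are also discussed.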

17.
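Feng Yucai, Liu Yubao, Feng Jianlin. Journal of Software (《软件学报》), 2003, 14(10): 1706-1716
Constrained cube gradient mining is an important mining task whose main goal is to mine, from a data cube, gradient-probe tuple pairs that satisfy a gradient constraint. However, existing research is based on general data cubes. This paper studies the mining of constrained cube gradients in condensed data cubes. By extending the LiveSet-driven algorithm, an eLiveSet algorithm is proposed. Tests show that the algorithm mines cube gradients more efficiently than existing algorithms.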

18.
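Parallel Cube Computation on Graphics Processing Units   Total citations: 1, self-citations: 0, citations by others: 1
Cube computation is a core problem in data warehousing and online analytical processing (OLAP), and improving its performance has attracted wide attention from academia and industry, yet most current cube algorithms do not take the latest processor architectures into account. In recent years, processors have evolved from a single computing core to multiple or many cores, as in multi-core CPUs and graphics processing units (GPUs). To make full use of the multi-core resources of modern processors, this paper proposes GPU-Cubing, a GPU-based parallel cube algorithm. The algorithm adopts a bottom-up, breadth-first partitioning strategy and computes and outputs one cuboid at a time in parallel. While a cuboid is being computed, multiple partitions are processed simultaneously, with multiple threads working in parallel within each partition. GPU-Cubing suits the GPU architecture and has a high degree of parallelism. Compared with the BUC algorithm on real data sets, it achieves a speedup of more than an order of magnitude for full cube computation and of at least 2x for iceberg cubes.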
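
For readers unfamiliar with the BUC baseline and iceberg cubes, the sketch below shows a simplified, sequential BUC-style computation on COUNT: it partitions the rows on one dimension at a time and recurses only into partitions that meet a minimum support, so cells below the threshold are never expanded. The function name `buc`, the parameter `min_sup`, and the tuple layout are assumptions for illustration; this is not the GPU-Cubing algorithm and omits the optimizations of a real BUC implementation.

```python
from collections import defaultdict

def buc(rows, num_dims, min_sup, start=0, prefix=(), out=None):
    """Collect iceberg-cube cells as {((dim_index, value), ...): count}."""
    if out is None:
        out = {}
    # Record the aggregate of the current cell (the apex cell on the first call)
    out[prefix] = len(rows)
    for d in range(start, num_dims):
        # Partition the current rows on dimension d
        parts = defaultdict(list)
        for row in rows:
            parts[row[d]].append(row)
        for value, part in parts.items():
            # Prune: a cell below min_sup cannot have a qualifying descendant
            if len(part) >= min_sup:
                buc(part, num_dims, min_sup, d + 1, prefix + ((d, value),), out)
    return out

# Hypothetical two-dimensional fact rows
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
cells = buc(rows, num_dims=2, min_sup=2)
```

With `min_sup=2`, the cells for value "b" on dimension 0 and value "y" on dimension 1 are pruned because each covers a single row. Setting `min_sup=1` yields the full cube on COUNT.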

19.
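Static inference control methods based on data cubes do not achieve high access effectiveness in online analytical processing (OLAP) systems. A dynamic data-cube-based inference control method is therefore proposed. Aiming to improve access effectiveness in OLAP systems, the method processes online queries in real time, analyzes the inference threat posed by the queried cuboids, blocks the inference, and dynamically returns the set of accessible cuboids. Experimental results show that the method improves the effectiveness of the inference system while offering the same security as static inference control.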

20.
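A new data cube compression technique, the condensed quotient cover cube, is proposed. It omits from the quotient cover cube those base-cell tuples for which queries can be answered quickly from the base table alone, thereby reducing its size. Generation and query algorithms for the condensed quotient cover cube are given. Experimental results show that the condensed quotient cover cube contains only 62% as many tuples as the original quotient cover cube, which verifies the effectiveness of the technique.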
