首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Efficient aggregation algorithms for compressed data warehouses   总被引:9,自引:0,他引:9  
Aggregation and cube are important operations for online analytical processing (OLAP). Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing cube for multidimensional data warehouses that store data sets in multidimensional arrays rather than in tables. However, to our knowledge, there is nothing to date in the literature describing aggregation algorithms on compressed data warehouses for multidimensional OLAP. This paper presents a set of aggregation algorithms on compressed data warehouses for multidimensional OLAP. These algorithms operate directly on compressed data sets, which are compressed by the mapping-complete compression methods, without the need to first decompress them. The algorithms have different performance behaviors as a function of the data set parameters, sizes of outputs and main memory availability. The algorithms are described and the I/O and CPU cost functions are presented in this paper. A decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analysis and experimental results show that the algorithms have better performance on sparse data than the previous aggregation algorithms  相似文献   

2.
超大型压缩数据仓库上的CUBE算法   总被引:7,自引:2,他引:7  
高宏  李建中 《软件学报》2001,12(6):830-839
数据压缩是提高多维数据仓库性能的重要途径,联机分析处理是数据仓库上的主要应用,Cube操作是联机分析处理中最常用的操作之一.压缩多维数据仓库上的Cube算法的研究是数据库界面临的具有挑战性的重要任务.近年来,人们在Cube算法方面开展了大量工作,但却很少涉及多维数据仓库和压缩多维数据仓库.到目前为止,只有一篇论文提出了一种压缩多维数据仓库上的Cube算法.在深入研究压缩数据仓库上的Cube算法的基础上,提出了产生优化Cube计算计划的启发式算法和3个压缩多维数据仓库上的Cube算法.所提出的Cube算法直  相似文献   

3.
Multidimensional aggregation is a dominant operation on data warehouses for on-line analytical processing(OLAP).Many efficinet algorithms to compute multidimensional aggregation on relational database based data warehouses have been developed.However,to our knowledge,there is nothing to date in the literature about aggregation algorithms on multidimensional data warehouses that store datasets in mulitidimensional arrays rather than in tables.This paper presents a set of multidimensional aggregation algorithms on very large and compressed multidimensional data warehouses.These algorithms operate directly on compressed datasets in multidimensional data warehouses without the need to first decompress them.They are applicable to a variety of data compression methods.The algorithms have different performance behavior as a function of dataset parameters,sizes of out puts and ain memory availability.The algorithms are described and analyzed with respect to the I/O and CPU costs,A decision procedure to select the most efficient algorithm ,given an aggregation request,is also proposed.The analytical and experimental results show that the algorithms are more efficient than the traditional aggregation algorithms.  相似文献   

4.
提出了一种高效的挖掘数据仓库中多维关联规则的MDP算法。MDP算法通过构造一种扩展的前缀树MDP-tree,将数据仓库中的有效信息压缩存储,再使用基于MDP-tree的MDP-mining方法快速发现有趣的关联规则。MDP算法仅需要扫描一次数据仓库,就可以构造出MDP-tree,进而得到所有的关联规则。该算法还具有频繁模式查找简捷、二次查找迅速等优点。通过实验验证了MDP算法的高效性和稳定性,与传统的多维关联规则算法相比有更好的性能。  相似文献   

5.
MapReduce环境下的并行Dwarf立方构建   总被引:1,自引:0,他引:1  
针对数据密集型应用,提出了一种基于MapReduce框架的并行Dwarf数据立方构建算法.算法将传统Dwarf立方等价分割为多个独立的子Dwarf立方,采用MapReduce架构,实现了Dwarf立方的并行构建、查询和更新.实验证明,并行Dwarf算法一方面结合了MapReduce框架的并行性和高可扩展性,另一方面结合...  相似文献   

6.
On-line analytical processing (OLAP) typically involves complex aggregate queries over large datasets. The data cube has been proposed as a structure that materializes the results of such queries in order to accelerate OLAP. A significant fraction of the related work has been on Relational-OLAP (ROLAP) techniques, which are based on relational technology. Existing ROLAP cubing solutions mainly focus on “flat” datasets, which do not include hierarchies in their dimensions. Nevertheless, as shown in this paper, the nature of hierarchies introduces several complications into the entire lifecycle of a data cube including the operations of construction, storage, indexing, query processing, and incremental maintenance. This fact renders existing techniques essentially inapplicable in a significant number of real-world applications and mandates revisiting the entire cube lifecycle under the new perspective. In order to overcome this problem, the CURE algorithm has been recently proposed as an efficient mechanism to construct complete cubes over large datasets with arbitrary hierarchies and store them in a highly compressed format, compatible with the relational model. In this paper, we study the remaining phases in the cube lifecycle and introduce query-processing and incremental-maintenance algorithms for CURE cubes. These are significantly different from earlier approaches, which have been proposed for flat cubes constructed by other techniques and are inadequate for CURE due to its high compression rate and the presence of hierarchies. Our methods address issues such as cube indexing, query optimization, and lazy update policies. Especially regarding updates, such lazy approaches are applied for the first time on cubes. We demonstrate the effectiveness of CURE in all phases of the cube lifecycle through experiments on both real-world and synthetic datasets. Among the experimental results, we distinguish those that have made CURE the first ROLAP technique to complete the construction and usage of the cube of the highest-density dataset in the APB-1 benchmark (12 GB). CURE was in fact quite efficient on this, showing great promise with respect to the potential of the technique overall.  相似文献   

7.
骆吉洲  李建中  赵锴 《软件学报》2006,17(8):1743-1752
Iceberg Cube操作是OLAP(on-line analysis processing)分析中的一种重要操作.数据压缩技术在有效减小数据仓库所需的数据空间和提高数据处理性能方面的作用越来越明显.在压缩的数据仓库上,如何快速、有效地计算Iceberg Cube是目前亟待解决的问题.简要介绍了数据仓库的压缩,然后给出了在压缩数据仓库中计算Iceberg Cube的算法.实验结果表明,该算法的性能优于先在压缩数据上计算Cube再检查having条件这种方法.  相似文献   

8.
一种并行处理多维连接和聚集操作的有效方法   总被引:1,自引:0,他引:1  
随着并行计算算法的完善和廉价、功能强大的多处理机系统的成熟,使得采用多处理机系统来并行处理多维数据仓库的连接和聚集操作成为当前有效提高OLAP查询处理性能的首选技术.为此,提出一种降低连接和聚集操作开销的并行算法PJAMDDC(parallel join and aggregation for multi-dimensional data cube).算法充分考虑了多维数据立方体的存储机制和多处理机分布系统的结构特点,在原有聚集计算多维数据立方体的搜索点阵逻辑结构的基础上,采用多维数据仓库的层次联合代理(hierarchy combined surrogate)和对立方体的搜索点阵进行加权的方法,使得立方体数据在多个处理机间的分配达到最佳的状态,从而在分割多维数据的同时,提高了并行处理多维连接和聚集操作的效率.算法实验评估表明,PJAMDDC算法并行处理多维数据仓库的连接和聚集操作是有效的.  相似文献   

9.
I/O parallelism is considered to be a promising approach to achieving high performance in parallel data warehousing systems where huge amounts of data and complex analytical queries have to be processed. This paper proposes a parallel secondary data cube storage structure (PHC for short) to efficiently support the processing of range sum queries and dynamic updates on data cube using parallel computing systems. Based on PHC, two parallel algorithms for processing range sum queries and updates are proposed also. Both the algorithms have the same time complexity, O(logdn/P). The analytical and experimental results show that PHC and the parallel algorithms have high performance and achieve optimum speedup.  相似文献   

10.
大型数据仓库实现技术的研究   总被引:2,自引:0,他引:2  
大型数据仓库是实现海量数据存储的有效途径,但在大型数据仓库的实现中存在很多问题。在分析问题的基础上,对大型数据仓库的实现问题提出了一定的解决策略,对其中的几个关键技术即数据立方体的有效计算、增量式更新维护、索引优化、故障恢复、模式设计和查询优化的代价模型及元数据的定义和管理等作了研究。  相似文献   

11.
数据仓库系统中一种改进的维层次聚集Cube存储结构   总被引:3,自引:0,他引:3  
提出利用Cube中的维层次(dimension hierarchy)聚集技术来创建高性能的维层次聚集Cube(dimension hierarchy aggregate cube, DHAC).充分利用DHAC已保存的维层次信息,对Cube中多维数据的查询和更新效率进行了优化,并且支持Cube的上探、下钻等语义操作.在DHAC中进行数据插入和删除等数据更新时,由下向上用更新前后的差值对受到更新结点影响的所有祖先结点进行增量更新.实现了在插入新维或维层次时不需要重新构建聚集Cube就可以实现Cube的模式更新.对维层次聚集Cube与传统Cube进行了算法性能分析和比较,理论分析和实验结果都表明,所提出的DHAC性能最佳.  相似文献   

12.
Data cube computation is a well-known expensive operation and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. Recently proposed quotient cube addressed this fundamental issue through a partitioning method that groups cube cells into equivalent partitions. The effectiveness and efficiency of the quotient cube for cube compression and computation have been proved. However, as changes are made to the data sources, to maintain such a quotient cube is non-trivial since the equivalent classes in it must be split or merged. In this paper, incremental algorithms are designed to update existing quotient cube efficiently based on Galois lattice. Performance study shows that these algorithms are efficient and scalable for large databases.  相似文献   

13.
杨明奇  李占山  张家晨 《软件学报》2019,30(11):3355-3363
表约束是一种外延的知识表示方法,每个约束在对应的变量集上列举出所有支持或禁止的元组.广义弧相容(generalized arc consistency,简称GAC)是求解约束满足问题应用最广泛的相容性.Simple Tabular Reduction(STR)是一类高效的维持GAC的算法.在回溯搜索中,STR动态地删除无效元组,降低了查找支持的开销,并拥有单位时间的回溯代价,在高元表约束上获得了广泛运用,并有大量基于STR的改进算法被提出,其中,元组集的压缩表示是目前研究较多的方法.同样基于动态维持元组集有效部分的思想,为STR提出一种检测并删除无效元组和为变量更新支持的算法,作用于原始表约束并拥有单位时间的回溯代价.实验结果表明,该算法在表约束上维持GAC的效率普遍高于现有的非基于压缩表示的STR算法,并且在一些实例上的效率高于最新的基于元组集压缩表示的STR算法.  相似文献   

14.
Data cube pre-computation is an important concept for supporting OLAP (Online Analytical Processing) and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. Recently proposed quotient cube addressed this issue through a partitioning method that groups cube cells into equivalence partitions. Such an approach not only is useful for distributive aggregate functions such as SUM but also can be applied to the maintenance of holistic aggregate functions like MEDIAN which will require the storage of a set of tuples for each equivalence class. Unfortunately, as changes are made to the data sources, maintaining the quotient cube is non-trivial since the partitioning of the cube cells must also be updated. In this paper, the authors design incremental algorithms to update a quotient cube efficiently for both SUM and MEDIAN aggregate functions. For the aggregate function SUM, concepts are borrowed from the principle of Galois Lattice to develop CPU-efficient algorithms to update a quotient cube. For the aggregate function MEDIAN, the concept of a pseudo class is introduced to further reduce the size of the quotient cube, Coupled with a novel sliding window technique, an efficient algorithm is developed for maintaining a MEDIAN quotient cube that takes up reasonably small storage space. Performance study shows that the proposed algorithms are efficient and scalable over large databases.  相似文献   

15.
Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper addresses a number of algorithmic issues in parallel data cube construction. First, we present an aggregation tree for sequential (and parallel) data cube construction, which has minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We present a parallel algorithm based upon the aggregation tree. We analyze the interprocessor communication volume and construct a closed form expression for it. We prove that the same ordering of the dimensions in the aggregation tree minimizes both the computational and communication requirements. We also describe a method for partitioning the initial array and prove that it minimizes the communication volume. Finally, in the cases when memory may be a bottleneck, we describe how tiling can help scale sequential and parallel data cube construction. Experimental results from implementation of our algorithms on a cluster of workstations show the effectiveness of our algorithms and validate our theoretical results.  相似文献   

16.
Logistic regression is an important technique for analyzing and predicting data with categorical attributes. In this paper, We consider supporting online analytical processing (OLAP) of logistic regression analysis for multi-dimensional data in a data cube where it is expensive in time and space to build logistic regression models for each cell from the raw data. We propose a novel scheme to compress the data in such a way that we can reconstruct logistic regression models to answer any OLAP query without accessing the raw data. Based on a first-order approximation to the maximum likelihood estimating equations, we develop a compression scheme that compresses each base cell into a small compressed data block with essential information to support the aggregation of logistic regression models. Aggregation formulae for deriving high-level logistic regression models from lower level component cells are given. We prove that the compression is nearly lossless in the sense that the aggregated estimator deviates from the true model by an error that is bounded and approaches to zero when the data size increases. The results show that the proposed compression and aggregation scheme can make feasible OLAP of logistic regression in a data cube.  相似文献   

17.
多特征方用于计算复杂的数据挖掘查询,在2n个粒度进行多个依赖的复杂聚集计算。现有的立方体粒度计算技术可以有效计算分布和代数多特征方,针对整体多特征方提出了优化策略:先将立方体水平分块,然后采用冰山查询技术动态选择数据以及局部分布聚集特性优化计算过程。该优化策略既减少了计算复杂度又节省了聚集计算时间,实验结果表明该计算策略比基本的解决方法性能提高一倍以上。  相似文献   

18.
印莹  赵宇海  张斌 《计算机科学》2005,32(11):88-90
数据立方计算是代价非常大的操作,并且被广泛研究。受空问的限制,存储一个完全实例化的数据立方是不可行的。最近提出的一种语义压缩数据立方一Dwarf,通过消除前缀冗余和后缀冗余把一个完全实例化的数据立方压缩存储到一个很小的空问。然而,当数据源发生变化时,它的更新过程是很复杂的。本文通过研究Dwarf在更新过程中汇总结点的变化特性,提出了一种基于Dwarf的新的增量更新算法,既能完全实例化数据立方又不需要重新计算,大大提高了数据立方的更新效率。实验进一步证明了该算法的效率和有效性,尤其适合数据仓库中的高维数据集。  相似文献   

19.
数据更新是数据仓库上支持联机分析处理的一种重要操作。增量更新是一种有效的数据更新方法。实现了二维层次式数据立方体(Cube)存储结构HDC的建立以及基于此结构的数据增量更新算法。  相似文献   

20.
提出利用Cube中的维层次聚集树(dimension hierarchy aggregate tree,简称DHA-Tree)来对聚集Cube进行增量更新维护,在维层次聚集Cube中进行数据插入和删除等数据更新时,充分利用维层次聚集树中的维层次前缀,由下向上用更新前后的差值对受到更新结点影响的所有祖先结点进行增量更新.在插入新维数据时,在不需要重新构建聚集Cube就可以对聚集Cube进行增量更新,从而减少了Cube的更新时间.对基于维层次聚集树的聚集Cube与传统Cube进行了算法性能分析和比较,结果表明本文所提出的聚集Cube的增量更新算法性能最佳.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号