Similar articles
1.
I/O parallelism is considered a promising approach to achieving high performance in parallel data warehousing systems, where huge amounts of data and complex analytical queries have to be processed. This paper proposes a parallel secondary data cube storage structure (PHC for short) that efficiently supports range sum queries and dynamic updates on data cubes using parallel computing systems. Based on PHC, two parallel algorithms for processing range sum queries and updates are also proposed. Both algorithms have the same time complexity, $O(\log^{d} n / P)$. The analytical and experimental results show that PHC and the parallel algorithms perform well and achieve optimal speedup.
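The abstract does not describe PHC's internal layout, so the sketch below shows a related classic structure instead, a two-dimensional prefix-sum cube, not PHC itself. It illustrates the query/update trade-off behind range sum queries: a range sum costs four lookups, while a point update may have to touch many prefix cells. All function names are hypothetical.

```python
def build_prefix_cube(a):
    """Return P (padded by one row/column) with P[i][j] = sum of a[0..i-1][0..j-1]."""
    rows, cols = len(a), len(a[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = a[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def range_sum(p, r0, c0, r1, c1):
    """Sum of a[r0..r1][c0..c1] (inclusive) by inclusion-exclusion on 4 prefix cells."""
    return p[r1 + 1][c1 + 1] - p[r0][c1 + 1] - p[r1 + 1][c0] + p[r0][c0]


def update(a, p, r, c, value):
    """Point update: add the change to every prefix cell that covers (r, c).
    In the worst case (r = c = 0) this touches the whole prefix array."""
    delta = value - a[r][c]
    a[r][c] = value
    for i in range(r + 1, len(p)):
        for j in range(c + 1, len(p[0])):
            p[i][j] += delta
```

For example, with `a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `range_sum(build_prefix_cube(a), 1, 1, 2, 2)` returns 28. Structures that partition or layer the prefix sums exist precisely to balance this constant-time query against the expensive update.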

2.
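This article uses the parallel computing framework MapReduce to study data cube computation. Data cube computation faces two key problems: computation time and cube size. As the number of dimensions grows, both the computation time and the size of the cube grow exponentially. Although MapReduce is an excellent parallel computing framework, its partitioning algorithm handles data skew poorly, so some tasks run far longer than others and delay the completion of the whole job. This paper optimizes data partitioning through data sampling, and experimental results show a clear improvement in data cube computation performance. To address the excessive size of the data cube, the final results of the Reduce phase are written to HBase, a NoSQL database, for storage; HBase scales out horizontally and also makes later queries on the data cube convenient.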
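The abstract does not specify the sampling scheme, so the following is only a minimal sketch under an assumption: range partitioning in which the reducer boundaries are quantiles of a random sample of the map output keys. Function names and parameters (`sample_split_points`, `sample_size`) are hypothetical and are not Hadoop API calls.

```python
import bisect
import random


def sample_split_points(keys, num_reducers, sample_size, seed=0):
    """Pick num_reducers - 1 boundaries at evenly spaced quantiles of a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]


def partition(key, split_points):
    """Reducer index = number of boundaries <= key (range partitioning)."""
    return bisect.bisect_right(split_points, key)
```

Because boundaries follow the observed key distribution rather than fixed hash buckets, dense key ranges are spread over more reducers. A single extremely frequent key still lands on one reducer under this scheme, so heavy hitters would need separate handling.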

3.
Research on Closed Data Cube Technology   Total citations: 14, self-citations: 1, citations by others: 14
李盛恩 (Li Shengen), 王珊 (Wang Shan). 《软件学报》 (Journal of Software), 2004, 15(8): 1165-1171
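A data cube contains a great deal of redundant information; removing it not only saves storage space but also speeds up computation. The tuples of a data cube can be divided into closed and non-closed tuples. For every non-closed tuple there exists a closed tuple such that both are obtained by aggregating the same set of base-table tuples, so they have the same aggregate function value. Removing all non-closed tuples from a data cube yields a closed data cube. This paper presents algorithms for generating, querying, and incrementally maintaining a closed data cube, and reports experiments on synthetic and real data. The experimental results show that closed data cube technology is effective.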
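A brute-force sketch of the closure idea described above, not the paper's generation algorithm: a cell's cover is the set of base rows it aggregates, and its closure fixes every dimension on which all covered rows agree. A cell is closed when it equals its closure, and a non-closed cell shares its closure's aggregate value. The names `cover`, `closure`, `closed_cube`, and the `"*"` wildcard are illustrative assumptions.

```python
from itertools import product

STAR = "*"  # wildcard: the cell aggregates over all values of this dimension


def cover(cell, rows):
    """Base rows (dims, measure) that aggregate into `cell`."""
    return [r for r in rows
            if all(c == STAR or c == v for c, v in zip(cell, r[0]))]


def closure(cell, rows):
    """Most specific cell with the same cover: fix every dimension on which
    all covered rows agree, leave the others as STAR."""
    covered = cover(cell, rows)
    result = []
    for d in range(len(cell)):
        values = {r[0][d] for r in covered}
        result.append(values.pop() if len(values) == 1 else STAR)
    return tuple(result)


def closed_cube(rows, agg=sum):
    """Map every closed cell to its aggregate; non-closed cells are dropped."""
    dims = len(rows[0][0])
    closed = {}
    for dim_values, _ in rows:
        # enumerate every non-empty cell this base row contributes to
        for mask in product([False, True], repeat=dims):
            cell = tuple(STAR if m else v for m, v in zip(mask, dim_values))
            c = closure(cell, rows)
            if c not in closed:
                closed[c] = agg(measure for _, measure in cover(c, rows))
    return closed
```

With the three rows `("a1","b1","c1")=10`, `("a1","b2","c1")=20`, `("a2","b1","c2")=5`, only 6 cells are closed; for instance `("a1","*","*")` is not stored because its closure `("a1","*","c1")` holds the same value, 30.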

4.
New Algorithm for Computing Cube on Very Large Compressed Data Sets   Total citations: 2, self-citations: 0, citations by others: 2
Data compression is an effective technique for improving the performance of data warehouses. Since the cube operation is the core of online analytical processing in data warehouses, developing efficient algorithms for computing cubes on compressed data warehouses is a major challenge. To our knowledge, very few cube computation techniques for compressed data warehouses have been proposed in the literature to date. This paper presents a novel algorithm for computing cubes on compressed data warehouses. The algorithm operates directly on compressed data sets without first decompressing them, and it is applicable to a large class of mapping-complete data compression methods. The complexity of the algorithm is analyzed in detail. The analytical and experimental results show that the algorithm is more efficient than all other existing cube algorithms. In addition, a heuristic algorithm that generates an optimal plan for computing the cube is also proposed.
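To illustrate what "operating directly on compressed data" can mean, here is a minimal sketch that uses run-length encoding (RLE) as a stand-in compression method; the abstract does not say which mapping-complete methods the paper targets, and RLE is an assumption chosen for simplicity. A group-by SUM is computed per run as `measure * run_length`, without expanding the runs back into rows. Function names are hypothetical.

```python
from collections import defaultdict


def rle_encode(rows):
    """Compress consecutive identical (dims, measure) rows into [row, run_length]."""
    runs = []
    for row in rows:
        if runs and runs[-1][0] == row:
            runs[-1][1] += 1
        else:
            runs.append([row, 1])
    return runs


def group_sum_rle(runs, group_dims):
    """SUM(measure) GROUP BY the dimensions in group_dims, one step per run."""
    totals = defaultdict(int)
    for (dims, measure), length in runs:
        key = tuple(dims[d] for d in group_dims)
        totals[key] += measure * length
    return dict(totals)
```

The work is proportional to the number of runs rather than the number of rows, which is where the benefit of skipping decompression comes from when runs are long.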

5.
As companies seek to automate more of their processes, they are finding that decision support requires a significantly different data management approach from day-to-day operations. Online transaction processing (OLTP) applications simply automate data processing, which is sufficient for day-to-day operations. The paper considers a seven-step process that combines online analytical processing, data cube analysis, and data mining to streamline decision making in companies with multidimensional databases.

6.
A data cube occupies a huge amount of disk space when the base table has a large number of attributes. Compact data cubes, such as the condensed cube and the quotient cube, were proposed to solve this problem, and they compress the data cube dramatically. However, their query cost is so high that they cannot be used in most applications. This paper introduces the semi-closed cube, which reduces the size of the data cube while achieving almost the same query response time as the full data cube. The semi-closed cube generalizes the condensed cube and the quotient cube and is constructed from a quotient cube. When the query cost of the quotient cube exceeds a given threshold, the semi-closed cube selects some views and picks a fellow for each of them. All tuples of those views are materialized except those closed by their fellows, so finding a tuple of such a view requires scanning only the view and its fellow, which improves query performance. Experiments on a real-world data set show that the semi-closed cube is an effective data cube technique.

7.
Computation Methods for Spatial Cubes   Total citations: 3, self-citations: 0, citations by others: 3
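With the widespread use of satellite surveying, remote sensing imagery, GPS, and similar systems, organizations across many industries now hold large amounts of geospatial data. Spatial data warehouse technology applies the relatively mature techniques of data warehousing and online analytical processing to the spatial information domain to effectively support spatial analysis and decision making. Building and maintaining spatial cubes is a core problem in spatial data warehousing and spatial OLAP. After introducing the spatial data warehouse model and the spatial cube, this article presents several efficient methods for computing spatial cubes that take the characteristics of spatial aggregation into account.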

8.
Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors   Total citations: 8, self-citations: 2, citations by others: 6
The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. To meet the need for improved performance created by growing data sizes, parallel solutions for generating the data cube are becoming increasingly important. This paper presents a parallel method for generating data cubes on a shared-nothing multiprocessor. Since no (expensive) shared disk is required, our method can be used on low-cost Beowulf-style clusters consisting of standard PCs with local disks connected via a data switch. Our approach uses a ROLAP representation of the data cube in which views are stored as relational tables, allowing tight integration with current relational database technology. We have implemented our parallel shared-nothing data cube generation method and evaluated it on a PC cluster, exploring relative speedup, local vs. global schedule trees, data skew, cardinality of dimensions, data dimensionality, and balance tradeoffs. For an input data set of 2,000,000 rows (72 Megabytes), our parallel data cube generation method achieves close to optimal speedup, generating a full data cube of 227 million rows (5.6 Gigabytes) on a 16-processor cluster in under 6 minutes. For an input data set of 10,000,000 rows (360 Megabytes), our parallel method, running on a 16-processor PC cluster, created a data cube consisting of 846 million rows (21.7 Gigabytes) in under 47 minutes.
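The parallel schedule trees of the paper are not reproduced here. As a sequential, single-process sketch of the ROLAP representation only, the function below stores every one of the 2^d views as a small relational table (a dict from group key to SUM) and aggregates each view from its smallest already-computed parent, a common heuristic assumed here for illustration. In a shared-nothing setting, views or data partitions would be distributed across nodes; that part is omitted.

```python
from itertools import combinations


def full_cube(rows, num_dims):
    """ROLAP full cube: one {group_key: SUM} table per subset of dimensions.
    rows are (dims_tuple, measure); views are keyed by sorted dimension tuples."""
    all_dims = tuple(range(num_dims))
    views = {all_dims: {}}
    for dims, measure in rows:
        views[all_dims][dims] = views[all_dims].get(dims, 0) + measure
    # walk from k = d-1 dimensions down to 0 so that all parents already exist
    for k in range(num_dims - 1, -1, -1):
        for view in combinations(all_dims, k):
            parents = [v for v in views if len(v) == k + 1 and set(view) <= set(v)]
            parent = min(parents, key=lambda v: len(views[v]))  # smallest parent
            table = {}
            for key, total in views[parent].items():
                # project the parent's group key onto this view's dimensions
                sub = tuple(key[parent.index(d)] for d in view)
                table[sub] = table.get(sub, 0) + total
            views[view] = table
    return views
```

Aggregating from the smallest parent rather than from the base table is what keeps each group-by cheap; the relational-table form mirrors the abstract's point about integration with relational database technology.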

9.
A systematic approach to test data design is presented based on both practical translation of theory and organization of professional lore. The approach is organized around five domains and achieving coverage (exercise) of them by the test data. The domains are processing functions, input, output, interaction among functions, and the code itself. Checklists are used to generate data for processing functions. Separate checklists have been constructed for eight common business data processing functions such as editing, updating, sorting, and reporting. Checklists or specific concrete directions also exist for input, output, interaction, and code coverage. Two global heuristics concerning all test data are also used. A limited discussion on documenting test input data, expected results, and actual results is included.

10.
A Semantics-Preserving Compressed Data Cube Structure   Total citations: 2, self-citations: 1, citations by others: 1
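Data cubes are usually large and carry complex semantic relationships, so a complete semantic cube is hard to realize. Building on the quotient cube, this paper proposes the semantic data cube structure (SDC), which replaces the cells in each cell class by the class's upper bound while also storing its lower bounds. This simplifies the representation of cell classes, preserves all of their semantics, and supports roll-up and drill-down operations on cells. Applying these semantic relationships to data cube queries and incremental updates greatly reduces query response time and update cost. Experimental results show that SDC is effective.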

11.
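An Improved Dimension-Hierarchy Aggregate Cube Storage Structure for Data Warehouse Systems   Total citations: 3, self-citations: 0, citations by others: 3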
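This paper proposes using dimension hierarchy aggregation within a cube to build a high-performance dimension hierarchy aggregate cube (DHAC). By fully exploiting the dimension hierarchy information stored in the DHAC, it optimizes the efficiency of queries and updates on the multidimensional data in the cube and supports semantic operations such as roll-up and drill-down. When data are inserted into or deleted from the DHAC, all ancestor nodes affected by the updated node are incrementally updated bottom-up using the difference between the values before and after the update. Schema updates, such as inserting a new dimension or dimension hierarchy, can be applied without rebuilding the aggregate cube. The algorithmic performance of the dimension hierarchy aggregate cube is analyzed and compared with that of traditional cubes; both the theoretical analysis and the experimental results show that the proposed DHAC performs best.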
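A minimal sketch of the bottom-up delta propagation described above, for a single dimension hierarchy holding SUM values (the paper's DHAC is multidimensional; this simplification and the names `HierarchyNode` and `apply_update` are assumptions). An insert is an update from 0, a delete is an update to 0, and a new hierarchy member can be attached without rebuilding existing totals.

```python
class HierarchyNode:
    """A node in one dimension hierarchy (e.g. All > year > quarter > month)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.total = 0  # SUM of all leaf values below this node
        self.children = {}
        if parent is not None:
            parent.children[name] = self


def apply_update(leaf, old_value, new_value):
    """Add (new - old) to the leaf and to every ancestor, bottom-up."""
    delta = new_value - old_value
    node = leaf
    while node is not None:
        node.total += delta
        node = node.parent
```

Only the path from the changed leaf to the root is touched, so the cost of an update is proportional to the hierarchy depth rather than to the size of the cube.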

12.
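Quotient Cube and QC-tree attempt to reduce the size of a data cube while preserving the semantics it contains. However, the former does not store semantic relationships, and the semantic relationships stored by the latter are obscure. This paper therefore proposes the drill-down cube structure, the first approach to consider data cube storage from a semantic perspective: instead of storing the contents of classes, it stores the direct drill-down relationships between classes. The drill-down cube not only greatly reduces the storage size of a data cube but also clearly expresses the drill-down semantics of the original cube. In addition, the drill-down cube offers good query response performance, especially for range queries. Experiments and analysis show that the drill-down cube clearly outperforms QC-tree in both storage size and query response, making it well suited to organizing and storing data cubes.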

13.
A Cube Data Model for Data Warehouses   Total citations: 9, self-citations: 1, citations by others: 9
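Data warehousing and online analytical processing (OLAP) are two of the most significant new technologies in commercial data processing. OLAP applications require the analysis of large volumes of data stored in a data warehouse, and implementing such highly complex queries with standard relational database techniques is quite difficult. Data in a data warehouse are therefore organized according to a cube data model. This paper proposes a simple, intuitive data cube model and an algebra that supports OLAP operations on this cube, providing a concise way to express complex queries.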

14.
A Genetic Selection Algorithm for OLAP Data Cubes   Total citations: 1, self-citations: 0, citations by others: 1
Multidimensional data analysis, as supported by OLAP (online analytical processing) systems, requires computing many aggregate functions over a large volume of historically collected data. To decrease query time and to provide various viewpoints for analysts, these data are usually organized in a multidimensional data model called data cubes. Each cell in a data cube corresponds to a unique combination of values of the different dimensions and contains the metric of interest. Given a set of user queries and a storage space constraint, the data cube selection problem is to select, from the data cubes, a set of cubes to materialize that minimizes the query cost and/or the maintenance cost. This problem is known to be NP-hard. In this study, we examined the application of genetic algorithms to the cube selection problem and proposed a greedy-repaired genetic algorithm called the genetic greedy method. In our experiments, the solution obtained by the genetic greedy method is superior to that found by the traditional greedy method: within the same storage constraint, it greatly reduces the query cost as well as the cube maintenance cost.
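For reference, here is a minimal sketch of a traditional greedy baseline of the kind the abstract compares against, not the genetic greedy method itself. It assumes an HRU-style cost model: every view is queried equally often, a query on a view costs the row count of its smallest materialized ancestor, and the top view is always materialized. Each step picks the view with the highest benefit per unit of space that still fits the budget. The function name and cost model are assumptions.

```python
def greedy_select(sizes, space_budget):
    """sizes maps each view (frozenset of dimensions) to its estimated row count.
    Returns the set of views to materialize; the top view is not charged."""
    top = max(sizes, key=len)
    selected = {top}

    def cost(view, chosen):
        # a query on `view` is answered from its smallest chosen ancestor
        return min(sizes[a] for a in chosen if view <= a)

    def total_cost(chosen):
        return sum(cost(v, chosen) for v in sizes)

    used = 0
    while True:
        best, best_ratio = None, 0.0
        current = total_cost(selected)
        for v in sizes:
            if v in selected or used + sizes[v] > space_budget:
                continue
            benefit = current - total_cost(selected | {v})
            ratio = benefit / sizes[v]  # benefit per unit of storage
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            return selected
        selected.add(best)
        used += sizes[best]
```

Because each step is locally optimal, such a greedy can get stuck in a poor combination of views; a genetic search explores many combinations at once, with greedy repair keeping candidates within the storage constraint.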

15.
A Method for Computing the Attribute Core Based on Data Cubes   Total citations: 1, self-citations: 1, citations by others: 0
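Business intelligence systems use online analytical processing to organize data into multidimensional data cubes. This paper establishes a one-to-one mapping between the non-empty cells of a data cube and the equivalence classes of a decision table. By reusing the aggregation results already in the data cube, it proposes a method for computing the attribute core of a consistent decision table from a data cube and proves that the method is correct. Experiments on UCI data sets show that the method is time-efficient on large data volumes.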
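A minimal sketch of the underlying definition, assuming the standard rough-set notion for consistent decision tables: a condition attribute belongs to the core if dropping it makes two rows agree on all remaining condition attributes while differing in their decision. Each group below corresponds to one equivalence class, i.e. one non-empty cell of a GROUP BY over the remaining attributes. This recomputes the grouping from scratch instead of reusing cube aggregates as the paper does; function names are hypothetical.

```python
from collections import defaultdict


def is_consistent(rows, attrs):
    """True if rows that agree on every attribute in `attrs` also agree on the
    decision. Each key of `groups` is one equivalence class (one cube cell)."""
    groups = defaultdict(set)
    for cond, decision in rows:
        groups[tuple(cond[a] for a in attrs)].add(decision)
    return all(len(ds) == 1 for ds in groups.values())


def attribute_core(rows, num_attrs):
    """Core of a consistent decision table: the condition attributes whose
    removal makes the table inconsistent."""
    attrs = list(range(num_attrs))
    if not is_consistent(rows, attrs):
        raise ValueError("decision table is not consistent")
    return {a for a in attrs
            if not is_consistent(rows, [b for b in attrs if b != a])}
```

In a cube setting, the groups for every attribute subset are exactly the cells of the corresponding cuboid, which is why reusing already-aggregated cells can avoid rescanning the decision table.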

16.
冯玉才 (Feng Yucai), 刘玉葆 (Liu Yubao), 冯剑琳 (Feng Jianlin). 《软件学报》 (Journal of Software), 2003, 14(10): 1706-1716
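Constrained cube gradient mining is an important mining task whose main goal is to find, in a data cube, the gradient-probe tuple pairs that satisfy a gradient constraint. However, existing research is based only on ordinary data cubes. This paper studies the problem of mining constrained cube gradients in condensed data cubes. By extending the LiveSet-driven algorithm, it proposes the eLiveSet algorithm. Tests show that this algorithm mines cube gradients more efficiently than existing algorithms.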

17.
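This paper studies the supporting technologies of spatial online analytical processing (OLAP), a decision analysis tool based on spatial data warehouses. By comparing ordinary data cubes with spatial data cubes, it proposes a method for modeling the dimensions and measures of spatial data cubes, solving the problem of integrated modeling of spatial and non-spatial dimensions and of spatial and numeric measures.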

18.
The shared-nothing parallel database architecture is gaining wide popularity due to its scalability and increased data availability. However, to utilize parallelism efficiently in such an architecture, independent data sets must be assigned to different processing nodes. This can initially be achieved with a careful partitioning scheme that allocates disjoint data sets to different processors. However, variations in the data access pattern may leave some processors overloaded and others underloaded. This skew in data access decreases the effective parallelism and eventually degrades overall performance. A number of solutions have been proposed that periodically re-allocate data to remove access skew. Most of them perform either static re-allocation, which requires taking the system off-line, or dynamic but non-transactional re-allocation. In this paper, we introduce a dynamic and transactional re-allocation scheme based on the work of Scheuermann et al. on disk cooling in shared-memory architectures. The proposed scheme enhances the effective parallelism of the system regardless of variations in the access pattern. It detects access skew as it occurs and re-allocates data partitions to underloaded processing elements on the fly; only the block being moved becomes unavailable. In addition, mutual consistency among transactions running concurrently with the re-allocation is preserved. The scheme also uses replication as an additional cooling mechanism to help distribute the access load over multiple replicas. We conducted a series of simulation experiments to study the behavior of shared-nothing parallel database systems with and without the proposed dynamic re-allocation scheme. We also experimented with several replication strategies to measure their impact on system performance.
Finally, we studied the effect of different concurrency control strategies on the efficiency of dynamic re-allocation.
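The abstract does not give the detection rule or the choice of block to move, so the following is only a minimal sketch of one cooling step under stated assumptions: each block has a "heat" (accesses per interval), a node is considered hot when its load exceeds a hypothetical `threshold` times the mean load, and the single block whose move best narrows the gap between the hottest and coolest node is moved. Transactions, replication, and block unavailability during the move are not modeled.

```python
def cool_once(nodes, threshold=1.5):
    """nodes maps node id -> {block id: heat}. Moves at most one block from the
    hottest to the coolest node; returns (block, from_node, to_node) or None."""
    load = {n: sum(blocks.values()) for n, blocks in nodes.items()}
    mean = sum(load.values()) / len(load)
    hot = max(load, key=load.get)
    cold = min(load, key=load.get)
    if load[hot] <= threshold * mean:
        return None  # no node is skewed enough to act on
    gap = load[hot] - load[cold]
    # moving a block of heat h turns the pair's imbalance gap into |gap - 2h|
    block = min(nodes[hot], key=lambda b: abs(gap - 2 * nodes[hot][b]))
    if abs(gap - 2 * nodes[hot][block]) >= gap:
        return None  # no single move improves the pair
    nodes[cold][block] = nodes[hot].pop(block)
    return block, hot, cold
```

Calling `cool_once` periodically mirrors the idea of detecting skew as it occurs; the threshold keeps the scheme from shuffling blocks in response to small fluctuations.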

19.
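Research on Some Fundamental Issues in Hyperspectral Remote Sensing Data Mining   Total citations: 1, self-citations: 0, citations by others: 1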
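Focusing on the characteristics of hyperspectral remote sensing information, this paper analyzes how hyperspectral remote sensing data mining emerged and the role it plays. After constructing its framework and processing workflow, the paper discusses the types of knowledge that can be discovered and typical mining patterns, and analyzes several major mining algorithms and key techniques. Finally, it discusses potential application areas of hyperspectral remote sensing data mining.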

20.
Graphics processing units (GPUs) have a SIMD architecture and have recently been widely used as powerful general-purpose co-processors for the CPU. In this paper, we investigate efficient GPU-based data cubing, because the most frequent operation in data cube computation is aggregation, an expensive operation well suited to SIMD parallel processors. The H-tree is a hyper-linked tree structure used in both top-k H-cubing and the stream cube. Fast H-tree construction and update and real-time query response are crucial in many OLAP applications. We design highly efficient GPU-based parallel algorithms for these H-tree-based data cube operations. This is made possible by effective techniques, such as parallel primitives for segmented data and efficient memory access patterns, that achieve load balance on the GPU while hiding memory access latency. As a result, our GPU algorithms often achieve more than an order of magnitude speedup compared with their sequential counterparts on a single CPU. To the best of our knowledge, this is the first attempt to develop parallel data cubing algorithms on graphics processors.
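The abstract names "parallel primitives for segmented data" without detail. As an illustration of what such a primitive computes, here is a plain-Python emulation (not GPU code, and not necessarily the paper's primitive) of a Hillis-Steele segmented inclusive sum scan: each pass of the loop corresponds to one lock-step data-parallel step in which every element reads the previous array. Per-segment totals, such as one SUM per group-by cell after sorting rows by key, are read off at segment ends. Function names are hypothetical.

```python
def combine(left, right):
    """Associative segmented-sum operator on (value, flag) pairs:
    a set flag on the right operand resets the running sum."""
    (a, fa), (b, fb) = left, right
    return (b if fb else a + b, fa | fb)


def segmented_inclusive_scan(values, flags):
    """Hillis-Steele segmented sum scan; flags[i] == 1 marks a segment start.
    Each while-loop pass stands for one parallel step over all i at once."""
    pairs = list(zip(values, flags))
    d = 1
    while d < len(pairs):
        # every element reads the old array, as a lock-step SIMD pass would
        pairs = [combine(pairs[i - d], pairs[i]) if i >= d else pairs[i]
                 for i in range(len(pairs))]
        d *= 2
    return [v for v, _ in pairs]


def segment_totals(values, flags):
    """Per-segment sums taken from the last position of each segment."""
    scan = segmented_inclusive_scan(values, flags)
    ends = [i for i in range(len(flags)) if i + 1 == len(flags) or flags[i + 1]]
    return [scan[i] for i in ends]
```

The scan needs only about log2(n) passes, and the work in each pass is uniform across elements regardless of how long the segments are, which is one way such primitives help with load balance on SIMD hardware.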
