首页 | 本学科首页   官方微博 | 高级检索  
     

列存储中的OLAP多查询优化方法
引用本文:陆戌辰,王梅,乐嘉锦.列存储中的OLAP多查询优化方法[J].计算机科学与探索,2012,6(9):852-864.
作者姓名:陆戌辰  王梅  乐嘉锦
作者单位:东华大学 计算机科学与技术学院,上海,201620
基金项目:国家自然科学基金Nos.61070031,61070032,61103046;国家科技重大专项“核高基”项目No.2010ZX01042-001-003-004~~
摘    要:为了使列存储OLAP(on-line analytical processing)操作中I/O和CPU开销较大的扫描、连接、聚集操作实现有效的共享和复用,提出了一个多查询优化技术。根据列存储以及OLAP操作的特点,提出了一系列转换规则,为OLAP查询请求产生的一组相关查询语句生成一个单一全局查询计划。为了达到共享复用的目的,在全局计划中引入新的过滤结点、分组结点、合并结点和聚集结点。同时,借用MuGA(multiply group by algo-rithm)算法,通过分组结点、合并结点、连接结点实现维表及事实表元组的分组序号标记,从而实现列扫描、列连接的共享。并为聚集结点提出了一个多阶段聚集算法,结合最终生成的事实表复合分组序号,实现聚集操作的复用。在SSB(star schema benchmark)数据集上设计实验,证明了该多查询优化策略的有效性。

关 键 词:列存储  联机分析处理(OLAP)  多查询优化  全局计划  操作复用

Multi-Query Optimization Strategy in Column-Based OLAP System
LU Xuchen , WANG Mei , LE Jiajin.Multi-Query Optimization Strategy in Column-Based OLAP System[J].Journal of Frontier of Computer Science and Technology,2012,6(9):852-864.
Authors:LU Xuchen  WANG Mei  LE Jiajin
Affiliation:School of Computer Science and Technology, Donghua University, Shanghai 201620, China
Abstract:This paper provides a multi-query optimization strategy to achieve the share and reuse of column scan, column join and aggregation operations in column store OLAP (on-line analytical processing) systems which can easily lead to large I/O and CPU overhead. According to the features of column-stores and OLAP applications, the paper proposes a series of transformation rules to generate a single global query plan for a set of related queries mapped from a certain OLAP require. Aiming at the share and reuse of operations, the paper also introduces four new defined nodes: the filter node, the group by node, the merge node and the aggregation node. At the same time, it makes an improvement to the MuGA (multiply group by algorithm), and uses the filter node, merge node and join node to mark group number for each tuple in dimension and fact tables, to achieve the share of column scan and column join operations. For the aggregation node, the paper proposes a multi-phase aggregation algorithm combined with compound group number of the fact table to implement effective aggregation reuse. The experimental results on the benchmark data sets SSB (star schema benchmark) also verify the effectiveness of the multi-query optimization strategy.
Keywords:column store  on-line analytical processing (OLAP)  multi-query optimization  global query plan  oper- ation reuse
本文献已被 CNKI 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号