Indexing the Prefix Cube   Total citations: 1, self-citations: 0, citations by others: 1
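The prefix cube is a recently proposed data cube structure. By exploiting prefix sharing and base single tuples, it effectively reduces the size of the data cube and, correspondingly, the time needed to compute it. To improve the query performance of the prefix cube, this paper proposes an index mechanism for it, Prefix-CuboidTree. Extensive experiments on real and synthetic data sets demonstrate the query performance of this index mechanism.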
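The prefix-sharing idea can be illustrated with a generic trie over cube cells: cells that begin with the same dimension values share the nodes for that prefix, and a point query follows one edge per dimension. This is a minimal sketch, not the Prefix-CuboidTree itself; the names `full_cube`, `PrefixSharingStore` and the `"*"` marker for an aggregated dimension are hypothetical, and SUM is assumed as the measure.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

ALL = "*"  # hypothetical marker for a dimension that has been aggregated away


def full_cube(rows: Sequence[Tuple]) -> Dict[Tuple[str, ...], float]:
    """SUM for every cube cell; each row is (dim_1, ..., dim_n, measure)."""
    cube: Dict[Tuple[str, ...], float] = {}
    for *dims, measure in rows:
        # Every combination of "keep value" / "aggregate to ALL" per dimension.
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple(ALL if agg else v for v, agg in zip(dims, mask))
            cube[cell] = cube.get(cell, 0) + measure
    return cube


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.measure: Optional[float] = None  # set only on the node ending a full cell


class PrefixSharingStore:
    """Cells that share leading dimension values share trie nodes."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.node_count = 0  # number of stored (shared) dimension-value nodes

    def insert(self, cell: Sequence[str], measure: float) -> None:
        node = self.root
        for value in cell:
            if value not in node.children:
                node.children[value] = TrieNode()
                self.node_count += 1
            node = node.children[value]
        node.measure = measure

    def lookup(self, cell: Sequence[str]) -> Optional[float]:
        """Point query: one child lookup per dimension."""
        node = self.root
        for value in cell:
            node = node.children.get(value)
            if node is None:
                return None
        return node.measure
```

For the three rows `("a1","b1",5)`, `("a1","b2",3)`, `("a2","b1",2)`, the full cube has 8 cells (16 dimension values if stored flat), while the trie needs only 11 nodes because cells such as `("a1","b1")` and `("a1","*")` share the `a1` node.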

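Based on the hierarchical nature of data cubes and typical query habits, a new chunk-based computation method is proposed, together with an improved algorithm built on it. The method saves a large amount of storage space, answers queries at or above the LBD granularity in O(1) time, and updates data in approximately O() time. It also gives the data cube a degree of structural independence, which effectively reduces the number of times the cube must be rebuilt (reprocess), giving the method a clear advantage in time and efficiency.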

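Query speed is a key performance metric in online analytical processing (OLAP). It can be improved by precomputing all possible aggregates in advance, but such full materialization comes at the cost of storage space. Drawing on the data distribution characteristics of data cubes and on compression techniques, this paper describes how to perform full materialization while saving as much storage space as possible, and then studies query processing on that basis, aiming for minimal storage space together with good query speed.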

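OLAP systems usually precompute data cubes to speed up responses to likely aggregate queries. Materializing the precomputed data in memory speeds up responses further, but is limited by the available memory. In the setting of condensed data cubes, some cuboids are dynamically selected for in-memory materialization, which speeds up responses and adapts better to different query patterns. Query decomposition and response algorithms are given for the dynamic selection model under a particular storage scheme.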
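One way to picture answering queries from dynamically chosen in-memory cuboids: a group-by query can be answered from any cached cuboid whose dimensions include the requested ones, by rolling it up, and only otherwise from the base table. The sketch below assumes a SUM measure and a simple LRU admission/eviction policy; `CuboidCache` and its methods are hypothetical and do not reproduce the paper's selection model or storage scheme.

```python
from collections import OrderedDict
from typing import Dict, Iterable, List, Tuple

Cuboid = Dict[tuple, float]


class CuboidCache:
    """Keeps at most `capacity` cuboids in memory (hypothetical LRU policy)."""

    def __init__(self, base: List[dict], measure: str, capacity: int) -> None:
        self.base = base
        self.measure = measure
        self.capacity = capacity
        self.cache: "OrderedDict[Tuple[str, ...], Cuboid]" = OrderedDict()
        self.base_scans = 0  # how often we fell back to the base table

    def query(self, group_by: Iterable[str]) -> Cuboid:
        gb = tuple(sorted(group_by))
        # Any cached cuboid containing all requested dimensions can answer the query.
        candidates = [dims for dims in self.cache if set(gb) <= set(dims)]
        result: Cuboid = {}
        if candidates:
            src = min(candidates, key=lambda dims: len(self.cache[dims]))  # fewest cells
            self.cache.move_to_end(src)  # mark as recently used
            for key, value in self.cache[src].items():
                named = dict(zip(src, key))
                new_key = tuple(named[d] for d in gb)  # roll up to requested dims
                result[new_key] = result.get(new_key, 0) + value
        else:
            self.base_scans += 1
            for row in self.base:
                key = tuple(row[d] for d in gb)
                result[key] = result.get(key, 0) + row[self.measure]
        self._admit(gb, result)
        return result

    def _admit(self, gb: Tuple[str, ...], cuboid: Cuboid) -> None:
        self.cache[gb] = cuboid
        self.cache.move_to_end(gb)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used cuboid
```

After a `{city, year}` query has been cached, later `{city}` and `{year}` queries are answered by rolling it up, with no further base-table scans.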

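Precomputing a complete data cube yields the fastest query response, but a large data cube requires a very large amount of storage, so usually only some of the aggregates in the cube can be precomputed. This paper proposes PCC (Partial Computation of Cube), an algorithm for computing a partial data cube. It uses bottom-up partitioning, determines the dimension partitioning path according to the aggregates that need to be computed, and prunes unnecessary aggregates and partitions. Experiments show that PCC is more efficient than computing the partial cube with BUC, a full data cube computation method.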
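The general idea of BUC-style bottom-up partitioning restricted to requested aggregates can be sketched as follows: the recursion partitions on one dimension at a time and skips any branch from which no requested group-by can still be reached. This uses a fixed dimension order and a SUM measure; it is an illustration of pruned bottom-up partitioning, not the PCC algorithm or its path-selection step, and `partial_buc` is a hypothetical name.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Row = Tuple[Tuple[str, ...], float]
CellKey = Tuple[Tuple[int, str], ...]  # ((dimension index, value), ...)


def partial_buc(rows: List[Row], num_dims: int,
                wanted: Iterable[Iterable[int]]) -> Dict[CellKey, float]:
    """Compute SUM only for the requested group-bys (sets of dimension indices)."""
    targets = {frozenset(g) for g in wanted}
    out: Dict[CellKey, float] = {}

    def reachable(chosen: FrozenSet[int], start: int) -> bool:
        # Can some requested group-by still be formed by adding dimensions >= start?
        return any(chosen <= g and all(d >= start for d in g - chosen) for g in targets)

    def recurse(part: List[Row], chosen: FrozenSet[int], key: CellKey, start: int) -> None:
        if chosen in targets:
            out[key] = sum(m for _, m in part)  # aggregate of this partition
        for d in range(start, num_dims):
            nxt = chosen | {d}
            if not reachable(nxt, d + 1):
                continue  # prune: no requested aggregate lies below this branch
            groups: Dict[str, List[Row]] = {}
            for row in part:
                groups.setdefault(row[0][d], []).append(row)
            for value in sorted(groups):
                recurse(groups[value], nxt, key + ((d, value),), d + 1)

    recurse(rows, frozenset(), (), 0)
    return out
```

Requesting only `{0}` and `{1, 2}` never partitions on dimension 2 alone or on `{0, 1}`, because those branches cannot lead to a requested group-by.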

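CBFrag-Cubing: A Compressed-Bitmap-Based Algorithm for Building High-Dimensional Data Cubes   Total citations: 1, self-citations: 0, citations by others: 1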
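Data cube computation is an important research direction in data warehousing and OLAP, and also one of the most expensive operations in a data warehouse. For the high-dimensional, low-cardinality data sets found in fields such as bioinformatics, statistical analysis and text processing, X. L. Li et al. proposed the Frag-Cubing algorithm. To improve the efficiency of Frag-Cubing, this paper proposes CBFrag-Cubing, an algorithm based on the same fragmentation idea. It uses a bitmap index structure, which optimizes the storage of the data cube and reduces its computation time. Experiments show that, compared with Frag-Cubing, the algorithm saves at least 25% in storage space and 30% in computation time.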
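The role of a bitmap index in answering a cube cell can be sketched with one bitmap per (dimension, value) pair: the tuples covered by a cell are the bitwise AND of the bitmaps of its instantiated dimensions, and wildcard dimensions are simply skipped. This sketch uses uncompressed Python integers as bitsets rather than the compressed bitmaps of CBFrag-Cubing, and ignores fragmentation; `build_bitmaps` and `point_query` are hypothetical names.

```python
from typing import Dict, Optional, Sequence, Tuple


def build_bitmaps(rows: Sequence[Sequence[str]]) -> Dict[Tuple[int, str], int]:
    """One bitmap per (dimension index, value); bit i is set iff row i has that value."""
    bitmaps: Dict[Tuple[int, str], int] = {}
    for tid, row in enumerate(rows):
        for d, value in enumerate(row):
            bitmaps[(d, value)] = bitmaps.get((d, value), 0) | (1 << tid)
    return bitmaps


def point_query(bitmaps: Dict[Tuple[int, str], int], measures: Sequence[float],
                cell: Sequence[Optional[str]]) -> Tuple[int, float]:
    """cell: one value per dimension, or None for an aggregated (*) dimension.

    Returns (COUNT, SUM) over the base tuples covered by the cell."""
    selected = (1 << len(measures)) - 1  # start with every tuple selected
    for d, value in enumerate(cell):
        if value is not None:
            selected &= bitmaps.get((d, value), 0)  # intersect tuple-id sets
    count = bin(selected).count("1")
    total = sum(m for tid, m in enumerate(measures) if (selected >> tid) & 1)
    return count, total
```

With rows `("a1","b1","c1")`, `("a1","b2","c1")`, `("a2","b1","c2")` and measures `5, 3, 2`, the cell `(a1, *, c1)` selects the first two tuples and returns `(2, 8)`.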

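Research on Closed Data Cube Technology   Total citations: 14, self-citations: 1, citations by others: 14
Li Shengen, Wang Shan   Journal of Software, 2004, 15(8): 1165-1171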
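A data cube contains a great deal of redundant information; removing it not only saves storage space but also speeds up computation. The tuples of a data cube can be divided into closed and non-closed tuples. For every non-closed tuple there is a closed tuple such that both are aggregated from the same set of base-table tuples and therefore have the same aggregate values. Removing all non-closed tuples from a data cube yields a closed data cube. Algorithms for generating, querying and incrementally maintaining closed data cubes are proposed, and experiments are conducted on synthetic and real data. The results show that closed data cube technology is effective.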
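The closed/non-closed distinction can be made concrete: the closure of a cell fills in every aggregated dimension on which all covered base tuples agree, so a cell and its closure aggregate the same tuples. A cell is closed when it equals its own closure. This brute-force sketch (SUM measure, `"*"` for an aggregated dimension, hypothetical names `closure` and `closed_cube`) enumerates all cells and is meant only to illustrate the definition, not the paper's generation algorithm.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

ALL = "*"  # hypothetical marker for an aggregated dimension
Cell = Tuple[str, ...]


def covers(cell: Cell, row: Sequence[str]) -> bool:
    return all(c == ALL or c == v for c, v in zip(cell, row))


def closure(cell: Cell, rows: Sequence[Sequence[str]]) -> Optional[Cell]:
    """Most specific cell aggregated from exactly the same base tuples as `cell`."""
    matched = [r for r in rows if covers(cell, r)]
    if not matched:
        return None  # empty cell: not part of the cube
    return tuple(
        c if c != ALL
        else (matched[0][d] if all(r[d] == matched[0][d] for r in matched) else ALL)
        for d, c in enumerate(cell)
    )


def closed_cube(rows: Sequence[Sequence[str]], measures: Sequence[float]) -> Dict[Cell, float]:
    """Keep only closed cells (cell == closure(cell)), with SUM as the measure."""
    n = len(rows[0])
    values = [sorted({r[d] for r in rows}) + [ALL] for d in range(n)]
    result: Dict[Cell, float] = {}
    for cell in product(*values):
        if closure(cell, rows) == cell:
            result[cell] = sum(m for r, m in zip(rows, measures) if covers(cell, r))
    return result
```

For rows `("a1","b1","c1")`, `("a1","b2","c1")`, `("a2","b1","c2")`, the cell `(a1, *, *)` is not closed: its closure is `(a1, *, c1)`, which covers the same two tuples and has the same SUM.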

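Relational databases used by traditional e-government platforms hit performance bottlenecks when processing massive data. To address this, and drawing on the strengths of the Hadoop distributed platform in massive data processing, an e-government cloud platform combining a relational database with Hadoop is designed, using the HDFS distributed file system, the Map/Reduce parallel computing model and the Hive data warehouse. The two cooperate to provide query and storage services for massive data, which reduces the load on the relational database server and improves the scalability of the e-government platform. Experiments show that Hadoop can greatly improve the query efficiency of the e-government cloud platform. The factors that affect query efficiency in this design are further analyzed, providing a reference for further research on building efficient Hadoop-based e-government clouds.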

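Big data computing is a research hotspot in the Internet of Things and cloud computing. For structured and unstructured big data, Hadoop works well in scenarios with loose real-time requirements but cannot meet the needs of scenarios with strict real-time requirements. To address this, the paper proposes an efficient real-time solution based on objectified parallel computing, which combines objectification, Hadoop, in-memory computing and other techniques. In this scheme, business data is formatted as objects and stored in a distributed fashion in the memory of the cluster machines, and tasks are split into subtasks that are completed through parallel computation. The objectified parallel computing system has been applied in the power grid asset quality supervision and management system of the State Grid Corporation of China, and the results show that the scheme greatly improves system performance and meets real-time requirements.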

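Quotient Cube and QC-tree attempt to preserve the semantics embedded in a data cube while condensing its size. However, the former does not store the semantic relationships at all, and the semantic relationships stored by the latter are obscure. To address this, the drill-down cube structure is proposed, which for the first time approaches data cube storage from a semantic perspective: instead of storing the contents of the classes, it stores the direct drill-down relationships between classes. The drill-down cube not only greatly reduces the storage size of a data cube but also clearly expresses the drill-down semantics of the original cube. It also offers good query response performance, especially for range queries. Experiments and analysis show that the drill-down cube clearly outperforms QC-tree in both storage size and query response, making it well suited to organizing and storing data cubes.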

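Parallel Dwarf Cube Construction in a MapReduce Environment   Total citations: 1, self-citations: 0, citations by others: 1
For data-intensive applications, a parallel Dwarf data cube construction algorithm based on the MapReduce framework is proposed. The algorithm divides a traditional Dwarf cube into multiple equivalent, independent sub-Dwarf cubes and uses the MapReduce architecture to build, query and update the Dwarf cube in parallel. Experiments show that the parallel Dwarf algorithm combines, on the one hand, the parallelism and high scalability of the MapReduce framework and, on the other hand, ...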
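A rough picture of splitting a cube into independent sub-cubes: if base tuples are keyed by their first dimension value, each reducer can build the sub-cube for one value without seeing any other partition, and cells with ALL in the first dimension can then be obtained by merging the sub-cubes (valid for a distributive measure such as SUM). This sketch simulates map, shuffle and reduce in-process with flat dictionaries instead of Dwarf DAGs; the partitioning key, the merge step and all function names are assumptions, not the paper's algorithm.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterator, List, Tuple

ALL = "*"  # hypothetical marker for an aggregated dimension
Row = Tuple[Tuple[str, ...], int]


def local_cube(rows: List[Row]) -> Dict[tuple, int]:
    """SUM for every cell over the given dimensions (each value or ALL)."""
    cube: Dict[tuple, int] = defaultdict(int)
    for dims, m in rows:
        for mask in product((False, True), repeat=len(dims)):
            cube[tuple(ALL if agg else v for v, agg in zip(dims, mask))] += m
    return dict(cube)


def map_phase(rows: List[Row]) -> Iterator[Tuple[str, Row]]:
    """Key each base tuple by its first-dimension value (the partitioning key)."""
    for dims, m in rows:
        yield dims[0], (dims[1:], m)


def reduce_phase(key: str, values: List[Row]) -> Dict[tuple, int]:
    """Each reducer builds an independent sub-cube for one first-dimension value."""
    return {(key,) + cell: m for cell, m in local_cube(values).items()}


def parallel_cube(rows: List[Row]) -> Dict[tuple, int]:
    groups: Dict[str, List[Row]] = defaultdict(list)  # stands in for the shuffle
    for key, value in map_phase(rows):
        groups[key].append(value)
    cube: Dict[tuple, int] = {}
    for key, values in groups.items():  # independent, so these could run in parallel
        cube.update(reduce_phase(key, values))
    # Cells with ALL in the first dimension: merge the sub-cubes (valid for SUM).
    for cell, m in list(cube.items()):
        top = (ALL,) + cell[1:]
        cube[top] = cube.get(top, 0) + m
    return cube
```

The result is identical to computing the whole cube sequentially with `local_cube(rows)`.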

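This paper uses the parallel computing framework MapReduce to explore data cube computation. Data cube computation has two key problems: computation time and cube size. As the number of dimensions grows, both the computation time and the size of the cube grow exponentially. Although MapReduce is an excellent parallel computing framework, its partitioning algorithm handles data skew poorly, so some tasks run too long and delay completion of the whole job. This paper optimizes data partitioning through data sampling, and experimental results show a clear improvement in data cube computation performance. To address the excessive size of the data cube, the final results of the Reduce phase are written to HBase, a NoSQL database, for storage; HBase scales horizontally and also makes later queries against the data cube convenient.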
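One common way to use sampling against skew is to estimate key frequencies from a sample and assign keys to reducers so that estimated loads are balanced, falling back to hashing for keys the sample missed. The abstract does not say how its sampling-based partitioner works, so this is an assumed technique: largest-first greedy assignment to the least loaded reducer. `plan_partitions` and `partition` are hypothetical names, and `zlib.crc32` is used only to get a deterministic fallback hash.

```python
import heapq
import random
import zlib
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def plan_partitions(keys: Sequence[str], num_reducers: int,
                    sample_rate: float, seed: int = 0) -> Dict[str, int]:
    """Estimate key weights from a random sample and balance them across reducers."""
    rng = random.Random(seed)
    sample = [k for k in keys if rng.random() < sample_rate]
    freq = Counter(sample)
    heap: List[Tuple[int, int]] = [(0, r) for r in range(num_reducers)]  # (load, reducer)
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    for key, count in freq.most_common():  # heaviest keys first
        load, r = heapq.heappop(heap)  # currently least loaded reducer
        assignment[key] = r
        heapq.heappush(heap, (load + count, r))
    return assignment


def partition(key: str, assignment: Dict[str, int], num_reducers: int) -> int:
    """Use the sampled plan when possible; hash keys the sample did not see."""
    if key in assignment:
        return assignment[key]
    return zlib.crc32(key.encode()) % num_reducers
```

With key weights 40, 30, 20 and 10 and two reducers, the greedy plan places 40 + 10 on one reducer and 30 + 20 on the other, giving equal loads of 50. A single key that alone exceeds a fair share still cannot be split this way.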

PMC: Select Materialized Cells in Data Cubes   Total citations: 1, self-citations: 0, citations by others: 1
QC-Tree is one of the most storage-efficient structures for data cubes in a MOLAP system. Although QC-Tree can achieve a high compression ratio, it is still a fully materialized data cube. In this paper, an improved structure, PMC, is presented that allows only some of the cells in a QC-Tree to be materialized, saving more storage space. Our partial materialization algorithm differs notably from traditional materialized view selection algorithms. In a traditional algorithm, when a view is selected, all the cells in that view are materialized; if a view is not selected, none of its cells are. This strategy results in unstable query performance. The presented algorithm, by contrast, selects and materializes data at the cell level; in addition to further reducing space and update cost, it ensures stable query performance. A series of experiments are conducted on both synthetic and real data sets. The results show that PMC further reduces the storage space occupied by the data cube and shortens the time needed to update the cube.

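Building on research into the Dwarf cube, a new semantics-preserving cube structure is proposed. It changes how the Dwarf cube stores aggregated data: while preserving the roll-up and drill-down semantics of the base cube, it removes as much prefix and suffix redundancy as possible, saving storage space and keeping the cube's structure clear. It achieves higher storage efficiency and faster query response than the Dwarf cube, returns results quickly for both point and range queries, and supports sparse cubes over large data volumes well.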
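Prefix and suffix redundancy removal can be sketched with a Dwarf-like cube DAG: each level holds one cell per dimension value plus an ALL cell, so common prefixes are shared by construction, and structurally identical sub-cubes are stored once by hash-consing (interning) each node by its contents. This is an illustration of the general prefix-sharing plus suffix-coalescing idea under a SUM measure, not the structure proposed in the paper; `DagCube` is a hypothetical name, and the build recomputes the ALL branches naively, so it is only suitable for small inputs.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[Tuple[str, ...], int]


class DagCube:
    """Dwarf-like cube DAG in which identical sub-cubes are stored only once."""

    def __init__(self, rows: Sequence[Row], num_dims: int) -> None:
        self.table: Dict[tuple, int] = {}  # node contents -> node id
        self.nodes: List[tuple] = []
        self.calls = 0  # nodes requested; exceeds len(nodes) when sharing happens
        self.num_dims = num_dims
        self.root = self._build(list(rows), 0)

    def _intern(self, key: tuple) -> int:
        self.calls += 1
        if key not in self.table:  # suffix coalescing: reuse an identical node
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def _build(self, rows: List[Row], d: int) -> int:
        if d == self.num_dims:
            return self._intern(("leaf", sum(m for _, m in rows)))
        groups: Dict[str, List[Row]] = {}
        for row in rows:
            groups.setdefault(row[0][d], []).append(row)
        cells = tuple((v, self._build(groups[v], d + 1)) for v in sorted(groups))
        all_child = self._build(rows, d + 1)  # the ALL cell aggregates dimension d
        return self._intern(("node", cells, all_child))

    def query(self, cell: Sequence[Optional[str]]) -> Optional[int]:
        """Point query; None in a position means ALL for that dimension."""
        node = self.nodes[self.root]
        for value in cell:
            _, cells, all_child = node
            nid = all_child if value is None else dict(cells).get(value)
            if nid is None:
                return None
            node = self.nodes[nid]
        return node[1]
```

For the three sample rows, the cell `(a1, ALL, c1)` is answered by following `a1`, the ALL cell and `c1`, and returns 8; many sub-cubes, such as single-tuple branches, collapse onto shared nodes.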

The design of an OLAP system that supports real-time queries is a major research issue. One approach is to use data cubes, which are materialized, precomputed multidimensional views of the data in a data warehouse. We can derive a set of data cubes to answer each frequently asked query directly. However, there are two practical problems: (1) the maintenance cost of the data cubes, and (2) the query cost of answering those queries. Maintaining a data cube requires disk storage and CPU computation, so the maintenance cost is related to both the total size and the total number of materialized data cubes. In most cases, materializing all data cubes is impractical. The maintenance cost may be reduced by merging some data cubes, but the resulting larger data cubes increase the cost of answering some queries. If the bounds on the maintenance cost and the query cost are too strict, we help the user decide which queries to sacrifice and exclude from consideration. We define an optimization problem in data cube system design: given a maintenance-cost bound, a query-cost bound and a set of frequently asked queries, determine a set of data cubes such that the system can answer the largest subset of the queries without violating either bound. This problem is NP-hard. We propose approximate greedy algorithms, GR, 2GM and 2GMM, which experiments on a census data set and a forest-cover-type data set show to be both effective and efficient.
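The shape of this optimization problem can be illustrated with a simple greedy heuristic: repeatedly pick the candidate cube that answers the most not-yet-answered queries per unit of maintenance cost, as long as it fits the remaining maintenance budget; queries left unanswered are the sacrificed ones. This is not GR, 2GM or 2GMM. It assumes, for illustration only, that a query over a set of dimensions can be answered by any cube containing those dimensions, that the query cost equals the size of that cube, and that maintenance cost is the sum of cube sizes. `greedy_select` is a hypothetical name.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def greedy_select(candidates: Dict[FrozenSet[str], int], queries: List[FrozenSet[str]],
                  maint_bound: int, query_bound: int) -> Tuple[List[FrozenSet[str]], Set[int]]:
    """Return (chosen cubes, indices of answerable queries) under both cost bounds."""
    chosen: List[FrozenSet[str]] = []
    answered: Set[int] = set()
    budget = maint_bound  # remaining maintenance budget (assumed: sum of cube sizes)

    def newly_answered(cube: FrozenSet[str]) -> Set[int]:
        # Queries this cube can answer within the query-cost bound, not yet answered.
        if candidates[cube] > query_bound:
            return set()
        return {i for i, q in enumerate(queries) if i not in answered and q <= cube}

    while True:
        best: Optional[FrozenSet[str]] = None
        best_gain: Set[int] = set()
        best_ratio = 0.0
        for cube, size in candidates.items():
            if cube in chosen or size > budget:
                continue
            gain = newly_answered(cube)
            if gain and len(gain) / size > best_ratio:
                best, best_gain, best_ratio = cube, gain, len(gain) / size
        if best is None:
            return chosen, answered  # queries not in `answered` are sacrificed
        chosen.append(best)
        answered |= best_gain
        budget -= candidates[best]
```

With a generous query-cost bound, one merged `{A, B}` cube answers all three queries `{A}`, `{B}` and `{A, B}`; if the query-cost bound is tightened below the size of that merged cube, the heuristic keeps the two small cubes instead and the `{A, B}` query is sacrificed.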

When the base table has a large number of attributes, the results of a data cube occupy a huge amount of disk space. New types of data cube, compact data cubes such as the condensed cube and the quotient cube, were proposed to solve this problem; they compress the data cube dramatically. However, their query cost is so high that they cannot be used in most applications. This paper introduces the semi-closed cube, which reduces the size of the data cube while achieving almost the same query response time as the full data cube. The semi-closed cube is a generalization of the condensed cube and the quotient cube and is constructed from a quotient cube. When the query cost of the quotient cube exceeds a given threshold, the semi-closed cube selects some views and picks a fellow for each of them. All tuples of those views are materialized except those closed by their fellows. To find a tuple of one of those views, users only need to scan the view and its fellow, which improves query performance. Experiments were conducted on a real-world data set, and the results show that the semi-closed cube is an effective approach for data cubes.

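GSFC: A Graph-Structure-Based Storage Method for the Free Cube   Total citations: 2, self-citations: 0, citations by others: 2
The free cube mines implication rules among the dimension values of the base relation to remove intrinsic redundancy from the data cube, effectively reducing its size. However, some problems remain. First, representing the free cube directly is still not compact enough and wastes storage space. Second, only the basic idea of querying was given, without a concrete query technique. To address these problems, GSFC, a graph-structure-based storage method, is proposed; it uses prefix compression to further reduce the size of the free cube. The method also combines the storage and index structures, effectively solving the query problem for the free cube. Finally, computation and query algorithms are given, and experiments demonstrate their effectiveness.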

We present a new full cube computation technique and cube storage representation, called the multidimensional cyclic graph (MCG) approach. The data cube relational operator has exponential complexity, so materializing it requires both a huge amount of memory and a substantial amount of time. Reducing the size of data cubes without loss of generality is therefore a fundamental problem. Previous approaches, such as Dwarf, Star and MDAG, have substantially reduced cube size using graph representations; in general, they eliminate prefix redundancy and some suffix redundancy from a data cube. MCG differs significantly from these approaches in that it completely eliminates prefix and suffix redundancies. A data cube can be viewed as a set of sub-graphs. Redundant sub-graphs are generally quite common in a data cube, but eliminating them is a hard problem: Dwarf, Star and MDAG only eliminate some specific common sub-graphs, whereas MCG efficiently eliminates all common sub-graphs from the entire cube, based on an exact sub-graph matching solution. We propose a matching function that guarantees a one-to-one mapping between sub-graphs. The function is computed incrementally in a top-down fashion, uses a minimal amount of information to generate unique results, and can be computed for any measure type: distributive, algebraic or holistic. Performance analysis shows that MCG is 20-40% faster than Dwarf, Star and MDAG when computing sparse data cubes. Dense data cubes have few aggregations, leaving little room to optimize runtime and memory consumption, so MCG is not useful for computing such cubes. For sparse data cubes, MCG's compact representation reduces memory consumption by 70-90% compared with the original Star approach proposed in [33]. In the same scenarios, the improved Star approach proposed in [34] reduces memory consumption by only 10-30%, Dwarf by 30-50% and MDAG by 40-60%, relative to the original Star approach. MCG is the first approach that uses an exact sub-graph matching function to reduce cube size, which avoids unnecessary aggregation and thus improves cube computation runtime.
