首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 171 毫秒
1.
李翠平  王珊 《计算机科学》2005,32(9):100-102
尽管利用预计算可以提高OLAP的查询效率,但是,由于存储空间的限制,预计算整个数据方体是不现实的.最近提出的综合数据方体通过将数据单元进行等价划分的方法解决了这个问题.然而,当数据源发生改变的时候,要对这样的数据方体进行维护是很困难的,即使只有一条元组发生了变化,所有的聚集值都必须重新计算,代价非常高.实际上,在有些应用环境中,人们更关注查询响应的速度,在查询结果的精度上可以放低一些要求.本文提出了如何对近似的综合数据方体进行增量维护的方法.实验证明,这些方法是非常有效的.  相似文献   

2.
随着信息技术的蓬勃发展,信息技术应用领域的数据量也越来越大,数据仓库的运用也越来越广泛和普遍,特别是在大数据时代,随着数据量的增加,数据仓库管理的数据也越来越多,数据方体的数据量也越来越大,因此也给数据方体的存储和查询带来了巨大的挑战,怎样能够支持对大型数据方体的快速查询,又能减少存储空间,在联机分析处理系统将是非常关键的一环,通过基于哈希算法的增强编码位图索引技术能够有效地减少存储空间并且提高查询效率。  相似文献   

3.
基于数据立方体的静态推理控制方法在联机分析处理(OLAP)系统中的访问有效性不高。为此,提出一种基于数据立方体的动态推理控制方法。该方法以提高OLAP系统访问有效性为目的,实时处理在线查询,分析查询方体的推理威胁,阻止其推理产生,并动态返回可访问方体集。实验结果表明,该方法可提高推理系统的有效性,且与静态推理控制方法有相同的安全性。  相似文献   

4.
针对Multi-Radio Multi-Channel传感器网络中链路服务质量和信道冲突等问题,提出并证明了基于缓存和信道切换的数据查询问题是一个NP完全问题.根据数据流守恒和链路-信道等约束条件,建立线性规划方程,得到该问题的最优解模型,并提出了一个多项式时间的近似算法——贪心新覆盖数据算法.该算法采用动态规划策略最小化缓存节点将单位数据包传输到查询节点所需要的路径时延,再贪心选择其具有最小路径时延的缓存节点,收集其新覆盖数据.理论分析和实验结果表明,提出的方案能有效地减少数据收集时延,提高数据查询效率.  相似文献   

5.
系统数据库数据量大时,常常会遇到系统响应时间过长、占用系统资源过多等一系列问题。因此对海量数据查询技术的研究就很必要。文中基于Oracle数据库,采用各种措施和方法来提高响应速度:合理设计数据库结构、使用索引、分区、SQL语句优化等手段优化数据库。另外,分页显示查询结果也能很大地提高查询速度。实验证明,优化后速度明显提高了很多。因此可见,文中采用的优化方法对海量数据的查询优化是正确有效的。  相似文献   

6.
数据集成中XML数据查询语义重写   总被引:10,自引:0,他引:10  
查询重写是数据库研究的一个基本问题,它和查询优化,数据仓库,数据集成,语义缓存等数据库问题密切相关,为提高集成系统的查询效率,系统选择提交频率较高的XML查询物化为中间层视图,用户提交查询后,系统尽可能利用中间视图层中视图,而不是访问数据源来回答查询,这个问题实际可以归结为半结构化查询重写问题,考虑到中间视图层空间的有限性,已有视图应当尽可能回答更多的查询,传统查询重写方法有考虑半结构化数据之间的约束,而根据约束可以等价变换查询,从而提高中间视图层中的表达能力,提出了一种新的半结构化查询重写的方法,该方法在保证算法正确性和完备性的基础上,利用上半结构化数据中的约束,尤其是XML文件中的路径依赖,来增强中间层物化视图的表达能力,理论分析和初步原型实验证明方法的有效性。  相似文献   

7.
在数据仓库中,如何选择实例化视图是一个重要的问题。针对一类特定的数据立方体,该文提出了一个基于代价策略的实例化视图选择算法。通过对一个实际数据集的分析,发现在数据立方体中有很多父子视图具有相同的体积,其原因是用于产生数据立方体的基本关系的属性之间存在着密切的联系。显然,对这类数据立方体不能像算法PBS那样按照体积的大小来选择要实例化的视图。为此,设计了算法PBC,不但可以快速地给出满足条件的实例化视图集,而且可以准确地找到具有最短平均响应时间的实例化视图集,避免了在用户给出过大的参数时,实例化一些无益于缩短查询响应时间的视图。实验结果表明,算法PBC是有效的。  相似文献   

8.
基于历史数据,what-if分析方法可对假设情景进行分析,为决策者提供预测信息,但在分析过程中存在数据方体的重新计算问题。针对该问题,提出一种使用Max函数的数据方体增量计算方法,减少增量计算过程中对事实表的访问时间。实验结果表明,该方法可将增量计算的性能提高约15%。  相似文献   

9.
周波 《计算机学报》1999,22(6):620-626
为实现MOLAP和ROLAP的有机融合,达到较好的存储效率和操作效率,提出了一种基于密集区域的新的数据方体组织结构,给出了确定数据方体中密集区域的明确定义,分析了现有的相关算法的可行性,在此基础上,提出了一种在数据方体中寻找密集区的算法ScanChunk同时分析了算法的计算精度和复杂度,并进行了详细的实验,结果表明,ScanChunk在方体维数不超过ScanChunk在方体维数不超过6的情况下是一  相似文献   

10.
针对MANET环境中带宽有限、能量有限、存储有限和链路频繁的断接性等特点,提出了基于缓存的移动数据查询问题,证明该问题是NP完全问题,并给出一个多项式时间的近似算法,即最大节点新覆盖数据算法MD.该算法采用贪心策略,查询新覆盖数据量最大的节点,减少了查询次数,并最大限度地减少了网络中的传输时延.然后在MD算法的基础上,同时考虑了节点新覆盖数据量和链路服务质量问题,提出了一种改进的高效的启发式算法,即基于最大节点DD值的算法MDD,有效地减少了能量消耗,最小化数据传输时延,提高了网络的吞吐量.理论分析及实验结果表明提出的数据查询算法能够充分利用缓存节点的数据信息,较好地完成数据查询工作,有效地减少数据收集时延,提高查询效率.  相似文献   

11.
The design of an OLAP system for supporting real-time queries is one of the major research issues. One approach is to use data cubes, which are materialized precomputed multidimensional views of data in a data warehouse. We can derive a set of data cubes to answer each frequently asked query directly. However, there are two practical problems: (1) the maintenance cost of the data cubes, and (2) the query cost to answer those queries. Maintaining a data cube requires disk storage and CPU computation, so the maintenance cost is related to the total size as well as the total number of data cubes materialized. In most cases, materializing all data cubes is impractical. The maintenance cost may be reduced by merging some data cubes. However, the resulting larger data cubes will increase the query cost of answering some queries. If the bounds on the maintenance cost and the query cost are too strict, we help the user decide which queries to be sacrificed and not taken into consideration. We have defined an optimization problem in data cube system design. Given a maintenance-cost bound, a query-cost bound and a set of frequently asked queries, it is necessary to determine a set of data cubes such that the system can answer a largest subset of the queries without violating the two bounds. This is an NP-hard problem. We propose approximate Greedy algorithms GR, 2GM and 2GMM, which are shown to be both effective and efficient by experiments done on a census data set and a forest-cover-type data set.  相似文献   

12.
Data cubes capture general trends aggregated from multidimensional data from a categorical relation. When provided with two relations, interesting knowledge can be exhibited by comparing the two underlying data cubes. Trend reversals or particular phenomena irrelevant in one data cube may indeed clearly appear in the other data cube. In order to capture such trend reversals, we have proposed the concept of Emerging Cube. In this article, we emphasize on two new approaches for computing Emerging Cubes. Both are devised to be integrated within standard Olap systems, since they do not require any additional nor complex data structures. Our first approach is based on Sql. We propose three queries with different aims. The most efficient query uses a particular data structure merging the two input relations to achieve a single data cube computation. This query works fine even when voluminous data are processed. Our second approach is algorithmic and aims to improve efficiency and scalability while preserving integration capability. The E-Idea algorithm works a´ laBuc and takes the specific features of Emerging Cubes into account. E-Idea is automaton-based and adapts its behavior to the current execution context. Our proposals are validated by various experiments where we measure query response time. Comparative experiments show that E-Idea’s response time is proportional to the size of the Emerging Cube. Experiments also demonstrate that extracting Emerging Cubes can be computed in practice, in a time compatible with user expectations.  相似文献   

13.
We present a new full cube computation technique and a cube storage representation approach, called the multidimensional cyclic graph (MCG) approach. The data cube relational operator has exponential complexity and therefore its materialization involves both a huge amount of memory and a substantial amount of time. Reducing the size of data cubes, without a loss of generality, thus becomes a fundamental problem. Previous approaches, such as Dwarf, Star and MDAG, have substantially reduced the cube size using graph representations. In general, they eliminate prefix redundancy and some suffix redundancy from a data cube. The MCG differs significantly from previous approaches as it completely eliminates prefix and suffix redundancies from a data cube. A data cube can be viewed as a set of sub-graphs. In general, redundant sub-graphs are quite common in a data cube, but eliminating them is a hard problem. Dwarf, Star and MDAG approaches only eliminate some specific common sub-graphs. The MCG approach efficiently eliminates all common sub-graphs from the entire cube, based on an exact sub-graph matching solution. We propose a matching function to guarantee one-to-one mapping between sub-graphs. The function is computed incrementally, in a top-down fashion, and its computation uses a minimal amount of information to generate unique results. In addition, it is computed for any measurement type: distributive, algebraic or holistic. MCG performance analysis demonstrates that MCG is 20-40% faster than Dwarf, Star and MDAG approaches when computing sparse data cubes. Dense data cubes have a small number of aggregations, so there is not enough room for runtime and memory consumption optimization, therefore the MCG approach is not useful in computing such dense cubes. The compact representation of sparse data cubes enables the MCG approach to reduce memory consumption by 70-90% when compared to the original Star approach, proposed in [33]. In the same scenarios, the improved Star approach, proposed in [34], reduces memory consumption by only 10-30%, Dwarf by 30-50% and MDAG by 40-60%, when compared to the original Star approach. The MCG is the first approach that uses an exact sub-graph matching function to reduce cube size, avoiding unnecessary aggregation, i.e. improving cube computation runtime.  相似文献   

14.
A Genetic Selection Algorithm for OLAP Data Cubes   总被引:1,自引:0,他引:1  
Multidimensional data analysis, as supported by OLAP (online analytical processing) systems, requires the computation of many aggregate functions over a large volume of historically collected data. To decrease the query time and to provide various viewpoints for the analysts, these data are usually organized as a multidimensional data model, called data cubes. Each cell in a data cube corresponds to a unique set of values for the different dimensions and contains the metric of interest. The data cube selection problem is, given the set of user queries and a storage space constraint, to select a set of materialized cubes from the data cubes to minimize the query cost and/or the maintenance cost. This problem is known to be an NP-hard problem. In this study, we examined the application of genetic algorithms to the cube selection problem. We proposed a greedy-repaired genetic algorithm, called the genetic greedy method. According to our experiments, the solution obtained by our genetic greedy method is superior to that found using the traditional greedy method. That is, within the same storage constraint, the solution can greatly reduce the amount of query cost as well as the cube maintenance cost.  相似文献   

15.
封闭数据立方是一种有效的无损压缩技术,它去掉了数据立方中的冗余信息,从而有效降低了数据立方的存储空间、加快了计算速度,而且几乎不影响查询性能.Hadoop的MapReduce并行计算模型为数据立方的计算提供了技术支持,Hadoop的分布式文件系统HDFS为数据立方的存储提供了保障.为了节省存储空间、加快查询速度,在传统数据立方的基础上提出封闭直方图立方,它在封闭数据立方的基础上通过编码技术进一步节省了存储空间,通过建立索引加快了查询速度.Hadoop并行计算平台不论从扩展性还是均衡性都为封闭直方图立方提供了保证.实验证明:封闭直方图立方对数据立方进行了有效压缩,具有较高的查询性能,根据Hadoop的特点通过增加节点个数明显加快了计算速度.  相似文献   

16.
周龙  郑诚 《微机发展》2006,16(6):101-103
通过对数据仓库和OLAP概念及体系结构的分析,描述了一种OLAP应用系统的设计方案,并介绍了它的具体实现方法。基于数据仓库的查询,一般都是及时特定查询,要在严格的响应时间内执行复杂的查询,遍历百万上亿的记录,同时进行可能很复杂的搜索、连接和汇总的操作。查询的数据吞吐量和响应时间是判断数据仓库性能的重点。CUBE的计算是OLAP及时查询的基础,提高查询的速度需要对OLAP进行预先的计算。文中系统比较了一些计算立方体的算法,并运用到具体的系统当中。  相似文献   

17.
李红松  黄厚宽 《软件学报》2006,17(4):806-813
以往在数据立方体上实现的联机聚集往往需要附加空间来存储联机聚集估算所需要的信息,极大地影响了数据立方体的存储和维护性能.提出了基于QC-Tree的用于范围查询处理的联机聚集PE(progressively estimate)算法以及它与简单聚集算法相结合的混合聚集算法HPE(hybrid progressively estimate);还提出了一种能够同时处理多个范围查询的联机聚集算法MPE(multiple progressively estimate).与以往联机聚集算法不同,这些算法不需要任何附加空间,而是利用QC-Tree自身保存的聚集数据和语义关系来估算聚集结果.由于QC-Tree是一种极为高效的数据立方体存储结构,因此能够以较理想的性能实现数据立方体上的联机聚集.对算法的分析和实验结果表明,所提出的算法具有较好的性能.  相似文献   

18.
In this paper we present the Brown Dwarf, a distributed data analytics system designed to efficiently store, query and update multidimensional data over commodity network nodes, without the use of any proprietary tool. Brown Dwarf distributes a centralized indexing structure among peers on-the-fly, reducing cube creation and querying times by enforcing parallelization. Analytical queries are naturally performed on-line through cooperating nodes that form an unstructured Peer-to-Peer overlay. Updates are also performed on-line, eliminating the usually costly over-night process. Moreover, the system employs an adaptive replication scheme that adjusts to the workload skew as well as the network churn by expanding or shrinking the units of the distributed data structure. Our system has been thoroughly evaluated on an actual testbed: it manages to accelerate cube creation up and querying up to several tens of times compared to the centralized solution by exploiting the capabilities of the available network nodes working in parallel. It also manages to quickly adapt even after sudden bursts in load and remains unaffected with a considerable fraction of frequent node failures. These advantages are even more apparent for dense and skewed data cubes and workloads.  相似文献   

19.
The data cube operator computes group-bys for all possible combinations of a set of dimension attributes. Since computing a data cube typically incurs a considerable cost, the data cube is often precomputed and stored as materialized views in data warehouses. A materialized data cube needs to be updated when the source relations are changed. The incremental maintenance of a data cube is to compute and propagate only its changes, rather than recompute the entire data cube from scratch. For n dimension attributes, the data cube consists of 2n group-bys, each of which is called a cuboid. To incrementally maintain a data cube with 2n cuboids, the conventional methods compute 2ndelta cuboids, each of which represents the change of a cuboid. In this paper, we propose an efficient incremental maintenance method that can maintain a data cube using only a subset of 2n delta cuboids. We formulate an optimization problem to find the optimal subset of 2n delta cuboids that minimizes the total maintenance cost, and propose a heuristic solution that allows us to maintain a data cube using only delta cuboids. As a result, the cost of maintaining a data cube is substantially reduced. Through various experiments, we show the performance advantages of the proposed method over the conventional methods. We also extend the proposed method to handle partially materialized cubes and dimension hierarchies.  相似文献   

20.
在侏儒立方体研究的基础上,提出了一种新的能够保持语义的立方体结构。这种结构改变了侏儒立方体对聚集数据的存储方式,在保持基本立方体上卷、下钻语义的前提下,尽量地去除前缀冗余、后缀冗余,节约存储空间,保证立方体清晰的结构,并且拥有比侏儒立方体更高的存储效率和查询响应速度,对点查询和范围查询能够快速地返回结果,对大数据量情况下的稀疏立方体具有良好的支持。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号