Similar literature
20 similar documents found
1.
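Data warehouses typically compute over large volumes of data and answer user queries with concise results, which makes materialized view technology especially important in data warehousing. However, existing methods that support automatic materialized view selection are static, which conflicts with the dynamic nature of online analytical processing (OLAP) and decision support systems (DSS). This paper proposes a scalable dynamic materialized view method that decomposes the overall materialized view selection (MVS) problem into three phases, reducing the complexity of the problem and improving the effectiveness of the materialized views. Through dynamic adjustment, the materialized views adapt promptly to query demand. Complexity analysis of the algorithm demonstrates the scalability of the scheme, and simulation experiments on the dynamic adjustment algorithm confirm that the scheme adapts well.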

2.
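Materialized view selection strategy in data warehouses   Total citations: 2, self-citations: 0, citations by others: 2
To improve the response time of decision-support and OLAP queries, data warehouses commonly adopt materialized views, so the strategy for selecting materialized views is one of the important problems in data warehouse research. The goal is to select a set of materialized views that minimizes the sum of storage, maintenance, and query costs. A new materialized view selection algorithm, VSMF (views selection base on multi-factor), is proposed, which uses the MVPP (multi-view processing plan) as the search space for view selection. Under a storage space constraint, the algorithm achieves multi-query optimization and view maintenance optimization at the same time.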

3.
李明  刘青宝  陆昌辉 《计算机应用》2009,29(6):1605-1611
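To address the inability of existing materialized view selection algorithms to handle random OLAP queries well, a new two-phase materialized view selection algorithm (2-PMVS) is proposed. It combines traditional static selection with dynamic selection so that it can dynamically correct the deviation between users' random queries and the predicted queries. Experiments show that the algorithm is effective.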

4.
Selection of views to materialize in a data warehouse   Total citations: 4, self-citations: 0, citations by others: 4
A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resource, e.g., materialization time, storage space, etc. In this work, we have developed a theoretical framework for the general problem of selection of views in a data warehouse. We present polynomial-time heuristics for a selection of views to optimize total query response time under a disk-space constraint, for some important special cases of the general data warehouse scenario, viz.: 1) an AND view graph, where each query/view has a unique evaluation, e.g., when a multiple-query optimizer can be used to generate a global evaluation plan for the queries, and 2) an OR view graph, in which any view can be computed from any one of its related views, e.g., data cubes. We present proofs showing that the algorithms are guaranteed to provide a solution that is fairly close to (within a constant factor ratio of) the optimal solution. We extend our heuristic to the general AND-OR view graphs. Finally, we address in detail the view-selection problem under the maintenance cost constraint and present provably competitive heuristics.
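As a rough illustration of view selection under a disk-space constraint on an OR view graph, the sketch below greedily picks the view with the largest cost reduction per unit of space. It is a simplified sketch and not the heuristic from the abstract above: the names `greedy_select`, `size`, and `answers`, the linear cost model (a query costs the size of the smallest materialized view that can answer it), and the toy lattice are all assumptions made for illustration.

```python
def query_cost(query, materialized, size, answers):
    # Assumed cost model: a query costs the size of the smallest
    # materialized view that is able to answer it.
    return min(size[v] for v in materialized if query in answers[v])


def total_cost(queries, materialized, size, answers):
    return sum(query_cost(q, materialized, size, answers) for q in queries)


def greedy_select(size, answers, queries, base, budget):
    """Pick views by benefit per unit space until nothing else fits or helps.

    The base view is assumed to be always available and is not charged
    against the space budget.
    """
    materialized = {base}
    remaining = budget
    while True:
        current = total_cost(queries, materialized, size, answers)
        best, best_ratio = None, 0.0
        for v in sorted(size):  # sorted for a deterministic tie-break
            if v in materialized or size[v] > remaining:
                continue
            new_cost = total_cost(queries, materialized | {v}, size, answers)
            ratio = (current - new_cost) / size[v]
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            return materialized
        materialized.add(best)
        remaining -= size[best]


# Hypothetical two-dimensional cube: "ps" is the base (part, supplier) view.
size = {"ps": 100, "p": 20, "s": 30, "none": 1}
answers = {
    "ps": {"ps", "p", "s", "none"},
    "p": {"p", "none"},
    "s": {"s", "none"},
    "none": {"none"},
}
queries = ["ps", "p", "s", "none"]
selected = greedy_select(size, answers, queries, base="ps", budget=50)
```

With a budget of 50, the view `s` no longer fits once `none` and `p` have been chosen, which shows how the space constraint can cut the greedy choice short.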

5.
Physical data layout is a crucial factor in the performance of queries and updates in large data warehouses. Data layout enhances and complements other performance features such as materialized views and dynamic caching of aggregated results. Prior work has identified that the multidimensional nature of large data warehouses imposes natural restrictions on the query workload. A method based on a “uniform” query class approach has been proposed for data clustering and shown to be optimal. However, we believe that realistic query workloads will exhibit data access skew. For instance, if time is a dimension in the data model, then more queries are likely to focus on the most recent time interval. The query class approach does not adequately model the possibility of multidimensional data access skew. We propose the affinity graph model for capturing workload characteristics in the presence of access skew and describe an efficient algorithm for physical data layout. Our proposed algorithm considers declustering and load balancing issues which are inherent to the multidisk data layout problem. We demonstrate the validity of this approach experimentally.

6.
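In recent years data warehousing has become one of the most active branches of database research, and a core problem in the field is OLAP query optimization. Multidimensional Expressions (MDX) provide an interface for issuing multiple related OLAP queries together. Researchers abroad have analyzed extensively how to use the large number of redundant materialized views in a data warehouse to speed up OLAP queries, and have proposed several optimization algorithms. This paper studies those algorithms, finds that they do not make full use of the materialized views, and therefore proposes and validates an improved algorithm. Experiments show that the improved algorithm noticeably increases query performance.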

7.
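Data warehouses store large numbers of materialized views to speed up OLAP query response, and the selection of materialized views is an important problem in data warehouse design. This paper proposes an effective materialized view selection algorithm that selects views by searching the levels of the data cube. Analysis and testing show that the algorithm achieves good results and efficiency.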

8.
View materialization is an effective method to increase query efficiency in a data warehouse and improve OLAP query performance. However, one encounters the problem of space insufficiency if all possible views are materialized in advance. Reducing query time by means of selecting a proper set of materialized views with a lower cost is crucial for efficient data warehousing. In addition, the costs of data warehouse creation, query, and maintenance have to be taken into account while views are materialized. In this paper, we propose efficient algorithms to select a proper set of materialized views, constrained by storage and cost considerations, to help speed up the entire data warehousing process. We derive a cost model for data warehouse query and maintenance as well as efficient view selection algorithms that effectively exploit the gain and loss metrics. The main contribution of our paper is to speed up the selection process of materialized views. Concurrently, this will greatly reduce the overall cost of data warehouse query and maintenance.

9.
Data warehouses are built to answer queries efficiently from data integrated from various systems. To improve the performance of the system, the issue of materializing views within data warehouses must be explored. This involves pre-computing a set of selected views, which are fact and dimension tables, under given resource and quality constraints. The quality constraints include query processing time, data maintenance time and the freshness of data when queries are placed. Then there is the update policy, which governs the timing of data reloading in data warehouses. A model is proposed to determine the view selection and update policy when the arrival of queries follows Poisson processes with the constraints of system response time, storage space and query-dependent currency of data (on systems capable of periodic and query-triggered updates). To the best of the researchers’ knowledge, no other research has considered all these factors in their models. A two-phase greedy algorithm was developed to determine the optimal update policy for the view selection problem. Numerous experiments were performed to explore the sensitivity of the proposed model under various constraints and system parameter settings. The results show that the model responds reasonably to the tuning and that the proposed algorithm can rapidly find acceptable solutions.

10.
雷旭  袁捷 《计算机工程》2006,32(6):79-81
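When materialized views are used to improve the efficiency of an OLAP system, a materialized view often does not coincide with a complete lattice node; that is, the materialized views are multidimensional data slices (MRFs). As a result, the system accumulates many materialized views with overlapping data, which not only take up excessive storage space but also complicate answering users' multidimensional queries from the existing materialized views. Earlier dynamic materialized view selection algorithms did not handle this situation. Drawing on the lattice model, this paper proposes algorithms for merging materialized views with overlapping data, including how to determine whether materialized views overlap and how to merge those that do.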

11.
To speed up query processing on Big Data, frequent sub-queries or views may be materialized such that the query processing cost is minimized with optimum cost of maintaining the materialized views and/or queries. Materializing frequent sub-queries and views means that the resultant data sets of the views reside in the memory of one or more nodes in the cluster, which reduces the MapReduce cost and the submission and scheduling cost of Distributed File System jobs for query processing. We have defined materialized views as resultant data of frequent sub-queries and aggregation functions of a set of Big Data warehousing queries that are saved for enhancing query performance. The problem is defined as a multi-objective optimization problem for minimizing the total query processing MapReduce cost, the MapReduce cost for maintaining the materialized views and the number of views selected for materializing, with maximized total size of the views selected. We applied the Differential Evolution algorithm and NSGA-II to study their performance for developing a recommendation system for selecting views for materializing in Big Data warehousing.

12.
OLAP is a category of database technology that allows analysts to gain insight into the aggregation of data by enabling them to gain access to a variety of different views of the information contained in a database. It is very important to provide analysts with guaranteed error bounds for approximate results to aggregation queries in enterprise applications such as decision support systems. We propose a general method of providing tight error bounds for approximate results to OLAP range-sum queries. We perform an extensive experiment on diverse data sets and examine the effectiveness of the proposed method for various data cube dimensions and query sizes.

13.
《Information Systems》2001,26(5):363-381
A data warehouse (DW) can be abstractly seen as a set of materialized views defined over a set of remote data sources. A DW is intended to satisfy a set of queries. The views materialized in a DW relate to each other in a complex manner, through common subexpressions, in order to guarantee high query performance and low view maintenance cost. DWs are time varying. As time passes, new materialized views are added in order to satisfy new queries, or for performance reasons, while old queries are dropped. The evolution of a DW can result in a redundant set of materialized views. In this paper, we address the problem of detecting redundant materialized views in a given DW view selection, that is, materialized views that can be removed from the DW without negatively affecting the query evaluation or the view maintenance process. Using an AND/OR dag representation for multiple queries and views, we first formalize the process of propagating source relation changes to the materialized views by exploiting common subexpressions between views and by using other materialized views that are not affected by these changes. Then, we provide an algorithm for detecting materialized views that are not needed in the process of propagating source relation changes to the DW. We also show how trivially redundant views can be identified in this process. Finally, we use these results to provide a procedure for detecting materialized views that are redundant in a DW. Our approach considers a broad class of views that includes grouping/aggregation views and is not dependent on a specific cost model.

14.
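Materialized view selection based on multiple maintenance strategies   Total citations: 1, self-citations: 0, citations by others: 1
Materialized views are an important means of improving OLAP query efficiency in a data warehouse environment, so selecting them is one of the important decisions in data warehouse design. The selection method proposed in this paper aims to choose suitable views to materialize so that the total query processing cost and the materialized view maintenance cost are minimized. A benefit model for materialized views is proposed and, on that basis and using multiple view maintenance strategies, four selection algorithms are given: IRMVS, based on both incremental maintenance and recomputation; IMVS, based on the incremental strategy; RMVs, based on the recomputation strategy; and IMDVS, which selects materialized descendant views under the incremental strategy. Theoretical analysis and experiments show that these algorithms are effective and feasible.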
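The two maintenance strategies named above, incremental maintenance and recomputation, can be contrasted with a minimal sketch. It is not taken from the paper: the function names `recompute` and `apply_delta`, and the choice of a SUM view that also keeps a row count per group, are assumptions made for illustration.

```python
def recompute(rows):
    """Recomputation strategy: rebuild the aggregate view from all base rows.

    Each group maps to (sum, row_count); the count lets deletions remove
    a group once its last row is gone.
    """
    view = {}
    for key, amount in rows:
        total, count = view.get(key, (0, 0))
        view[key] = (total + amount, count + 1)
    return view


def apply_delta(view, inserted=(), deleted=()):
    """Incremental strategy: touch only the groups affected by the changes."""
    updated = dict(view)
    for key, amount in inserted:
        total, count = updated.get(key, (0, 0))
        updated[key] = (total + amount, count + 1)
    for key, amount in deleted:
        total, count = updated[key]
        if count == 1:
            del updated[key]  # last row of the group was deleted
        else:
            updated[key] = (total - amount, count - 1)
    return updated


# Hypothetical sales rows: (region, amount).
base = [("east", 10), ("west", 5), ("east", 7)]
view = recompute(base)
refreshed = apply_delta(
    view,
    inserted=[("west", 3), ("north", 4)],
    deleted=[("east", 10)],
)
```

Recomputation does work proportional to the whole base table, while the incremental path does work proportional to the size of the change. That difference is why a selection method may weigh the two strategies separately for each view.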

15.
Performance analysis of common OLAP query optimization methods   Total citations: 1, self-citations: 0, citations by others: 1
张银玲  武彤 《微机发展》2014,(1):39-42,46
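OLAP (Online Analytical Processing) queries often involve several dimension tables and fact tables, and producing a result usually requires joining multiple tables. Joins are very time-consuming, so improving OLAP query efficiency has become a key problem in data warehouse applications. This paper analyzes the performance of several common OLAP query optimization methods, including stored procedures, indexing, and materialized views, and through repeated experiments on a specific application concludes that materialized views perform best. Materialized views nevertheless have drawbacks alongside their advantages, so several update schemes are proposed for the materialized view update problem.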

16.
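An aggregation algorithm for online analytical query processing   Total citations: 10, self-citations: 2, citations by others: 10
Online analytical processing (OLAP) queries are complex ad hoc queries over large amounts of data. From the SQL (structured query language) point of view, these queries usually contain multi-table joins and group-by aggregation. From the perspective of OLAP query processing, a new sort-based aggregation algorithm, MuSA (sort-based aggregation with multi-table join), is proposed. The method takes full account of the star schema of data warehouses and combines aggregation with a new multi-table join algorithm, MJoin. When sorting, it uses ...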

17.
The design of an OLAP system for supporting real-time queries is one of the major research issues. One approach is to use data cubes, which are materialized precomputed multidimensional views of data in a data warehouse. We can derive a set of data cubes to answer each frequently asked query directly. However, there are two practical problems: (1) the maintenance cost of the data cubes, and (2) the query cost to answer those queries. Maintaining a data cube requires disk storage and CPU computation, so the maintenance cost is related to the total size as well as the total number of data cubes materialized. In most cases, materializing all data cubes is impractical. The maintenance cost may be reduced by merging some data cubes. However, the resulting larger data cubes will increase the query cost of answering some queries. If the bounds on the maintenance cost and the query cost are too strict, we help the user decide which queries to sacrifice and leave out of consideration. We have defined an optimization problem in data cube system design. Given a maintenance-cost bound, a query-cost bound and a set of frequently asked queries, it is necessary to determine a set of data cubes such that the system can answer the largest possible subset of the queries without violating the two bounds. This is an NP-hard problem. We propose approximate Greedy algorithms GR, 2GM and 2GMM, which are shown to be both effective and efficient by experiments done on a census data set and a forest-cover-type data set.

18.
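QC-model-based materialized view selection in Web data integration systems   Total citations: 2, self-citations: 0, citations by others: 2
In a Web data integration system, materialized views can effectively reduce network transfer cost and improve query efficiency. How to choose queries to materialize so that the chosen queries fit within the space limit of the integration layer while yielding the maximum materialization benefit is a pressing problem for integration systems. Traditional methods do not consider the containment relationships among massive numbers of XML queries, so the materialized views they select may contain redundant information. To address this, the paper proposes (1) the QC (query containment) model for massive query sets in Web data integration systems, which captures the most common containment relationships between queries; and (2) a QC-model-based materialized view selection algorithm that considers the main factors relevant to materialized view selection, including query submission frequency, space cost, query rewriting capability, and completeness of query results, and that organizes materialized views with query bitmaps, yielding a more reasonable selection. Experimental results demonstrate the effectiveness of the method.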

19.
This paper proposes a computation method for holistic multi-feature cube (MF-Cube) queries based on the characteristics of MF-Cubes. Three simple yet efficient strategies are designed to optimize the dependent complex aggregate at multiple granularities for a complex data-mining query within data cubes. One strategy is the computation of Holistic MF-Cube queries using the PDAP (Part Distributive Aggregate Property). More efficiency is gained by another strategy, that of dynamic subset data selection (the iceberg query technique), which reduces the size of the materialized data cubes. To extend this efficiency further, the second approach may adopt the chunk-based caching technique that reuses the output of previous queries. By combining these three strategies, we design an algorithm called the PDIC (Part Distributive Iceberg Chunk). We experimentally evaluate this algorithm using synthetic and real-world datasets and demonstrate that our approach delivers up to approximately twice the performance efficiency of traditional computation methods.
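The iceberg query technique mentioned above keeps only the aggregate cells whose value reaches a threshold, which is what shrinks the materialized cube. The sketch below shows the general idea for a COUNT aggregate. It is not the PDIC algorithm, and the function name `iceberg_cells`, the `min_support` parameter, and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def iceberg_cells(rows, dims, min_support):
    """Group rows by the given dimensions and keep only cells whose
    row count is at least min_support."""
    counts = Counter(tuple(row[d] for d in dims) for row in rows)
    return {cell: n for cell, n in counts.items() if n >= min_support}


# Hypothetical fact rows with two dimensions.
rows = [
    {"city": "A", "item": "x"},
    {"city": "A", "item": "x"},
    {"city": "A", "item": "y"},
    {"city": "B", "item": "x"},
]
by_city = iceberg_cells(rows, ("city",), min_support=2)
by_city_item = iceberg_cells(rows, ("city", "item"), min_support=2)
```

Cells below the threshold are simply not stored, so a query that needs them has to fall back to the base data or to a finer-grained view.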

20.
We consider a workload of aggregate queries and investigate the problem of selecting materialized views that (1) provide equivalent rewritings for all the queries, and (2) are optimal, in that the cost of evaluating the query workload is minimized. We consider conjunctive views and rewritings, with or without aggregation; in each rewriting, only one view contributes to computing the aggregated query output. We look at query rewriting using existing views and at view selection. In the query-rewriting problem, we give sufficient and necessary conditions for a rewriting to exist. For view selection, we prove complexity results. Finally, we give algorithms for obtaining rewritings and selecting views.
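A small example may help show what an equivalent rewriting of an aggregate query over a single view looks like. The sketch below answers a coarser SUM query by rolling up a finer-grained materialized SUM view. It is only an illustration of the idea, not the paper's algorithm, and the names `materialize`, `roll_up`, and the `amount` column are assumptions.

```python
def materialize(rows, group_by):
    """Compute SUM(amount) grouped by the given columns."""
    view = {}
    for row in rows:
        key = tuple(row[c] for c in group_by)
        view[key] = view.get(key, 0) + row["amount"]
    return view


def roll_up(view, view_cols, query_cols):
    """Answer a SUM query grouped by query_cols from a view grouped by
    view_cols, where query_cols is a subset of view_cols."""
    positions = [view_cols.index(c) for c in query_cols]
    result = {}
    for key, total in view.items():
        coarse = tuple(key[p] for p in positions)
        result[coarse] = result.get(coarse, 0) + total
    return result


# Hypothetical fact rows.
rows = [
    {"city": "A", "month": 1, "amount": 10},
    {"city": "A", "month": 2, "amount": 5},
    {"city": "B", "month": 1, "amount": 7},
    {"city": "A", "month": 1, "amount": 3},
]
view = materialize(rows, ("city", "month"))
rewritten = roll_up(view, ("city", "month"), ("city",))
direct = materialize(rows, ("city",))
```

The rewriting is equivalent here because SUM is distributive: summing the per-(city, month) totals gives the same per-city totals as summing the base rows directly.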
