Similar literature
1.
View materialization is an important way of improving the performance of query processing. When an update occurs to the source data from which a materialized view is derived, the materialized view has to be updated so that it is consistent with the source data. This update process is called view maintenance. The incremental method of view maintenance, which computes the new view using the old view and the update to the source data, is widely preferred to full view recomputation when the update is small in size. In this paper we investigate how to incrementally maintain views in object-relational (OR) databases. The investigation focuses on maintaining views defined in OR-SQL, a language containing the features of object referencing, inheritance, collection, and aggregate functions including user-defined set aggregate functions. We propose an architecture and algorithms for incremental OR view maintenance. We implement all the algorithms and analyze their performance in comparison with full view recomputation. The analysis shows that the algorithms significantly reduce the cost of updating a view when the size of an update to the source data is relatively small. Received 23 May 2000 / Revised 27 March 2001 / Accepted in revised form 30 April 2001. Correspondence and offprint requests to: Jixue Liu, School of Computer and Information Science, University of South Australia, Mawson Lakes, Adelaide SA5084, Australia. Email: jixue.liu@unisa.edu.au
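The contrast between incremental maintenance and full recomputation described in this abstract can be illustrated with a minimal sketch. It assumes a plain relational group-by view with SUM and COUNT, not the OR-SQL views the paper targets; the function names `recompute` and `apply_delta` are hypothetical and are not taken from the paper.

```python
def recompute(rows):
    """Full recomputation: scan every source row and rebuild SUM and COUNT per key."""
    view = {}
    for key, value in rows:
        total, count = view.get(key, (0, 0))
        view[key] = (total + value, count + 1)
    return view


def apply_delta(view, inserted=(), deleted=()):
    """Incremental maintenance: fold only the changed rows into the old view."""
    new_view = dict(view)
    for sign, rows in ((1, inserted), (-1, deleted)):
        for key, value in rows:
            total, count = new_view.get(key, (0, 0))
            total, count = total + sign * value, count + sign
            if count == 0:
                new_view.pop(key, None)  # the group has no rows left
            else:
                new_view[key] = (total, count)
    return new_view


base = [("a", 10), ("b", 5), ("a", 7)]
view = recompute(base)

inserted = [("b", 3), ("c", 4)]
deleted = [("a", 10)]
incremental = apply_delta(view, inserted, deleted)

# The incrementally maintained view must equal a view rebuilt from scratch.
new_base = [row for row in base if row not in deleted] + inserted
assert incremental == recompute(new_base)
```

The incremental path touches three changed rows, while recomputation scans the whole source, which is why the incremental method pays off when the update is small relative to the source data.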

2.
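Maintenance of temporal views in data warehouses   Total citations: 5, self-citations: 0, citations by others: 5  
李琪, 白英彩. Journal of Software (《软件学报》), 2002, 13(7): 1324-1330
An important use of a data warehouse is to provide users with historical information through temporal views. Because temporal support is added to the traditional relational data model, and because updates to a temporal view arise not only from base-table updates but also from the advance of time, existing results on maintaining non-temporal views do not apply to temporal views, and some existing temporal view maintenance algorithms are not suitable for data warehouses either. Taking historical relation schemas as the object of study and following the principle of incremental maintenance, the paper adopts a pure-deletion, pure-insertion computation method and gives, in an algebraic language, update propagation algorithms for five basic historical relational algebra operations. For any temporal view defined by a combination of these five operations, the incremental maintenance expression can be obtained iteratively. The pure-deletion, pure-insertion idea can also be carried over to other historical ...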

3.
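For a materialized view defined over several base tables, the view must be updated accordingly when the base tables change, so maintaining materialized views incrementally and efficiently is a very important problem. The paper proposes a bisection greedy algorithm that constructs an optimal delta propagation tree in O(n log n) time, and gives a proof of the algorithm's correctness.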

4.
Selection of views to materialize in a data warehouse   Total citations: 4, self-citations: 0, citations by others: 4  
A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of a resource, e.g., materialization time, storage space, etc. In this work, we have developed a theoretical framework for the general problem of selection of views in a data warehouse. We present polynomial-time heuristics for a selection of views to optimize total query response time under a disk-space constraint, for some important special cases of the general data warehouse scenario, viz.: 1) an AND view graph, where each query/view has a unique evaluation, e.g., when a multiple-query optimizer can be used to generate a global evaluation plan for the queries, and 2) an OR view graph, in which any view can be computed from any one of its related views, e.g., data cubes. We present proofs showing that the algorithms are guaranteed to provide a solution that is fairly close to (within a constant factor ratio of) the optimal solution. We extend our heuristic to the general AND-OR view graphs. Finally, we address in detail the view-selection problem under the maintenance cost constraint and present provably competitive heuristics.
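The space-constrained selection problem in this abstract can be sketched as a greedy loop that repeatedly picks the view with the largest benefit per unit of space. This is a simplified illustration under assumptions, not the paper's algorithm: every query is assumed to be answerable from any one of its listed views (an OR-style setting), and the view names, sizes, and costs are hypothetical.

```python
def total_cost(queries, chosen, base_cost):
    """Each query is answered from its cheapest chosen view, else from the base data."""
    return sum(
        min([base_cost] + [cost for view, cost in options.items() if view in chosen])
        for options in queries.values()
    )


def greedy_select(views, queries, base_cost, space_budget):
    """Repeatedly add the view with the best benefit per unit of space that still fits."""
    chosen, used = set(), 0
    while True:
        current = total_cost(queries, chosen, base_cost)
        best, best_ratio = None, 0.0
        for view, size in views.items():
            if view in chosen or used + size > space_budget:
                continue
            benefit = current - total_cost(queries, chosen | {view}, base_cost)
            if benefit / size > best_ratio:
                best, best_ratio = view, benefit / size
        if best is None:  # nothing fits, or nothing reduces the cost any further
            return chosen
        chosen.add(best)
        used += views[best]


# Hypothetical view sizes and per-query evaluation costs.
views = {"v_day": 50, "v_month": 10, "v_year": 2}
queries = {
    "q1": {"v_day": 50, "v_month": 10},
    "q2": {"v_day": 50, "v_month": 10, "v_year": 2},
    "q3": {"v_day": 50},
}
selected = greedy_select(views, queries, base_cost=100, space_budget=15)
```

With a budget of 15 units, `v_day` never fits, so the loop first takes `v_year` (highest benefit per unit of space) and then `v_month`, and `q3` keeps falling back to the base data.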

5.
In the dynamic dictionary matching problem, a dictionary D contains a set of patterns that can change over time under insertion and deletion of individual patterns. Given an arbitrary text T, we must efficiently list all the dictionary patterns that occur at each text position. We investigate the I/O complexity of this problem for a large dictionary that must be stored in external storage devices. By following a completely new approach, we devise an efficient solution which is based upon the SB-tree data structure (P. Ferragina and R. Grossi, 1995, in "Proc. ACM Symposium on Theory of Computing," pp. 693–702), and a novel notion of certificate for the dictionary matching problem. Our data structure can be adapted to efficiently work in main memory and to solve other problems, thus providing a new insight into the nature of the dictionary matching problem.

6.
In a distributed environment, materialized views are used to integrate data from different information sources and then store them in some centralized location. In order to maintain such materialized views, maintenance queries need to be sent to information sources by the data warehouse management system. Due to the independence of the information sources and the data warehouse, concurrency issues are raised between the maintenance queries and the local update transactions at each information source. Recent solutions such as ECA and Strobe tackle such concurrent maintenance, however with the requirement of quiescence of the information sources. SWEEP and POSSE overcome this limitation by decomposing the global maintenance query into smaller subqueries to be sent to every information source and then performing conflict correction locally at the data warehouse. Note that all these previous approaches handle the data updates one at a time. Hence either some of the information sources or the data warehouse is likely to be idle during most of the maintenance process. In this paper, we propose that a set of updates should be maintained in parallel by several concurrent maintenance processes so that both the information sources as well as the warehouse would be utilized more fully throughout the maintenance process. This parallelism should then improve the overall maintenance performance. For this we have developed a parallel view maintenance algorithm, called PVM, that substantially improves upon the performance of previous maintenance approaches by handling a set of data updates at the same time. The parallel handling of a set of updates is orthogonal to the particular maintenance algorithm applied to the handling of each individual update. 
In order to perform parallel view maintenance, we have identified two critical issues that must be overcome: (1) detecting maintenance-concurrent data updates in a parallel mode and (2) correcting the problem that the data warehouse commit order may not correspond to the data warehouse update processing order due to parallel maintenance handling. In this work, we provide solutions to both issues. For the former, we insert a middle-layer timestamp assignment module for detecting maintenance-concurrent data updates without requiring any global clock synchronization. For the latter, we introduce the negative counter concept to solve the problem of variant orders of committing effects of data updates to the data warehouse. We provide a proof of the correctness of PVM that guarantees that our strategy indeed generates the correct final data warehouse state. We have implemented both SWEEP and PVM in our EVE data warehousing system. Our performance study demonstrates that a manyfold performance improvement is achieved by PVM over SWEEP. Received: 12 November 2001, Accepted: 18 December 2002, Published online: 31 July 2003. This work was supported in part by the NSF NYI grant IIS-979624 and NSF CISE Instrumentation grant IRIS 97-29878 and NSF grant IIS-9988776.

7.
Answering queries using views: A survey   Total citations: 25, self-citations: 0, citations by others: 25  
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results. Received: 1 August 1999 / Accepted: 23 March 2001 Published online: 6 September 2001

8.
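An efficient update algorithm for multi-source materialized views in data warehouses   Total citations: 4, self-citations: 0, citations by others: 4  
A data warehouse stores large amounts of summary data to support queries and related decision making, and this summary data often takes the form of materialized views defined over several data sources. When the data sources change, the materialized views must be updated accordingly, which inevitably imposes a heavy overhead on the data warehouse, so updating materialized views efficiently becomes a very important problem. The BinPartition algorithm can reduce the computation cost to a minimum; the paper then proves the correctness of the algorithm and analyzes its time complexity.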

9.
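A materialized view selection method based on multiple maintenance strategies   Total citations: 1, self-citations: 0, citations by others: 1  
Materialized views are an important means of improving OLAP query efficiency in a data warehouse environment, so the selection of materialized views is one of the key decisions in data warehouse design. The goal of the selection method proposed in this paper is to choose suitable views to materialize so that the total cost of query processing and the maintenance cost of the materialized views are minimized. The paper proposes a benefit model for materialized views and, on that basis and using multiple view maintenance strategies, proposes the following selection methods: IRMVS, a selection algorithm based on both incremental maintenance and recomputation; IMVS, a selection algorithm based on the incremental strategy; RMVS, a selection algorithm based on the recomputation strategy; and IMDVS, an algorithm based on the incremental strategy that selects materialized descendant views. Theoretical analysis and experiments show that these algorithms are effective and feasible.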

10.
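Data warehouse maintenance is a very important problem in data warehouse applications, and many maintenance algorithms have appeared in recent years. Most existing algorithms maintain a single materialized view, or handle only simple SPJ views, or handle only aggregate functions, whereas real data warehouses mostly consist of multiple materialized views that contain aggregate functions. It is therefore necessary to consider the maintenance of multiple materialized views containing aggregate functions as a whole. Against this background, the paper proposes a base repository generation algorithm for the incremental maintenance of multiple materialized views; a maintenance algorithm for multiple materialized views containing aggregate functions is proposed in "An incremental maintenance algorithm for multiple materialized views based on a base repository" (《基于基库的多实化视图增量维护算法》).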

11.
Emerging applications such as personalized portals, enterprise search, and web integration systems often require keyword search over semi-structured views. However, traditional information retrieval techniques are likely to be expensive in this context because they rely on the assumption that the set of documents being searched is materialized. In this paper, we present a system architecture and algorithm that can efficiently evaluate keyword search queries over virtual (unmaterialized) XML views. An interesting aspect of our approach is that it exploits indices present on the base data and thereby avoids materializing large parts of the view that are not relevant to the query results. Another feature of the algorithm is that by solely using indices, we can still score the results of queries over the virtual view, and the resulting scores are the same as if the view was materialized. Our performance evaluation using the INEX data set in the Quark (Bhaskar et al. in Quark: an efficient XQuery full-text implementation. In: SIGMOD, 2006) open-source XML database system indicates that the proposed approach is scalable and efficient.

12.
OLAP queries involve a lot of aggregations on a large amount of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method to rewrite a given OLAP query using various kinds of materialized views which already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the selection and aggregation granularities, which are derived from the lattice of dimension hierarchies. Conditions for usability of materialized views in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that can effectively utilize materialized views having different selection granularities, selection regions, and aggregation granularities together. We also propose an algorithm to find a set of materialized views that results in a rewritten query which can be executed efficiently. We show the effectiveness and performance of the algorithm experimentally.

13.
The data cube operator computes group-bys for all possible combinations of a set of dimension attributes. Since computing a data cube typically incurs a considerable cost, the data cube is often precomputed and stored as materialized views in data warehouses. A materialized data cube needs to be updated when the source relations are changed. The incremental maintenance of a data cube is to compute and propagate only its changes, rather than recompute the entire data cube from scratch. For n dimension attributes, the data cube consists of 2^n group-bys, each of which is called a cuboid. To incrementally maintain a data cube with 2^n cuboids, the conventional methods compute 2^n delta cuboids, each of which represents the change of a cuboid. In this paper, we propose an efficient incremental maintenance method that can maintain a data cube using only a subset of the 2^n delta cuboids. We formulate an optimization problem to find the optimal subset of the 2^n delta cuboids that minimizes the total maintenance cost, and propose a heuristic solution that allows us to maintain a data cube using only delta cuboids. As a result, the cost of maintaining a data cube is substantially reduced. Through various experiments, we show the performance advantages of the proposed method over the conventional methods. We also extend the proposed method to handle partially materialized cubes and dimension hierarchies.
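The conventional scheme that this abstract takes as its baseline, one delta cuboid per cuboid, can be sketched as follows. The sketch assumes a SUM measure and insert-only changes; the dimension names and the helper functions are hypothetical, and the paper's own method of using only a subset of the delta cuboids is not reproduced here.

```python
from collections import Counter
from itertools import combinations

DIMS = ["store", "item", "day"]  # hypothetical dimension attributes


def cuboids(dims):
    """All 2^n group-by attribute subsets, from the empty grouping to the full one."""
    return [g for r in range(len(dims) + 1) for g in combinations(dims, r)]


def aggregate(rows, dims, group_by):
    """SUM(measure) grouped by the attributes in group_by."""
    idx = [dims.index(d) for d in group_by]
    out = Counter()
    for *values, measure in rows:
        out[tuple(values[i] for i in idx)] += measure
    return out


def build_cube(rows, dims):
    """Materialize every cuboid from scratch."""
    return {g: aggregate(rows, dims, g) for g in cuboids(dims)}


def maintain(cube, delta_rows, dims):
    """Conventional maintenance: compute one delta cuboid per cuboid and merge it in."""
    for group_by, cuboid in cube.items():
        delta_cuboid = aggregate(delta_rows, dims, group_by)
        cuboid.update(delta_cuboid)  # Counter.update adds values key by key
    return cube


rows = [("s1", "pen", "mon", 2), ("s1", "ink", "mon", 1), ("s2", "pen", "tue", 4)]
delta = [("s1", "pen", "tue", 3), ("s3", "ink", "mon", 5)]
cube = maintain(build_cube(rows, DIMS), delta, DIMS)
```

For three dimensions this computes eight delta cuboids from the inserted rows, and the count doubles with each added dimension, which is the cost the abstract sets out to reduce.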

14.
In this paper we present results on the problem of maintaining materialized top-k views and provide results in two directions. The first problem we tackle concerns the maintenance of top-k views in the presence of high deletion rates. We provide a principled method that complements the inefficiency of the state of the art independently of the statistical properties of the data and the characteristics of the update streams. The second problem we have been concerned with has to do with the efficient maintenance of multiple top-k views in the presence of updates to their base relation. To this end, we provide theoretical guarantees for the nucleation (practically, inclusion) of a view with respect to another view and the reflection of this property to the management of updates. We also provide algorithmic results towards the maintenance of a large number of views, via their appropriate structuring in hierarchies of views.
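Why deletions are the hard case for a materialized top-k view can be seen in a minimal sketch of the naive approach. This is not the method proposed in the paper: it assumes a base relation of distinct numeric scores, and the class `TopKView` with its `refills` counter is hypothetical.

```python
import heapq


class TopKView:
    """Naive materialized top-k view over a set of distinct scores."""

    def __init__(self, base, k):
        self.base = set(base)
        self.k = k
        self.view = heapq.nlargest(k, self.base)  # sorted in descending order
        self.refills = 0  # number of times the base relation had to be rescanned

    def insert(self, score):
        if score in self.base:
            return
        self.base.add(score)
        # An insertion is cheap: only the view itself needs to be consulted.
        if len(self.view) < self.k or score > self.view[-1]:
            self.view = heapq.nlargest(self.k, self.view + [score])

    def delete(self, score):
        self.base.discard(score)
        # Deleting a view member leaves a hole that only the base relation can fill.
        if score in self.view:
            self.view = heapq.nlargest(self.k, self.base)
            self.refills += 1


v = TopKView([5, 1, 9, 3, 7], k=3)
v.insert(8)   # enters the view and pushes 5 out
v.insert(2)   # too small to enter the view
v.delete(2)   # not in the view, so no rescan is needed
v.delete(9)   # view member removed, which forces a rescan of the base relation
```

Each deletion that hits the view triggers a scan of the base relation in this naive scheme, so a high deletion rate makes the refill cost dominate.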

15.
We present a method of describing microprocessors at different levels of temporal and data abstraction, and consider pipelined and superscalar processors. We model microprocessors using iterated maps, defined by equations which evolve a system from an initial state by the iterative application of a next-state function. Levels of timing abstraction are related by temporal abstraction maps called retimings. We state correctness conditions for microprogrammed, pipelined and superscalar processors. We introduce one-step theorems that permit verification of correctness conditions to be considerably simplified under well-defined conditions. We extend the one-step theorems to superscalar microprocessors. Received November 1998 / Accepted in revised form March 2000
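The iterated-map view of a processor, with a retiming relating two levels of timing abstraction, can be illustrated on a toy machine. Everything here is an assumption made for illustration: the accumulator machine, the two-cycle implementation, the `retiming` and `abstract` maps, and the bounded check are not taken from the paper.

```python
def iterate(next_state, state, t):
    """Iterated map F(t, s): apply the next-state function t times, starting from s."""
    for _ in range(t):
        state = next_state(state)
    return state


PROGRAM = [3, 4, 5]  # hypothetical operand stream, read cyclically


def spec_next(state):
    """Specification level: one whole instruction per clock tick."""
    pc, acc = state
    return (pc + 1, acc + PROGRAM[pc % len(PROGRAM)])


def impl_next(state):
    """Implementation level: each instruction takes a fetch cycle and an execute cycle."""
    pc, acc, phase, latch = state
    if phase == 0:
        return (pc, acc, 1, PROGRAM[pc % len(PROGRAM)])  # fetch operand into the latch
    return (pc + 1, acc + latch, 0, latch)               # execute and advance pc


def retiming(t):
    """Maps specification time t to the implementation cycle where instruction t completes."""
    return 2 * t


def abstract(state):
    """Data abstraction: hide the phase and latch registers."""
    return state[:2]


# Bounded check of the correctness condition at the retimed cycles.
correct = all(
    abstract(iterate(impl_next, (0, 0, 0, 0), retiming(t))) == iterate(spec_next, (0, 0), t)
    for t in range(10)
)
```

The check compares the two levels only at cycles selected by the retiming; intermediate implementation states, such as the one after a fetch, have no counterpart at the specification level.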

16.
We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We are focusing on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST++ (Cubing with Statistics Trees Plus Families), represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST++ encodes all possible aggregate views in the leaves of a new data structure called statistics tree (ST) during a one-time scan of the detailed data. In order to optimize the queries involving constraints on hierarchy levels of the underlying dimensions, we select and materialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family, which can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems.

17.
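In materialized view maintenance for data warehouses, handling concurrent updates effectively is an important yet thorny problem. The paper describes typical situations in a P2P environment in which schema and data changes are fully concurrent, analyzes why concurrent updates cause view maintenance anomalies, and proposes a corresponding correction strategy for each of these aspects. It gives a method for detecting concurrent updates based on temporal calculus and an effective algorithm for detecting correlated updates under mixed updates, and finally proposes an enhanced agent mechanism that solves the out-of-order commit problem, ensuring consistency between the data warehouse and the data sources.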

18.
Real-world entities are inherently spatially and temporally referenced, and database applications increasingly exploit databases that record the past, present, and anticipated future locations of entities, e.g., the residences of customers obtained by the geo-coding of addresses. Indices that efficiently support queries on the spatio-temporal extents of such entities are needed. However, past indexing research has progressed in largely separate spatial and temporal streams. Adding time dimensions to spatial indices, as if time were a spatial dimension, neither supports nor exploits the special properties of time. On the other hand, temporal indices are generally not amenable to extension with spatial dimensions. This paper proposes the first efficient and versatile index for a general class of spatio-temporal data: the discretely changing spatial aspect of an object may be a point or may have an extent; both transaction time and valid time are supported, and a generalized notion of the current time, now, is accommodated for both temporal dimensions. The index is based on the R-tree and provides means of prioritizing space versus time, which enables it to adapt to spatially and temporally restrictive queries. Performance experiments are reported that evaluate pertinent aspects of the index. Edited by T. Sellis. Received: 7 December 2000 / Accepted: 1 September 2001 Published online: 18 December 2001

19.
In a mobile environment, querying a database at a stationary server from a mobile client is expensive due to the limited bandwidth of a wireless channel and the instability of the wireless network. We address this problem by maintaining a materialized view in a mobile client's local storage. Such a materialized view can be considered as a data warehouse. The materialized view contains results of common queries in which the mobile client is interested. In this paper, we address the view update problem for maintaining the consistency between a materialized view at a mobile client and the database server. A materialized view can become inconsistent with the database server when the content of the database server changes and/or the location of the client changes. Existing view update mechanisms are 'push-based': the server is responsible for notifying all clients whose views might be affected by changes in the database or by the mobility of the client. This is not appropriate in a mobile environment due to the frequent wireless channel disconnection. Furthermore, it is not easy for a server to keep track of client movements to update individual client location-dependent views. We propose a 'pull-based' approach that allows a materialized view to be updated at a client in an incremental manner, requiring a client to request changes to its view from the server. We demonstrate the feasibility of our approach with experimental results. Received 27 January 1999 / Revised 26 November 1999 / Accepted 17 April 2000

20.
Materialized views defined over distributed data sources can be utilized by many applications to ensure better access, reliable performance, and high availability. Technology for maintaining materialized views is thus critical for providing up-to-date results, since a stale view extent may not help, or may even mislead, these applications. State-of-the-art incremental view maintenance requires O(n^2) or more remote maintenance queries, with n being the number of data sources in the view definition. In this work, we propose two novel maintenance strategies, namely adjacent grouping and conditional grouping, that dramatically reduce the number of maintenance queries required to maintain the materialized views. This reduction in the number of maintenance queries brings out a basic trade-off between the complexity of each query and the total number of maintenance queries, which can be exploited to improve maintenance performance. The proposed maintenance strategies have been implemented in a working prototype system called TxnWrap. Experimental studies illustrate that our proposed strategies are able to achieve about 400% performance improvement in terms of total processing time compared with existing batch algorithms in a majority of cases.
