首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Answering queries using views: A survey   总被引:25,自引:0,他引:25  
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results. Received: 1 August 1999 / Accepted: 23 March 2001 Published online: 6 September 2001  相似文献   

2.
Web数据集成系统基于QC模型的物化视图选择   总被引:2,自引:0,他引:2  
在Web数据集成系统中,物化视图能够有效地减少网络传输代价,提高系统的查询效率.如何选择查询进行物化,使得选中的查询满足集成层的空间限制,同时获取最大物化收益,成为集成系统中一个迫切需要解决的问题.传统方法没有考虑到海量XML查询之间的包含关系,其选择的物化视图中可能包含冗余的信息.针对上述问题,提出了①Web数据集成系统中海量查询集合的QC(query containment)模型,该模型能够捕捉查询之间最常见的包含关系;②基于QC模型的物化视图选择算法,算法考虑了物化视图选择相关的主要因素,包括查询提交的频率、空间代价、查询重写能力和查询结果的完备性,提出了查询位图的物化视图组织方式,从而获取更加合理的物化视图选择方案.实验结果证明了该方法的有效性.  相似文献   

3.
Designing data warehouses   总被引:9,自引:0,他引:9  
A Data Warehouse (DW) is a database that collects and stores data from multiple remote and heterogeneous information sources. When a query is posed, it is evaluated locally, without accessing the original information sources. In this paper we deal with the issue of designing a DW, in the context of the relational model, by selecting a set of views to materialize in the DW. First, we briefly present a theoretical framework for the DW design problem, which concerns the selection of a set of views that (a) fit in the space allocated to the DW, (b) answer all the queries of interest, and (c) minimize the total query evaluation and view maintenance cost. We then formalize the DW design problem as a state space search problem by taking into account multiquery optimization over the maintenance queries (i.e., queries that compute changes to the materialized views) and the use of auxiliary views for reducing the view maintenance cost. Finally, incremental algorithms and heuristics for pruning the search space are presented.  相似文献   

4.
实视图选择问题是数据仓库研究的重要问题之一。数据仓库存储实视图主要为OLAP查询,用户查询响应时间是首要考虑的问题,提出了查询代价视图选择问题,给出了其代价模型。提出了对查询代价视图选择问题利用遗传算法来解决的方法和策略。经实验证明,该算法达到了良好的效果,效率高。  相似文献   

5.
In this paper, we study the following problem. Given a database and a set of queries, we want to find a set of views that can compute the answers to the queries, such that the amount of space, in bytes, required to store the viewset is minimum on the given database. (We also handle problem instances where the input has a set of database instances, as described by an oracle that returns the sizes of view relations for given view definitions.) This problem is important for applications such as distributed databases, data warehousing, and data integration. We explore the decidability and complexity of the problem for workloads of conjunctive queries. We show that results differ significantly depending on whether the workload queries have self-joins. Further, for queries without self-joins we describe a very compact search space of views, which contains all views in at least one optimal viewset. We present techniques for finding a minimum-size viewset for a single query without self-joins by using the shape of the query and its constraints, and validate the approach by extensive experiments. Part of this article was published elsewhere [Chirkova, R., Li, C.: Materializing views with minimal size to answer queries. PODS (2003)]. In addition to the prior materials, this article contains new theoretical results, as well as new results on how to efficiently implement the proposed techniques (Sects. 5 and 5.4)  相似文献   

6.
数据集成中XML数据查询语义重写   总被引:10,自引:0,他引:10  
查询重写是数据库研究的一个基本问题,它和查询优化,数据仓库,数据集成,语义缓存等数据库问题密切相关,为提高集成系统的查询效率,系统选择提交频率较高的XML查询物化为中间层视图,用户提交查询后,系统尽可能利用中间视图层中视图,而不是访问数据源来回答查询,这个问题实际可以归结为半结构化查询重写问题,考虑到中间视图层空间的有限性,已有视图应当尽可能回答更多的查询,传统查询重写方法有考虑半结构化数据之间的约束,而根据约束可以等价变换查询,从而提高中间视图层中的表达能力,提出了一种新的半结构化查询重写的方法,该方法在保证算法正确性和完备性的基础上,利用上半结构化数据中的约束,尤其是XML文件中的路径依赖,来增强中间层物化视图的表达能力,理论分析和初步原型实验证明方法的有效性。  相似文献   

7.
Selection of views to materialize in a data warehouse   总被引:4,自引:0,他引:4  
A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resource, e.g., materialization time, storage space, etc. In This work, we have developed a theoretical framework for the general problem of selection of views in a data warehouse. We present polynomial-time heuristics for a selection of views to optimize total query response time under a disk-space constraint, for some important special cases of the general data warehouse scenario, viz.: 1) an AND view graph, where each query/view has a unique evaluation, e.g., when a multiple-query optimizer can be used to general a global evaluation plan for the queries, and 2) an OR view graph, in which any view can be computed from any one of its related views, e.g., data cubes. We present proofs showing that the algorithms are guaranteed to provide a solution that is fairly close to (within a constant factor ratio of) the optimal solution. We extend our heuristic to the general AND-OR view graphs. Finally, we address in detail the view-selection problem under the maintenance cost constraint and present provably competitive heuristics.  相似文献   

8.
Given a query workload, a database and a set of constraints, the view-selection problem is to select views to materialize so that the constraints are satisfied and the views can be used to compute the queries in the workload efficiently. A typical constraint, which we consider in the present work, is to require that the views can be stored in a given amount of disk space. Depending on features of SQL queries (e.g., the DISTINCT keyword) and on whether the database relations on which the queries are applied are sets or bags, the queries may be computed under set semantics, bag-set semantics, or bag semantics. In this paper we study the complexity of the view-selection problem for conjunctive queries and views under these semantics. We show that bag semantics is the “easiest to handle” (we show that in this case the decision version of view selection is in NP), whereas under set and bag-set semantics we assume further restrictions on the query workload (we only allow queries without self-joins in the workload) to achieve the same complexity. Moreover, while under bag and bag-set semantics filtering views (i.e., subgoals that can be dropped from the rewriting without impacting equivalence to the query) are practically not needed, under set semantics filtering views can reduce significantly the query-evaluation costs. We show that under set semantics the decision version of the view-selection problem remains in NP only if filtering views are not allowed in the rewritings. Finally, we investigate whether the cgalg algorithm for view selection introduced in Chirkova and Genesereth (Linearly bounded reformulations of conjunctive databases, pp. 987–1001, 2000) is suitable in our setting. We prove that this algorithm is sound for all cases we examine here, and that it is complete under bag semantics for workloads of arbitrary conjunctive queries and under bag-set semantics for workloads of conjunctive queries without self-joins. Rada Chirkova’s work on this material has been supported by the National Science Foundation under Grant No. 0307072. The project is co-funded by the European Social Fund (75%) and National Resources (25%)- Operational Program for Educational and Vocational Training II (EPEAEK II) and particularly the program PYTHAGORAS. A preliminary version of this paper appears in F. Afrati, R. Chirkova, M. Gergatsoulis, V. Pavlaki. Designing Views to Efficiently Answer Real SQL Queries. In Proc. of SARA 2005, LNAI Vol. 3607, pages 332-346, Springer-Verlag, 2005.  相似文献   

9.
The index selection problem (ISP) concerns the selection of an appropriate index set to minimize the total cost for a given workload containing read and update queries. Since the ISP has been proven to be an NP-hard problem, most studies focus on heuristic algorithms to obtain approximate solutions. However, even approximate algorithms still consume a large amount of computing time and disk space because these systems must record all query statements and frequently request from the database optimizers the cost estimation of each query in each considered index. This study proposes a novel algorithm without repeated optimizer estimations. When a query is delivered to a database system, the optimizer evaluates the costs of various query plans and chooses an access path for the query. The information from the evaluation stage is aggregated and recorded with limited space. The proposed algorithm can recommend indexes according to the readily available information without querying the optimizer again. The proposed algorithm was tested in a PostgreSQL database system using TPC-H data. Experimental results show the effectiveness of the proposed approach.  相似文献   

10.
物化视图能够有效地提高空间数据仓库的查询效率,但由于空间操作的复杂性,传统数据仓库中物化视图的选择算法不能很好地应用于空间数据仓库。为了在存储空间约束下选择查询进行物化,并动态调整物化视图集,以适应用户查询的时变性和即席查询,提出了空间物化视图选择算法SMVS。实验结果表明该算法是有效可行的,不仅能够提高查询性能,而且解决了查询响应性能随用户查询分布变化而下降的问题。  相似文献   

11.
A data warehouse (DW) can be seen as a set of materialized views defined over remote base relations. When a query is posed, it is evaluated locally, using the materialized views, without accessing the original information sources. The DWs are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered by them. Some of these queries can be answered using exclusively the materialized views. In general though new views need to be added to the DW.In this paper we investigate the problem of incrementally designing a DW when new queries need to be answered and possibly extra space is allocated for view materialization. Based on an AND/OR dag representation of multiple queries, we model the problem as a state space search problem. We design incremental algorithms for selecting a set of new views to additionally materialize in the DW that: (a) fits in the extra space, (b) allows a complete rewriting of the new queries over the materialized views, and (c) minimizes the combined new query evaluation and new view maintenance cost. Finally, we discuss methods for pruning the search space so that efficiency is improved.  相似文献   

12.
由深度学习驱动的学习型查询优化器正在越来越广泛地受到研究者的关注,这些优化器往往能够取得近似甚至超过传统商业优化器的性能.与传统优化器不同的是,一个成功的学习型优化器往往依赖于足够多的高质量的负载查询作为训练数据.低质量的训练查询会导致学习型优化器在未来的查询上失效.提出了基于强化学习的鲁棒的学习型查询优化器训练框架A...  相似文献   

13.
半结构化数据库没有固定的库模式,用户对其结构难以产生清晰的认识,从而无法有效地查询所需的内容.提出了一种基于本体的柔性查询,用户通过了解数据库本体语义信息而发出的查询不必遵循严格的数据库模式也能得出结果.由于在半结构化数据库上直接查找效率很低,故在其上生成描述结构模式的概念本体库.查询模块先在本体库上评估能否得出查询结果,再在数据库上执行查询.然而由于本体库可能是图的形式,其查询代价仍然很高,本质上是NP问题,进一步研究了将图转化为树的方法,并给出了相应的算法.  相似文献   

14.
在数据仓库中,如何选择实例化视图是一个重要的问题。针对一类特定的数据立方体,该文提出了一个基于代价策略的实例化视图选择算法。通过对一个实际数据集的分析,发现在数据立方体中有很多父子视图具有相同的体积,其原因是用于产生数据立方体的基本关系的属性之间存在着密切的联系。显然,对这类数据立方体不能像算法PBS那样按照体积的大小来选择要实例化的视图。为此,设计了算法PBC,不但可以快速地给出满足条件的实例化视图集,而且可以准确地找到具有最短平均响应时间的实例化视图集,避免了在用户给出过大的参数时,实例化一些无益于缩短查询响应时间的视图。实验结果表明,算法PBC是有效的。  相似文献   

15.
The GMAP: a versatile tool for physical data independence   总被引:1,自引:0,他引:1  
Physical data independence is touted as a central feature of modern database systems. It allows users to frame queries in terms of the logical structure of the data, letting a query processor automatically translate them into optimal plans that access physical storage structures. Both relational and object-oriented systems, however, force users to frame their queries in terms of a logical schema that is directly tied to physical structures. We present an approach that eliminates this dependence. All storage structures are defined in a declarative language based on relational algebra as functions of a logical schema. We present an algorithm, integrated with a conventional query optimizer, that translates queries over this logical schema into plans that access the storage structures. We also show how to compile update requests into plans that update all relevant storage structures consistently and optimally. Finally, we report on experiments with a prototype implementation of our approach that demonstrate how it allows storage structures to be tuned to the expected or observed workload to achieve significantly better performance than is possible with conventional techniques. Edited by Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received September 15, 1994 / Accepted September 1, 1995  相似文献   

16.
The Semantic Web’s promise of web-wide data integration requires the inclusion of legacy relational databases,1 i.e. the execution of SPARQL queries on RDF representation of the legacy relational data. We explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment is embodied in a system, Ultrawrap, that encodes a logical representation of the database as an RDF graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer uses the SQL views that represent a mapping of relational data to RDF, and optimizes its execution. In contrast, related research is predicated on incorporating optimizing transforms as part of the SPARQL to SQL translation, and/or executing some of the queries outside the underlying SQL environment.Ultrawrap is evaluated using two existing benchmark suites that derive their RDF data from relational data through a Relational Database to RDF (RDB2RDF) Direct Mapping and repeated for each of the three major relational database management systems. Empirical analysis reveals two existing relational query optimizations that, if applied to the SQL produced from a simple syntactic translations of SPARQL queries (with bound predicate arguments) to SQL, consistently yield query execution time that is comparable to that of SQL queries written directly for the relational representation of the data. The analysis further reveals the two optimizations are not uniquely required to achieve a successful wrapper system. The evidence suggests effective wrappers will be those that are designed to complement the optimizer of the target database.  相似文献   

17.
Query rewriting using views is a technique that allows a query to be answered efficiently by using pre-computed materialized views. It has many applications, such as data caching, query optimization, schema integration, etc. This issue has been studied extensively for relational databases and, as a result, the technology is maturing. For XML data, however, the work is inadequate. Recently, several frameworks have been proposed for query rewriting using views for XPath queries, with the requirement that a rewriting must be complete. In this paper, we study the problem of query rewriting using views for XPath queries without requiring that the rewriting be complete. This will increase its applicability since in many cases, complete rewritings using views do not exist. We give formal definitions for various concepts to formulate the problem, and then propose solutions. Our solutions are built under the framework for query containment. We look into the problem from both theoretic perspectives, and algorithmic approaches. Two methods to generate rewritings using views are proposed, with different characteristics in terms of generalities and efficiencies. The maximality properties of the rewritings generated by these methods are discussed.  相似文献   

18.
An adaptive probe-based optimization technique is developed and demonstrated in the context of an Internet-based distributed database environment. More and more common are database systems which are distributed across servers communicating via the Internet where a query at a given site might require data from remote sites. Optimizing the response time of such queries is a challenging task due to the unpredictability of server performance and network traffic at the time of data shipment; this may result in the selection of an expensive query plan using a static query optimizer. We constructed an experimental setup consisting of two servers running the same database management system connected via the Internet. Concentrating on join queries, we demonstrate how a static query optimizer might choose an expensive plan by mistake. This is due to the lack of a priori knowledge of the run-time environment, inaccurate statistical assumptions in size estimation, and neglecting the cost of remote method invocation. These shortcomings are addressed collectively by proposing a probing mechanism. An implementation of our run-time optimization technique for join queries was constructed in the Java language and incorporated into an experimental setup. The results demonstrate the superiority of our probe-based optimization over a static optimization. Received 6 February 1999 / Revised 15 February 2000 / Accepted 10 May 2000  相似文献   

19.
We consider a workload of aggregate queries and investigate the problem of selecting materialized views that (1) provide equivalent rewritings for all the queries, and (2) are optimal, in that the cost of evaluating the query workload is minimized. We consider conjunctive views and rewritings, with or without aggregation; in each rewriting, only one view contributes to computing the aggregated query output. We look at query rewriting using existing views and at view selection. In the query-rewriting problem, we give sufficient and necessary conditions for a rewriting to exist. For view selection, we prove complexity results. Finally, we give algorithms for obtaining rewritings and selecting views.  相似文献   

20.
Recently, learned query optimizers typically driven by deep learning models have attracted wide attention as they can offer similar or even better performance than state-of-the-art commercial optimizers. A successful learning optimizer often relies on enough high-quality load queries as training data, and poor-quality training will lead to the query failure of learned query optimizers. In this paper, we propose a novel training framework AlphaQO for robust learned query optimizers based on Reinforcement Learning (RL), and the robustness of the optimizers can be improved by finding the bad queries in advance. AlphaQO is a loop system consisting of two main components, namely the query generator and the learned optimizer. A query generator aims at generating ``difficult'' queries (i.e., queries that the learned optimizer provides poor estimates). The learned optimizer will be trained using these generated queries, as well as providing feedback (in terms of numerical rewards) to the query generator for updates. If the generated queries are good, the query generator will get a high reward; otherwise, the query generator will get a low reward. The above process is performed iteratively, with the main goal that within a small budget, the learned optimizer can be trained and generalized well to a wide range of unseen queries. Extensive experiments show that AlphaQO can generate a relatively small number of queries and train a learned optimizer to outperform commercial optimizers. Moreover, learned optimizers require much fewer queries from AlphaQO than randomly generated queries for the quality training of the learned optimizer.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号