1.
New applications of information systems need to integrate a large number of heterogeneous databases over computer networks. Answering a query in these applications usually involves selecting relevant information sources and generating a query plan to combine the data automatically. As significant progress has been made in source selection and plan generation, the critical issue has been shifting to query optimization. This paper presents a semantic query optimization (SQO) approach to optimizing query plans of heterogeneous multidatabase systems. This approach provides global optimization for query plans as well as local optimization for subqueries that retrieve data from individual database sources. An important feature of our local optimization algorithm is that we prove necessary and sufficient conditions to eliminate an unnecessary join in a conjunctive query of arbitrary join topology. This feature allows our optimizer to utilize more expressive relational rules to provide a wider range of possible optimizations than previous work in SQO. The local optimization algorithm also features a new data structure called AND-OR implication graphs to facilitate the search for optimal queries. These features allow the global optimization to effectively use semantic knowledge to reduce the data transmission cost. We have implemented this approach in the PESTO (Plan Enhancement by SemanTic Optimization) query plan optimizer as a part of the SIMS information mediator. Experimental results demonstrate that PESTO can provide significant savings in query execution cost over query plan execution without optimization.
2.
Yannis E. Ioannidis Raymond T. Ng Kyuseok Shim Timos K. Sellis 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(2):132-151
In most database systems, the values of many important run-time parameters of the system, the data, or the query are unknown
at query optimization time. Parametric query optimization attempts to identify at compile time several execution plans, each
one of which is optimal for a subset of all possible values of the run-time parameters. The goal is that at run time, when
the actual parameter values are known, the appropriate plan should be identifiable with essentially no overhead. We present
a general formulation of this problem and study it primarily for the buffer size parameter. We adopt randomized algorithms
as the main approach to this style of optimization and enhance them with a sideways information passing feature that increases their effectiveness in the new task. Experimental results of these enhanced algorithms show that they
optimize queries for large numbers of buffer sizes in the same time needed by their conventional versions for a single buffer
size, without much sacrifice in the output quality and with essentially zero run-time overhead.
Edited by S. Zdonik / Received June 1993 / Accepted April 1996
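The compile-time/run-time split described in this abstract can be illustrated with a minimal sketch. It is not the authors' randomized algorithm: it simply enumerates two hypothetical plans over a handful of sampled buffer sizes and stores the buffer-size breakpoints at which the cheapest plan changes. The cost formulas, relation sizes, and the names `compile_parametric_plan` and `pick_plan` are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical relation sizes in pages (illustrative numbers only).
OUTER_PAGES, INNER_PAGES = 1000, 500

def nested_loop_cost(buffer_pages: int) -> int:
    # Block nested loop: read the outer once, scan the inner once per outer block.
    block = max(buffer_pages - 2, 1)
    blocks = -(-OUTER_PAGES // block)  # ceiling division
    return OUTER_PAGES + blocks * INNER_PAGES

def hash_join_cost(buffer_pages: int) -> int:
    # Simplified assumption: in-memory hash join if the inner fits,
    # otherwise one partitioning pass (about 3x the I/O).
    if buffer_pages >= INNER_PAGES + 2:
        return OUTER_PAGES + INNER_PAGES
    return 3 * (OUTER_PAGES + INNER_PAGES)

PLANS = {"block_nested_loop": nested_loop_cost, "hash_join": hash_join_cost}

def compile_parametric_plan(buffer_sizes):
    """Compile time: record the buffer sizes at which the cheapest plan changes."""
    breakpoints, chosen = [], []
    for pages in sorted(buffer_sizes):
        best = min(PLANS, key=lambda name: PLANS[name](pages))
        if not chosen or chosen[-1] != best:
            breakpoints.append(pages)
            chosen.append(best)
    return breakpoints, chosen

def pick_plan(table, buffer_pages):
    """Run time: a binary search over the breakpoints, no re-optimization."""
    breakpoints, chosen = table
    index = bisect_right(breakpoints, buffer_pages) - 1
    return chosen[max(index, 0)]
```

Calling `compile_parametric_plan([4, 16, 64, 256, 512])` once and then `pick_plan(table, actual_buffer_pages)` per execution keeps the run-time work to a single lookup, which is the property the abstract emphasizes.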
3.
Grant J. Gryz J. Minker J. Raschid L. 《Knowledge and Data Engineering, IEEE Transactions on》2000,12(4):529-547
We present a technique for transferring query optimization techniques, developed for relational databases, into object databases. We demonstrate this technique for ODMG database schemas defined in ODL and object queries expressed in OQL. The object schema is represented using a logical representation (Datalog). Semantic knowledge about the object data model, e.g., class hierarchy information, relationships between objects, etc., as well as semantic knowledge about a particular schema and application domain, is expressed as integrity constraints. An OQL object query is represented as a logic query, and query optimization is performed in the Datalog representation. We obtain equivalent (optimized) logic queries, and subsequently obtain equivalent (optimized) OQL queries for each equivalent logic query. We present one optimization technique for semantic query optimization (SQO) based on the residue technique of U. Chakravarthy et al. (1990; 1986; 1988). We show that our technique generalizes previous research on SQO for object databases. We handle a large class of OQL queries, including queries with constructors and methods. We demonstrate how SQO can be used to eliminate queries which contain contradictions and to simplify queries, e.g., by eliminating joins, or by reducing the access scope for evaluating a query to some specific subclass(es). We also demonstrate how the definition of a method, or integrity constraints describing the method, can be used in optimizing a query with a method.
4.
Karl Schnaitter Joshua Spiegel Neoklis Polyzotis 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(2):521-542
A relational ranking query uses a scoring function to limit the results of a conventional query to a small number of the most
relevant answers. The increasing popularity of this query paradigm has led to the introduction of specialized rank join operators
that integrate the selection of top tuples with join processing. These operators access just “enough” of the input in order
to generate just “enough” output and can offer significant speed-ups for query evaluation. The number of input tuples that
an operator accesses is called the input depth of the operator, and this is the driving cost factor in rank join processing. This introduces the important problem of depth estimation, which is crucial for the costing of rank join operators during query compilation and thus for their integration in optimized
physical plans. We introduce an estimation methodology, termed deep, for approximating the input depths of rank join operators in a physical execution plan. At the core of deep lies a general, principled framework that formalizes depth computation in terms of the joint distribution of scores in the
base tables. This framework results in a systematic estimation methodology that takes the characteristics of the data directly
into account and thus enables more accurate estimates. We develop novel estimation algorithms that provide an efficient realization
of the formal deep framework, and describe their integration on top of the statistics module of an existing query optimizer. We validate the
performance of deep with an extensive experimental study on data sets of varying characteristics. The results verify the effectiveness of deep as an estimation method and demonstrate its advantages over previously proposed techniques.
5.
We study lazy structure sharing as a tool for optimizing equivalence testing on complex data types. We investigate a number of strategies for implementing lazy structure sharing and provide upper and lower bounds on their performance (how quickly they effect ideal configurations of our data structure). In most cases when the strategies are applied to a restricted case of the problem, the bounds provide nontrivial improvements over the naïve linear-time equivalence-testing strategy that employs no optimization. Only one strategy, however, which employs path compression, seems promising for the most general case of the problem. Work completed while at Princeton University and supported by a Fannie and John Hertz Foundation Fellowship, National Science Foundation Grant No. CCR-8920505, and the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) under NSF-STC-91-19999. Work completed while at Princeton University and DIMACS and supported by DIMACS under NSF-STC-91-19999. Research at Princeton University partially supported by the National Science Foundation, Grant No. CCR-8920505, the Office of Naval Research, Contract No. N00014-91-J-1463, and by DIMACS under NSF-STC-91-19999.
6.
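This paper analyzes the performance of translating SQL into MapReduce programs and explains the factors that affect translation performance. Building on an analysis of input correlation, data-transformation correlation, and job-flow correlation among MapReduce jobs, it merges redundant jobs to reduce resource consumption and thereby improve SQL query performance, and it gives the corresponding optimization conditions and optimization rules. A comparison of performance before and after optimization shows that the improved SQL process executes more efficiently.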
7.
World Wide Web - Data partitioning is an effective way to reduce cost and improve query performance in large-scale Web data analytical applications. State-of-the-art partitioning approaches on...
8.
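To improve query efficiency in distributed database management systems, this paper analyzes the characteristics of such systems and identifies the key factors that affect their query efficiency. It discusses common strategies for direct join queries, as well as the principle, implementation, and transmission cost of semijoin queries, and finally proposes a semijoin query strategy based on a concrete distributed database management system example. The improved semijoin strategy optimizes the join plan, lowers the cost of data transmission, shortens query response time, and improves the efficiency of query operations.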
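The transmission-cost argument for semijoins can be made concrete with a minimal sketch. It shows the textbook semijoin technique, not the specific strategy this paper proposes. It assumes two relations held at different sites as lists of dictionaries and measures cost as the number of field values shipped; the function names and the cost unit are assumptions made for illustration.

```python
def direct_join_transfer(s_rows):
    """Cost of shipping all of S to R's site, counted in field values."""
    return sum(len(row) for row in s_rows)

def semijoin_join(r_rows, s_rows, key):
    """Join R (site 1) with S (site 2) on `key` using a semijoin reduction.

    Returns the join result and the number of field values shipped.
    """
    # Step 1: project R on the join column and ship the distinct keys to site 2.
    r_keys = {row[key] for row in r_rows}
    shipped = len(r_keys)

    # Step 2: at site 2, keep only the S rows that can join (S semijoin R).
    s_reduced = [row for row in s_rows if row[key] in r_keys]

    # Step 3: ship the reduced S back to site 1 and finish the join there.
    shipped += sum(len(row) for row in s_reduced)
    by_key = {}
    for s in s_reduced:
        by_key.setdefault(s[key], []).append(s)
    result = [{**r, **s} for r in r_rows for s in by_key.get(r[key], [])]
    return result, shipped
```

With two employee rows referencing departments 1 and 2 and a five-row department relation of two fields each, the direct strategy ships 10 values while the semijoin ships 2 keys plus 4 values. The saving disappears when most S rows match, which is why the choice between the two strategies is a cost-based decision.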
9.
Jean-Robert Gruser Louiqa Raschid Vladimir Zadorozhny Tao Zhan 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):18-37
Abstract. The rapid growth of the Internet and support for interoperability protocols has increased the number of Web accessible sources, WebSources. Current wrapper mediator architectures need to be extended with a wrapper cost model (WCM) for WebSources that can estimate the response time (delays) to access sources as well as other relevant statistics. In this paper, we present a Web prediction tool (WebPT), a tool that is based on learning using query feedback from WebSources. The WebPT uses dimensions time of day, day, and quantity of data, to learn response times from a particular WebSource, and to predict the expected response time (delay) for some query. Experiment data was collected from several sources, and those dimensions that were significant in estimating the response time were determined. We then trained the WebPT on the collected data, to use the three dimensions mentioned above, and to predict the response time, as well as a confidence in the prediction. We describe the WebPT learning algorithms, and report on the WebPT learning for WebSources. Our research shows that we can improve the quality of learning by tuning the WebPT features, e.g., training the WebPT using a logarithm of the input training data; including significant dimensions in the WebPT; or changing the ordering of dimensions. A comparison of the WebPT with more traditional neural network (NN) learning has been performed, and we briefly report on the comparison. We then demonstrate how the WebPT prediction of delay may be used by a scrambling enabled optimizer. A scrambling algorithm identifies some critical points of delay, where it makes a decision to scramble (modify) a plan, to attempt to hide the expected delay by computing some other part of the plan that is unaffected by the delay. We explore the space of real delay at a WebSource, versus the WebPT prediction of this delay, with respect to critical points of delay in specific plans. 
We identify those cases where WebPT overestimation or underestimation of the real delay results in a penalty in the scrambling enabled optimizer, and those cases where there is no penalty. Using the experimental data and WebPT learning, we test how good the WebPT is in minimizing these penalties. Received June 22, 1999 / Accepted December 24, 1999
10.
Semantic query optimization, or knowledge-based query optimization, has received increasing interest in recent years. The authors provide an effective and systematic approach to optimizing queries by appropriately choosing semantically equivalent transformations. Basically, there are two different types of transformations: transformations by eliminating unnecessary joins, and transformations by adding/eliminating redundant beneficial/nonbeneficial selection operations (restrictions). A necessary and sufficient condition to eliminate a single unnecessary join is provided. We prove that it is NP-complete to eliminate as many unnecessary joins as possible for various types of acyclic queries with the exception of the closure chain queries whose query graphs are chains and all equi-join attributes are distinct. An algorithm is provided to minimize the number of joins in tree queries. This algorithm has an important property that, when applied to a closure chain query, it will yield an optimal solution with the time complexity O(n*m), where n is the number of relations referenced in the chain query, and m is the time complexity of a restriction closure computation.
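As a rough illustration of what eliminating an unnecessary join means, the sketch below checks one familiar sufficient condition. It is not the necessary and sufficient condition proved in the paper. The condition used here is that a relation can be dropped when it is reached only through a non-null foreign key into its unique key and none of its columns are otherwise referenced. The classes `ForeignKey` and `Query` and the function `removable_joins` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ForeignKey:
    # Assumption: child_col is NOT NULL and parent_col is a unique key.
    child: str
    child_col: str
    parent: str
    parent_col: str

@dataclass
class Query:
    relations: set     # relation names in the FROM list
    joins: set         # frozensets of two (relation, column) pairs (equi-joins)
    used_columns: set  # (relation, column) pairs in SELECT or restrictions

def removable_joins(query, foreign_keys):
    """Return parent relations whose join adds no rows and filters none."""
    removable = []
    for fk in foreign_keys:
        if fk.child not in query.relations or fk.parent not in query.relations:
            continue
        fk_join = frozenset({(fk.child, fk.child_col), (fk.parent, fk.parent_col)})
        # Every join predicate that touches the parent relation.
        parent_joins = [j for j in query.joins
                        if any(rel == fk.parent for rel, _ in j)]
        only_fk_join = len(parent_joins) == 1 and parent_joins[0] == fk_join
        parent_used = any(rel == fk.parent for rel, _ in query.used_columns)
        if only_fk_join and not parent_used:
            removable.append(fk.parent)
    return removable
```

For a query that joins `emp` to `dept` on `emp.dept_id = dept.id` but selects only `emp.name`, the function reports `dept` as removable, because every employee row matches exactly one department row under the stated constraint.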
11.
12.
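To reduce the impact of data migration on the performance of association queries across multiple data sources, a multi-source association query optimization model (MAQM) is proposed. Wrappers encapsulate the storage systems to be queried, giving users a unified interface for multi-source association queries. A region partitioning strategy is proposed that uses the relational tables of the storage systems as the partitioning granularity, ...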
13.
The problem of finding an optimal semijoin sequence that fully reduces a given tree query is discussed. A method is presented that intelligently navigates the space of all semijoin sequences and returns an optimal solution. Experiments are reported that show that this method performs very efficiently: on average, less than 5% of the search space is searched before an optimal solution is found. Other advantages of the method are ease of implementation, generality of the cost model considered, and ability to handle tree queries with arbitrary target lists.
14.
Tommaso Urli 《Constraints》2015,20(4):473-473
15.
Shekar S. Hamidzadeh B. Kohli A. Coyle M. 《Knowledge and Data Engineering, IEEE Transactions on》1993,5(6):950-964
An approach to learning query-transformation rules based on analyzing the existing data in the database is proposed. A framework and a closure algorithm for learning rules from a given data distribution are described. The correctness, completeness, and complexity of the proposed algorithm are characterized and a detailed example is provided to illustrate the framework.
16.
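Traditional query optimization methods based on query trees and on multi-weighted trees are relatively mature. Semantic query optimization transforms a query into one or more semantically equivalent queries. Agent-based query optimization for parallel databases uses multi-agent technology to automatically find the integrity constraints relevant to a given query, which greatly improves the efficiency of join operations among multiple relations. Three important directions in query optimization for parallel databases are: query optimization for parallel databases on cluster systems, the introduction of multi-agent system (MAS) technology and expert systems into the field, and the introduction of simulated annealing and neural network algorithms into the field.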
17.
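To improve database access efficiency through intelligent methods, a distributed database access platform was built on multi-agent technology. Key problems were studied and solved, including the platform architecture, the design of the various agents, the cooperation mechanism among agents, and the method for wrapping database systems. On the optimization side, semantic caching in a distributed environment was studied and an intelligent prefetching algorithm for the agent platform was proposed, addressing the lack of intelligence and proactivity and the difficulty of reuse in traditional database optimization techniques. Tests on a large database system show that the approach noticeably improves efficiency for large-scale database operations.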
18.
19.
Chihping Wang Ming-Syan Chen 《Knowledge and Data Engineering, IEEE Transactions on》1996,8(4):650-662
While a significant amount of research effort has been reported on developing algorithms, based on joins and semijoins, to tackle distributed query processing, there has been relatively little progress toward exploring the complexity of the problems studied. As a result, proving NP-hardness of or devising polynomial-time algorithms for certain distributed query optimization problems has been elaborated upon by many researchers. However, due to its inherent difficulty, the complexity of the majority of problems on distributed query optimization remains unknown. In this paper we generally characterize the distributed query optimization problems and provide a framework to explore their complexity. As will be shown, most distributed query optimization problems can be transformed into an optimization problem comprising a set of binary decisions, termed the Sum Product Optimization (SPO) problem. We first prove SPO is NP-hard in light of the NP-completeness of a well-known problem, Knapsack (KNAP). Then, using this result as a basis, we prove that five classes of distributed query optimization problems, which cover the majority of distributed query optimization problems previously studied in the literature, are NP-hard by polynomially reducing SPO to each of them. The detail for each problem transformation is derived. We not only prove the conjecture that many prior studies relied upon, but also provide a framework for future related studies.
20.
Chen Rongxin Wang Zhijin Su Hang Xie Shutong Wang Zongyue 《The Journal of supercomputing》2022,78(4):5420-5449
The Journal of Supercomputing - The performance of XPath query is the key factor to the capacity of XML processing. It is an important way to improve the performance of XPath by making full use of...