1.

Consistent selectivity estimation via maximum entropy

V. Markl P. J. Haas M. Kutsch N. Megiddo U. Srivastava T. M. Tran 《The VLDB Journal The International Journal on Very Large Data Bases》2007,16(1):55-76

Cost-based query optimizers need to estimate the selectivity of conjunctive predicates when comparing alternative query execution plans. To this end, advanced optimizers use multivariate statistics to improve information about the joint distribution of attribute values in a table. The joint distribution for all columns is almost always too large to store completely, and the resulting use of partial distribution information raises the possibility that multiple, non-equivalent selectivity estimates may be available for a given predicate. Current optimizers use cumbersome ad hoc methods to ensure that selectivities are estimated in a consistent manner. These methods ignore valuable information and tend to bias the optimizer toward query plans for which the least information is available, often yielding poor results. In this paper we present a novel method for consistent selectivity estimation based on the principle of maximum entropy (ME). Our method exploits all available information and avoids the bias problem. In the absence of detailed knowledge, the ME approach reduces to standard uniformity and independence assumptions. Experiments with our prototype implementation in DB2 UDB show that use of the ME approach can improve the optimizer’s cardinality estimates by orders of magnitude, resulting in better plan quality and significantly reduced query execution times. For almost all queries, these improvements are obtained while adding only tens of milliseconds to the overall time required for query optimization.
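
The claim that ME reduces to the independence assumption when no joint statistics are known can be checked on a toy case. The sketch below is an illustration only, not the paper's implementation: it assumes iterative proportional fitting over the 2^n "atoms" (truth assignments to n predicates) as the solver, and the names max_entropy_atoms and selectivity are hypothetical. Known selectivities must lie strictly between 0 and 1.

```python
from itertools import product


def max_entropy_atoms(n, known, iters=200):
    """Approximate the maximum-entropy distribution over 2**n atoms.

    known maps a frozenset of predicate indices to the known selectivity
    of their conjunction (each value strictly between 0 and 1).
    """
    # Start from the uniform distribution, the ME solution with no knowledge.
    atoms = {bits: 1.0 / 2 ** n for bits in product((0, 1), repeat=n)}
    for _ in range(iters):
        for preds, target in known.items():
            inside = {b for b in atoms if all(b[i] for i in preds)}
            mass = sum(atoms[b] for b in inside)
            for b in atoms:
                # Rescale so the atoms satisfying the conjunction sum to target.
                if b in inside:
                    atoms[b] *= target / mass
                else:
                    atoms[b] *= (1.0 - target) / (1.0 - mass)
    return atoms


def selectivity(atoms, preds):
    """Selectivity of the conjunction of the given predicates."""
    return sum(p for b, p in atoms.items() if all(b[i] for i in preds))


# Only single-predicate selectivities are known.
single = max_entropy_atoms(2, {frozenset({0}): 0.1, frozenset({1}): 0.2})

# A joint statistic on predicates 0 and 1 is also known; predicate 2 is not
# covered by any joint statistic.
mixed = max_entropy_atoms(3, {
    frozenset({0}): 0.1,
    frozenset({1}): 0.2,
    frozenset({0, 1}): 0.05,
    frozenset({2}): 0.5,
})
```

With only the two single-predicate selectivities, the conjunction comes out as their product (0.1 × 0.2). Once the joint statistic 0.05 is supplied, it is honoured, and the uncovered third predicate is still combined as if independent.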

2.

A new query optimization method for relational databases Total citations: 1, self-citations: 0, citations by others: 1

伍军云 徐少平 林振荣 王炜立 《计算机与现代化》2006,(7):33-35

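Modern relational database query optimizers usually evaluate the execution efficiency of different query plans on the basis of query cost, and mispredicting the intermediate result sets produced in a query plan is the main cause of poor optimizer performance. To address this problem, this paper introduces a new query optimization method, SPS (Statistics Predict Set), which can effectively solve problems of this kind.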

3.

Adaptive cardinality estimation for query templates based on incremental locally weighted learning

冯杰明 李战怀 《计算机学报》2022,45(1):17-34

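Cardinality estimation is a key step in cost-based query optimization and has been studied for nearly 40 years. Traditional methods, such as histogram-based methods, can basically meet accuracy requirements when certain assumptions hold, for example that attributes are mutually independent and that joined tables satisfy the containment principle. In real production environments, however, these assumptions often fail, which can lead to severe cardinality estimation errors and, in turn, query latency. In recent years, growing data volumes and new hardware have made it possible to use machine learning to improve the quality of cardinality estimation. Because cost-based query optimization chooses the best query execution plan mainly from the estimated costs of the sub-plans in a query, some recent work builds local learned models for a number of key sub-plan templates and has made good progress. However, these local models are mainly intended for settings in which the query distribution (over the query space) and the data distribution (of the database data) do not change, whereas in real environments both change continually, which limits the effectiveness of these estimation techniques. In this paper, we propose a method for adaptive cardinality estimation of sub-plan templates that uses incremental locally weighted learning in environments where the query and data distributions keep changing. Specifically, we first extract semantic and statistical features of a sub-plan so that they represent the characteristics of the current queries and data, and then use an incremental locally weighted learning model that learns adaptively as the query and data distributions change in order to estimate cardinality. Finally, comparative experiments verify the effectiveness of the proposed method.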
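
To make the failure of the attribute independence assumption concrete, here is a minimal sketch on made-up data. The table contents and the helper names selectivity, independent_estimate, and true_cardinality are hypothetical and are not taken from the paper. It compares the textbook estimate (row count times the product of per-predicate selectivities) with the true count on two correlated columns.

```python
# Hypothetical table of (make, model) rows; the two columns are correlated
# because every Camry is a Toyota.
rows = (
    [("Toyota", "Camry")] * 40
    + [("Toyota", "Corolla")] * 10
    + [("Honda", "Civic")] * 50
)


def selectivity(rows, pred):
    """Fraction of rows that satisfy a single predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)


def independent_estimate(rows, preds):
    """Cardinality estimate that multiplies per-predicate selectivities."""
    est = float(len(rows))
    for p in preds:
        est *= selectivity(rows, p)
    return est


def true_cardinality(rows, preds):
    """Exact number of rows that satisfy all predicates."""
    return sum(1 for r in rows if all(p(r) for p in preds))


def is_toyota(r):
    return r[0] == "Toyota"


def is_camry(r):
    return r[1] == "Camry"


estimate = independent_estimate(rows, [is_toyota, is_camry])
actual = true_cardinality(rows, [is_toyota, is_camry])
```

Here the independence estimate is 100 × 0.5 × 0.4 = 20 rows while the true count is 40, an underestimate by a factor of two on a table with only two columns. Errors of this kind compound across joins.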

4.

Selecting query execution plans under concurrent queries

裴泽锋 牛保宁 张锦文 Amjad Muhammad 《计算机应用》2020,40(2):420-425

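Queries are the main workload of a database system, and their efficiency determines database performance. A query has multiple possible execution plans, yet current query optimizers can only choose a good plan statically, according to the configuration parameters of the database system. Concurrent queries contend for resources in complex and changing ways that configuration parameters can hardly reflect accurately, and the same execution plan does not perform equally well in different situations. Choosing an execution plan under concurrent queries therefore has to take into account the mutual influence among queries, that is, query interaction. On this basis, the paper proposes QIs, a metric for how strongly a query is affected by query interaction under concurrent queries. For plan selection under concurrent queries, it also proposes TRating, a method that chooses execution plans dynamically: it compares how strongly the queries in a query mix are affected by query interaction when they run under different execution plans, and selects the plan that is less affected as the better plan for the query. Experimental results show that TRating selects the better execution plan with an accuracy of 61%, 25% higher than the query optimizer; when selecting the second-best execution plan, its accuracy reaches 69%.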

5.

Which sort orders are interesting?

Ravindra Guravannavar S. Sudarshan Ajit A. Diwan Ch. Sobhan Babu 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(1):145-165

Sort orders play an important role in query evaluation. Algorithms that rely on sorting are widely used to implement joins, grouping, duplicate elimination and other set operations. The notion of interesting orders has allowed query optimizers to consider plans that could be locally sub-optimal, but produce ordered output beneficial for other operators, and thus be part of a globally optimal plan. However, the number of interesting orders for most operators is factorial in the number of attributes involved. Optimizer implementations use heuristics to prune the number of interesting orders, but the quality of the heuristics is unclear. Increasingly complex decision support queries and increasing use of query-covering indices, which provide multiple alternative sort orders for relations, motivate us to better address the problem of choosing interesting orders. We show that even a simplified version of the problem is NP-hard and provide a 1/2-benefit approximation algorithm for a special case of the problem. We then present principled heuristics for the general case of choosing interesting orders. We have implemented the proposed techniques in a Volcano-style cost-based optimizer, and our performance study shows significant improvements in estimated cost. We also executed our plans on a widely used commercial database system, and on PostgreSQL, and found that actual execution times for our plans were significantly better than for plans generated by those systems in several cases.

6.

A cardinality and cost estimator based on tree-structured gated recurrent units

乔少杰 杨国平 韩楠 屈露露 陈浩 毛睿 元昌安 Louis Alberto GUTIERREZ 《软件学报》2022,33(3):797-813

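Cardinality estimation and cost estimation guide the choice of execution plan, so estimation accuracy is critical to a query optimizer. However, the cost and cardinality estimation techniques of traditional databases cannot provide accurate estimates because they do not take correlations among multiple tables into account. Applying artificial intelligence to databases (artificial intelligence for databases, AI4DB) has recently attracted wide attention, and research results show that learning-based estimation methods outperform traditional ones. Existing learning-based methods still have shortcomings, however. First, most of them estimate only cardinality and ignore cost. Second, they can handle only simple query statements and cannot cope with complex queries such as multi-table and nested queries. They also have difficulty handling string-typed values. To address these problems, the paper proposes a cardinality and cost estimation method based on the tree-gated recurrent unit (Tree-GRU), which estimates cardinality and cost at the same time. In addition, it adopts effective feature extraction and encoding techniques that take both the query and the execution plan into account during feature extraction, and embeds the features into the Tree-GRU. For string-typed values, a neural network automatically extracts the relationship between substrings and whole strings and performs string embedding, so that sparse strings become easy for the estimator to handle. On data sets such as JOB and Synthetic, ...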

7.

Optimizing large join queries using a graph-based approach Total citations: 4, self-citations: 0, citations by others: 4

Chiang Lee Chi-Sheng Shih Yaw-Huei Chen 《Knowledge and Data Engineering, IEEE Transactions on》2001,13(2):298-315

Although many query tree optimization strategies have been proposed in the literature, there is still no formal and complete representation of all possible permutations of query operations (i.e., execution plans) in a uniform manner. A graph-theoretic approach presented in the paper provides a sound mathematical basis for representing a query and searching for an execution plan. In this graph model, a node represents an operation and a directed edge between two nodes indicates the order of executing these two operations in an execution plan. Each node is associated with a weight and so is each edge. The weight is an expression containing the parameters required for optimization, such as relation size, tuple size, and join selectivity factors. All possible execution plans are representable in this graph and each spanning tree of the graph becomes an execution plan. It is a general model which can be used in the optimizer of a DBMS for internal query representation. On the basis of this model, we devise an algorithm that finds a near-optimal execution plan using only polynomial time. The algorithm is compared with a few other popular optimization methods. Experiments show that the proposed algorithm is superior to the others under most circumstances.

8.

XHQE: A hybrid system for scalable selectivity estimation of XML queries

E.-S. M. El-Alfy S. Mohammed A. F. Barradah 《Information Systems Frontiers》2016,18(6):1233-1249

With the increasing popularity of XML applications in enterprise and big data systems, the use of efficient query optimizers is becoming essential. The performance of an XML query optimizer depends heavily on the query selectivity estimators it uses to find the best possible query execution plan. In this work, we propose a novel selectivity estimator, called XHQE, which is a hybrid of structural synopsis and statistics. The structural synopsis enhances the accuracy of estimation and the structural statistics make it scalable to the allocated memory space. The structural synopsis is generated by labeling the nodes of the source XML dataset using a fingerprint function and merging subtrees with similar fingerprints (i.e., having similar structures). The generated structural synopsis and structural statistics are then used to estimate the selectivity of given queries. We studied the performance of the proposed approach using different types of queries and four benchmark datasets with different structural characteristics. We compared XHQE with existing algorithms such as Sampling, TreeSketch and one histogram-based algorithm. The experimental results showed that XHQE is significantly better than the other algorithms in terms of estimation accuracy and scalability for semi-uniform datasets. For non-uniform datasets, the proposed algorithm has estimation accuracy comparable to TreeSketch when the allocated memory size is highly reduced, yet the estimation data generation time of the proposed approach is much lower (e.g., TreeSketch took more than 50 times longer than the proposed approach on the XMark dataset). Compared to the histogram-based algorithm, our approach supports regular twig queries in addition to having higher accuracy when both run under similar memory constraints.

9.

A sampling approach for skyline query cardinality estimation

Cheng Luo Zhewei Jiang Wen-Chi Hou Shan He Qiang Zhu 《Knowledge and Information Systems》2012,32(2):281-301

A skyline query returns a set of candidate records that satisfy several preferences. It is an operation commonly performed to aid decision making. Since executing a skyline query is expensive and a query plan may combine skyline queries with other data operations such as join, it is important that the query optimizer can quickly yield an accurate cardinality estimate for a skyline query. Log Sampling (LS) and Kernel-Based (KB) skyline cardinality estimation are the two state-of-the-art skyline cardinality estimation methods. LS is based on a hypothetical model A(log(n))^B. Since this model is originally derived under strong assumptions like data independence between dimensions, it does not apply well to an arbitrary data set. Consequently, LS can yield large estimation errors. KB relies on the integration of the estimated probability density function (PDF) to derive the scale factor ??_ds. As the estimation of PDF and the ensuing integration both involve complex mathematical calculations, KB is time consuming. In view of these problems, we propose an innovative purely sampling-based (PS) method for skyline cardinality estimation. PS is non-parametric. It does not assume any particular data distribution and is, thus, more robust than LS. PS does not require complex mathematical calculations. Therefore, it is much simpler to implement and much faster to yield the estimates than KB. Extensive empirical studies show that for a variety of real and synthetic data sets, PS outperforms LS in terms of estimation speed, estimation accuracy, and estimation variability under the same space budget. PS outperforms KB in terms of estimation speed and estimation variability under the same performance mark.
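
The abstract gives only the form of the LS model, A(log(n))^B, and does not say how its parameters are obtained. The sketch below shows one plausible way to fit such a model, assuming ordinary least squares in log space on (sample size, observed skyline size) pairs. The function names fit_log_model and predict are hypothetical, and this is not presented as the LS authors' procedure.

```python
import math


def fit_log_model(points):
    """Fit size = A * (log n) ** B from (n, skyline_size) pairs.

    Taking logs gives log(size) = log(A) + B * log(log(n)), a straight line,
    so A and B follow from simple linear regression. Requires n > 1 and
    size > 0 for every pair, and at least two distinct values of n.
    """
    xs = [math.log(math.log(n)) for n, _ in points]
    ys = [math.log(size) for _, size in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict(a, b, n):
    """Extrapolate the fitted model to a data set of n records."""
    return a * math.log(n) ** b


# Synthetic observations generated from A = 2, B = 3, used only to check
# that the fit recovers the parameters.
observations = [(n, 2.0 * math.log(n) ** 3) for n in (100, 1000, 10000)]
a_hat, b_hat = fit_log_model(observations)
```

Because the model is parametric, any extrapolation is only as good as the assumed form, which is the weakness the abstract attributes to LS on data sets that violate the independence assumption.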

10.

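A query optimization method for parallel databases based on multi-weighted trees Total citations: 1, self-citations: 0, citations by others: 1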

李建中 《计算机学报》1998,21(5):401-412

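This paper proposes a query optimization method based on multi-weighted trees, comprising a multi-weighted-tree model of parallel query plans, a complexity model for parallel query plans, and a query optimization algorithm.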

11.

A SPARQL query optimization method based on attribute correlation

吕彬 杜小勇 王琰 《计算机研究与发展》2009,46(Z2)

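Today's mainstream RDF storage systems are all built on relational databases: their query engines translate SPARQL into SQL, which the database's query engine then executes. However, current database query optimizers estimate the selectivity of join queries under the attribute independence assumption, which often produces wrong estimates and leads to inefficient execution plans. Attribute correlation information is therefore very important for whether a SPARQL query optimizer can find an efficient execution plan. To address the low query efficiency caused by unoptimized join operations after SPARQL is translated into SQL, the paper proposes a method that uses ontology information to compute attribute correlations automatically, thereby adjusting the selectivity estimates of join operations, adjusting the join order, and improving the efficiency of join queries over basic graph patterns in SPARQL queries.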

12.

Using views to generate efficient evaluation plans for queries

《Journal of Computer and System Sciences》2007,73(5):703-724

We study the problem of generating efficient, equivalent rewritings using views to compute the answer to a query. We take the closed-world assumption, in which views are materialized from base relations, rather than views describing sources in terms of abstract predicates, as is common when the open-world assumption is used. In the closed-world model, there can be an infinite number of different rewritings that compute the same answer, yet have quite different performance. Query optimizers take a logical plan (a rewriting of the query) as an input, and generate efficient physical plans to compute the answer. Thus our goal is to generate a small subset of the possible logical plans without missing an optimal physical plan. We first consider a cost model that counts the number of subgoals in a physical plan, and show a search space that is guaranteed to include an optimal rewriting, if the query has a rewriting in terms of the views. We also develop an efficient algorithm for finding rewritings with the minimum number of subgoals. We then consider a cost model that counts the sizes of intermediate relations of a physical plan, without dropping any attributes, and give a search space for finding optimal rewritings. Our final cost model allows attributes to be dropped in intermediate relations. We show that, by careful variable renaming, it is possible to do better than the standard “supplementary relation” approach, by dropping attributes that the latter approach would retain. Experiments show that our algorithm of generating optimal rewritings has good efficiency and scalability.

13.

Semantic query optimization for query plans of heterogeneous multidatabase systems

Chun-Nan Hsu C.A. Knoblock 《Knowledge and Data Engineering, IEEE Transactions on》2000,12(6):959-978

New applications of information systems need to integrate a large number of heterogeneous databases over computer networks. Answering a query in these applications usually involves selecting relevant information sources and generating a query plan to combine the data automatically. As significant progress has been made in source selection and plan generation, the critical issue has been shifting to query optimization. This paper presents a semantic query optimization (SQO) approach to optimizing query plans of heterogeneous multidatabase systems. This approach provides global optimization for query plans as well as local optimization for subqueries that retrieve data from individual database sources. An important feature of our local optimization algorithm is that we prove necessary and sufficient conditions to eliminate an unnecessary join in a conjunctive query of arbitrary join topology. This feature allows our optimizer to utilize more expressive relational rules to provide a wider range of possible optimizations than previous work in SQO. The local optimization algorithm also features a new data structure called AND-OR implication graphs to facilitate the search for optimal queries. These features allow the global optimization to effectively use semantic knowledge to reduce the data transmission cost. We have implemented this approach in the PESTO (Plan Enhancement by SemanTic Optimization) query plan optimizer as a part of the SIMS information mediator. Experimental results demonstrate that PESTO can provide significant savings in query execution cost over query plan execution without optimization.

14.

An index selection method without repeated optimizer estimations

Kuo-Fong Kao I-En Liao 《Information Sciences》2009,179(13):2263-2272

The index selection problem (ISP) concerns the selection of an appropriate index set to minimize the total cost for a given workload containing read and update queries. Since the ISP has been proven to be an NP-hard problem, most studies focus on heuristic algorithms to obtain approximate solutions. However, even approximate algorithms still consume a large amount of computing time and disk space because these systems must record all query statements and frequently request from the database optimizers the cost estimation of each query in each considered index. This study proposes a novel algorithm without repeated optimizer estimations. When a query is delivered to a database system, the optimizer evaluates the costs of various query plans and chooses an access path for the query. The information from the evaluation stage is aggregated and recorded with limited space. The proposed algorithm can recommend indexes according to the readily available information without querying the optimizer again. The proposed algorithm was tested in a PostgreSQL database system using TPC-H data. Experimental results show the effectiveness of the proposed approach.

15.

AlphaQO: Robust Learned Query Optimizer

Xiang Yu Chengliang Chai Xinning Zhang Nan Tang Ji Sun Guoliang Li 《International Journal of Software and Informatics》2022,12(1):7-29

Recently, learned query optimizers, typically driven by deep learning models, have attracted wide attention because they can offer similar or even better performance than state-of-the-art commercial optimizers. A successful learned optimizer often relies on a sufficient number of high-quality workload queries as training data, and poor-quality training queries cause learned query optimizers to fail on future queries. In this paper, we propose AlphaQO, a novel training framework for robust learned query optimizers based on Reinforcement Learning (RL), in which the robustness of the optimizers is improved by finding bad queries in advance. AlphaQO is a loop system consisting of two main components, namely the query generator and the learned optimizer. The query generator aims at generating "difficult" queries (i.e., queries for which the learned optimizer provides poor estimates). The learned optimizer is trained using these generated queries and also provides feedback (in terms of numerical rewards) to the query generator for updates. If the generated queries are good, the query generator gets a high reward; otherwise, it gets a low reward. This process is performed iteratively, with the main goal that, within a small budget, the learned optimizer can be trained and can generalize well to a wide range of unseen queries. Extensive experiments show that AlphaQO can generate a relatively small number of queries and train a learned optimizer to outperform commercial optimizers. Moreover, learned optimizers need far fewer queries from AlphaQO than randomly generated queries to be trained to good quality.

16.

Measuring query interaction among concurrent queries and selecting execution plans

柳浩楠 牛保宁 程永强 《计算机工程与应用》2022,58(17):72-80

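Queries are the main workload of a database system, and their execution efficiency directly affects system performance. At present, because query interaction (QI) is complex and changeable, the query optimizer cannot accurately assess the impact of a query entering the system, and it is hard to choose a good execution plan for concurrent queries. The paper uses the average response time, average execution time, average I/O time, and average buffer hit ratio of queries as the feature parameters that represent QI. It proposes a multi-dimensional measurement of query interaction (MMQI) model and an execution plan selection (EPS) model, which use deep neural networks to measure QI and then, treating QI as the main factor, choose a good execution plan for concurrent queries. Because a query execution plan consists of a sequence of relational operations and QI has temporal characteristics, MMQI measures QI with a bidirectional long short-term memory network (Bi-LSTM): features extracted from the query execution plan are the input, the change in the QI feature parameters is the output, and the model predicts how the QI feature parameters change after the query enters the system under different execution plans. EPS takes the predicted changes as the feature of query interaction (FQI), fuses it with the features of candidate plan (FCP), and feeds the result into another Bi-LSTM to choose a good execution plan for the query dynamically. Experiments on PostgreSQL show that MMQI-EPS improves the average accuracy of choosing a good execution plan by 38.6 percentage points over the query optimizer.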

17.

AlphaQO: a robust learned query optimizer

余翔 柴成亮 张辛宁 汤南 孙佶 李国良 《软件学报》2022,33(3):814-831

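Learned query optimizers driven by deep learning are attracting growing attention from researchers, and they can often match or even exceed the performance of traditional commercial optimizers. Unlike a traditional optimizer, a successful learned optimizer usually depends on a sufficient number of high-quality workload queries as training data. Low-quality training queries cause a learned optimizer to fail on future queries. The paper proposes a reinforcement-learning-based training framework for robust learned query optimizers, A...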

18.

Progressive Parametric Query Optimization

Pedro Bizarro Nicolas Bruno David J. DeWitt 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(4):582-594

Commercial applications usually rely on pre-compiled parameterized procedures to interact with a database. Unfortunately, executing a procedure with a set of parameters different from those used at compilation time may be arbitrarily sub-optimal. Parametric query optimization (PQO) attempts to solve this problem by exhaustively determining the optimal plans at each point of the parameter space at compile time. However, PQO is likely not cost-effective if the query is executed infrequently or if it is executed with values only within a subset of the parameter space. In this paper we propose instead to progressively explore the parameter space and build a parametric plan during several executions of the same query. We introduce algorithms that, as parametric plans are populated, are able to frequently bypass the optimizer but still execute optimal or near-optimal plans.

19.

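A parallel dataflow method for optimizing and processing parallel database queries Total citations: 1, self-citations: 0, citations by others: 1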

李建中 《软件学报》1998,9(3):174-180

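This paper presents a method that uses parallel dataflow techniques to optimize and process parallel database queries, proposes a complete set of related algorithms, and gives a full design of a parallel database query optimizer and processor based on the parallel dataflow method. These algorithms and the corresponding query optimizer and processor have been used in a parallel database management system prototype designed by the author. Practice shows that the parallel dataflow method not only enables a parallel database management system to be implemented quickly and effectively, but also optimizes and processes parallel database queries effectively.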

20.

Cardinality Estimator: Processing SQL with a Vertical Scanning Convolutional Neural Network

Shao-Jie Qiao Guo-Ping Yang Nan Han Hao Chen Fa-Liang Huang Kun Yue Yu-Gen Yi Chang-An Yuan 《计算机科学技术学报》2021,36(4):762-777

Although popular database systems perform well on query optimization, they still produce poor query execution plans when the join operations across multiple tables are complex. Bad execution plans usually result from bad cardinality estimates. The cardinality estimation models in traditional databases cannot provide high-quality estimates, because they are not capable of capturing the correlation between multiple tables in an effective fashion. Recently, state-of-the-art learning-based cardinality estimation has been reported to work better than the traditional empirical methods. Basically, these methods use deep neural networks to compute the relationships and correlations of tables. In this paper, we propose a vertical scanning convolutional neural network (abbreviated as VSCNN) to capture the relationships between words in the word vector in order to generate a feature map. The proposed learning-based cardinality estimator converts Structured Query Language (SQL) queries from a sentence to a word vector; we encode table names with one-hot encoding and the samples as bitmaps, separately, and then merge them to obtain enough semantic information from data samples. In particular, the feature map obtained by VSCNN contains semantic information about SQL queries, including tables, joins, and predicates. Importantly, in order to improve the accuracy of cardinality estimation, we propose the negative sampling method for training the word vector by gradient descent from the base table and compress it into a bitmap. Extensive experiments are conducted and the results show that the q-error of the proposed model based on the vertical scanning convolutional neural network is reduced by at least 14.6% when compared with the estimators in traditional databases.
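
The abstract reports its result as a q-error but does not define the metric. A commonly used definition is the larger of estimate/actual and actual/estimate, so a perfect estimate scores 1 and over- and underestimates by the same factor score the same. The sketch below follows that definition. The function name q_error and the choice to floor both values at 1 row (to avoid division by zero) are assumptions of this example, not details from the paper.

```python
def q_error(estimated, actual, floor=1.0):
    """Multiplicative error between an estimated and an actual cardinality.

    Both values are floored at `floor` rows (an assumption of this sketch)
    so that empty results do not cause a division by zero.
    """
    est = max(float(estimated), floor)
    act = max(float(actual), floor)
    return max(est / act, act / est)


# An overestimate and an underestimate by the same factor get the same score.
over = q_error(100, 10)
under = q_error(10, 100)
```

Because the metric is a ratio, it treats a 10-row result estimated as 100 rows the same as a 1,000,000-row result estimated as 10,000,000 rows, which is why it is often preferred over absolute error when comparing estimators across queries of very different sizes.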