首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 140 毫秒
1.
提出了一种基于查询树匹配的查询重用算法.首先,系统中原有查询树与新生成的查询树进行匹配并计算对新查询树的重用收益;然后根据重用收益来实现重叠的查询操作的重用.实验结果表明,该算法能够有效地减少连续查询的执行代价总量.  相似文献   

2.
列存储数据查询优化的重点是列的连接策略.现有的列存储系统通过存储的改变来简化列的连接,致使列的连接缺少查询优化处理,策略单一且无法满足复杂查询.在剖析现有连接选择策略的基础上,提出一种新的连接策略优化方法,即首先利用基于规则的优化方法为列存储数据查询制定优化规则,过滤不可能产生最优计划的候选计划;然后设计了基于代价的优化算法,根据动态Huffman树和左深连接树原理对查询执行顺序进行改进,进一步减少候选计划的规模;根据列存储数据的特点将候选计划中每个连接节点的执行策略归纳为串行连接和并行连接两类,并在此基础上提出代价估计模型,进而可针对这两种连接策略进行代价估计和策略选择.最后在SSB数据集上通过实验证明了方法在列存储数据查询中的有效性.  相似文献   

3.
列的连接策略优化是列存储数据查询中的重要问题。现有的列存储系统中,列的连接存在策略单一,缺少优化处理,无法满足复杂查询等缺陷。针对这些问题,提出一种连接策略选择方法。该方法首先定义简单规则过滤代价过大的查询计划,生成候选查询计划树。进而根据动态Huffman树原理提出动态优化树算法,对候选查询计划树中的查询执行顺序进行改进。根据列存储数据的特点,候选计划中每个连接节点的执行策略被归纳为两种:串行连接和并行连接。在此基础上构建代价估计模型,集中针对这两种连接策略进行代价估计和策略选择,从而以较小的时间复杂度获得优化的查询执行策略。  相似文献   

4.
软件重用技术是当前软件开发研究的重点,针对电子政务领域产品开发过程中的重用策略与方法,从系统分析、设计到编码,讨论了软件的领域重用与层次重用等方面的问题.实现了产品领域横向的重用和产品开发过程中的纵向层次体系结构的重用,从而提高了软件产品的可重用性和软件生产率,并为后继产品的开发提供了良好的可重用基础.  相似文献   

5.
该文以丛生树模型为基础,提出了一种片段式查询执行计划。该执行计划将查询树划分成多个按流水线方式执行的片段,各片段依次执行。该执行计划可以减少中间结果的I/0次数,更充分地利用内存资源。文中还举例说明了计划的执行过程。  相似文献   

6.
在重用检测类数据库软件时,所遇到的重要问题之一是DataWindow 的列标题修改问题。因为目前的应用中,不能实现由普通用户自行完成对它的修改,因此成了此类数据库软件重用过程中的主要障碍之一。本文首先构建了特殊的底层数据库,然后通过数据库的初始化、更新及DataWindow 的列标题更新,最终实现了DataWindow 的列标题修改。应用本技术开发检测类数据库软件,解决了不同检测对象下数据库软件的重用性问题,减小了软件的开发时间,方便简单,易于操作。  相似文献   

7.
物化是列存储数据仓库查询中必不可少的操作,物化策略和物化技术直接影响到查询执行的性能,因此设计一种适应于列存储系统的物化策略和相关技术尤为重要.针对延迟物化可能重复读取数据块的缺陷,提出了基于带值路径的物化技术,简称VPM.首先,定义了一个描述物理执行中间结果的结构——传递块,该结构将用于重构的位置信息与实际列值相分离.在此基础上,对于给定的物理查询树,根据其操作节点是否需要某一列的值进行路径标记,生成自扫描节点或抽值节点到最终需要这些节点的引用列的祖先节点之间的路径,即带值路径.将起始节点引用列的列值保存在传递块的列值区中,并在向查询树的上层操作节点传输过程中不断对其过滤.对带值路径中的其他列仅保存其位置信息.在查询执行时,除了路径起始节点要从磁盘读取数据外,其他节点直接从传递块中获得相应的列值,有效地减少了查询处理过程的I/O开销,提高了查询的执行性能.最后在DWMS上使用TPC-H中针对数据仓库的基准数据集SSBM进行实验,验证了基于带值路径物化技术的有效性.  相似文献   

8.
康炎丽  李丰  王蕾 《软件学报》2017,28(7):2126-2147
大数据蕴含着巨大的价值.分析类查询是获取数据价值的一种重要手段.为及时把握分析结果的变化,查询需要周期性地重复.为此,将不可避免地引入对旧数据的重复分析.目前,以重用历史数据的中间结果,优化冗余计算为核心思路的增量分析技术,存在用户透明性不佳、对历史结果存储位置的选择不够智能化等问题,对周期性增量查询的优化效果有限.本文从兼顾用户透明性和优化收益的角度出发,设计了一种以语义规则为指导的增量优化方法.该方法扩展了增量描述语法,以查询操作符的操作语义和输出语义指导对历史数据存储、合并位置的选择,再根据代价模型和物理查询任务的划分位置对选择结果进行调整,生成优化后可以在分布式计算框架(如:MapReduce)周期性调度执行的物理查询任务.本文以Apache Hive为基础实现了上述方法的原型HiveInc.实验表明,对于扩展了增量语法描述的TPC-H测试集,HiveInc相比优化前,可以获得平均2.93倍,最高5.78倍的加速;与经典优化技术IncMR,DryadInc相比,分别可以获得1.69和1.61倍的加速.  相似文献   

9.
查询优化研究是数据库设计与应用中重点考虑问题,在研究MySql数据库查询优化技术基础上,对配置参数中的数据缓存区和日志缓存区的参数进行优化设置,提高查询效率。针对MySql查询重用技术的特点及存在不规范化sql语言等重用性不高问题,通过设计相关模块改进查询重用功能;增加查询缓冲区的查询执行计划模块来解决查询缓存区大数据量结果集存储问题,并通过实验进行验证,从而改进查询优化效率。  相似文献   

10.
软件产品的应用日益广泛,软件研制过程中出现的问题也不断增多。随着业内的不断探索,逐渐发现软件重用(含过程重用和产品重用)是解决"软件危机"的有效手段之一。软件过程资产库将软件研制过程中积累的可重用过程和可重用产品在信息化平台中进行统一管理,便于项目层进行查询和使用,促进软件研制效率和软件产品质量的提高;从组织层来看,通过不断优化可重用过程并积累可重用产品,可以实施持续的软件过程改进,逐步提高软件研制能力成熟度,不断增强软件组织的竞争力。为了提升软件研制能力,研究并建立了软件过程资产库,结果表明软件过程资产库在提高软件研制能力方面具有显著作用。  相似文献   

11.
Cloud computing enables a conventional relational database system's hardware to be adjusted dynamically according to query workload, performance and deadline constraints. One can rent a large amount of resources for a short duration in order to run complex queries efficiently on large-scale data with virtual machine clusters. Complex queries usually contain common subexpressions, either in a single query or among multiple queries that are submitted as a batch. The common subexpressions scan the same relations, compute the same tasks (join, sort, etc.), and/or ship the same data among virtual computers. The total time spent for the queries can be reduced by executing these common tasks only once. In this study, we build and use efficient sets of query execution plans to reduce the total execution time. This is an NP-Hard problem therefore, a set of robust heuristic algorithms, Branch-and-Bound, Genetic, Hill Climbing, and Hybrid Genetic-Hill Climbing, are proposed to find (near-) optimal query execution plans and maximize the benefits. The optimization time of each algorithm for identifying the query execution plans and the quality of these plans are analyzed by extensive experiments.  相似文献   

12.
查询是数据库系统的主要负载,其效率决定了数据库性能的好坏。一个查询存在多种执行计划,当前,查询优化器只能按照数据库系统的配置参数,静态地为查询选择一个较优的执行计划。并行查询间存在复杂多变的资源争用,很难通过配置参数准确反映,而且同一执行计划在不同情景下的效率并不一致。并行查询下执行计划的选择需考虑查询间的相互影响——查询交互。基于此,提出了一种在并行查询下度量查询受查询交互影响大小的标准QIs。针对并行查询下查询执行计划的选择,还提出了一种动态地为查询选择执行计划的方法TRating,该方法通过比较查询组合中按不同执行计划执行的查询受查询交互影响的大小,选择受查询交互影响较小的执行计划作为该查询的较优执行计划。实验结果表明,TRating方法为查询选择较优执行计划的准确率达61%,相比查询优化器提高了25%;而且在为查询选择次优执行计划时,其准确率也高达69%。  相似文献   

13.
The pipelined execution of multijoin queries in a multiprocessor-based database system is explored in this paper. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join for processing. The execution of a query is usually denoted by a query execution tree. To improve the execution of pipelined hash joins, an innovative approach to query execution tree selection is proposed to exploit segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment, and then, in the light of the model, we develop heuristic schemes to determine the query execution plan based on a segmented right-deep tree so that the query can be efficiently executed. As shown by our simulation, the proposed approach, without incurring additional overhead on plan execution, possesses more flexibility in query plan generation, and can lead to query plans of better performance than those achievable by the previous schemes using right-deep trees  相似文献   

14.
The execution of a query in a parallel database machine can be controlled in either a control flow way, or in a data flow way. In the former case a single system node controls the entire query execution. In the latter case the processes that execute the query, although possibly running on different nodes of the system, trigger each other. Lately, many database research projects focus on data flow control since it should enhance response times and throughput. The authors study control versus data flow with regard to controlling the execution of database queries. An analytical model is used to compare control and data flow in order to gain insights into the question which mechanism is better under which circumstances. Also, some systems using data flow techniques are described, and the authors investigate to which degree they are really data flow. The results show that for particular types of queries data flow is very attractive, since it reduces the number of control messages and balances these messages over the nodes  相似文献   

15.
A multidatabase system (MDBS) integrates information from multiple autonomous local databases. Performing global query optimization to achieve efficient query processing in such a system is challenging due to local autonomy of the data sources. Dynamic factors in the environment make the problem even more difficult. In this paper, we present two techniques, i.e., contention space partitioning and cost error controlling, to perform global query optimization in a dynamic MDBS. Both techniques generate an execution plan with multiple versions for a query in a dynamic MDBS, utilizing the multistate cost models built for the dynamic environment via our previous multistate query sampling method. The first technique partitions the contention space of a dynamic multidatabase environment into a given number of subspaces and chooses a good query execution plan version for each subspace, while the second technique selects a set of execution plan versions by using a given error tolerance to control query execution costs. Experiments demonstrate that the proposed techniques are quite promising for performing global query optimization in a dynamic MDBS. Compared with related work on dynamic query optimization, our approach has an advantage of avoiding the high overhead for modifying or re-generating an execution plan for a query based on dynamic runtime information. Research was supported by the US National Science Foundation under Grant # IIS-9811980 and The University of Michigan.  相似文献   

16.
The traditional approach to evaluate query execution strategies using approximate cost models may be inadequate for particular environments. For instance, if the environment does not satisfy the assumptions made by the cost model, the cost estimates can be so distorted that expensive strategies will be chosen. We propose a new approach for choosing execution strategies based on the actual cost history of query execution under various strategies, rather than on assumption-loaded estimates of these costs. Adaptive selection automatically changes the strategies selected, tracking cost variations caused by changes in the database state and query load. Furthermore, it does not require any assumptions about internal database structures, data characteristics, or distribution of queries. Queries are divided into query classes, where all queries in a class share the same execution strategies. A learning automaton is then used for each class to infer over time which are the current best strategies, based on actual query execution costs. We show the results of running the adaptive selector using real query loads for an existing database.  相似文献   

17.
Optimization of parallel execution for multi-join queries   总被引:5,自引:0,他引:5  
We study the subject of exploiting interoperator parallelism to optimize the execution of multi-join queries. Specifically, we focus on two major issues: (1) scheduling the execution sequence of multiple joins within a query, and (2) determining the number of processors to be allocated for the execution of each join operation obtained in (1). For the first issue, we propose and evaluate by simulation several methods to determine the general join sequences, or bushy trees. Despite their simplicity, the heuristics proposed can lead to the general join sequences that significantly outperform the optimal sequential join sequence. The quality of the join sequences obtained by the proposed heuristics is shown to be fairly close to that of the optimal one. For the second issue, it is shown that the processor allocation for exploiting interoperator parallelism is subject to more constraints-such as execution dependency and system fragmentation-than those in the study of intraoperator parallelism for a single join. The concept of synchronous execution time is proposed to alleviate these constraints. Several heuristics to deal with the processor allocation, categorized by bottom-up and top-down approaches, are derived and are evaluated by simulation. The relationship between issues (1) and (2) is explored. Among all the schemes evaluated, the two-step approach proposed, which first applies the join sequence heuristic to build a bushy tree as if under a single processor system, and then, in light of the concept of synchronous execution time, allocates processors to execute each join in the bushy tree in a top-down manner, emerges as the best solution to minimize the query execution time  相似文献   

18.
针对当前微机图象处理领域中存在的一些问题,本文提出了一种面向对象的图象处理方法,这对于解决微机图象处理中存在的问题具有一定的参考价值。  相似文献   

19.
Global query execution in a multidatabase system can be done parallelly, as all the local databases are independent. In this paper, a cost model that considers parallel execution of subqueries for a global query is developed. In order to obtain maximum parallelism in query execution, it is required to find a query execution plan that is represented in the form of a bushy tree and this query tree should be balanced to the maximal possible extent with respect to execution time. A new bottom up approach called Agglomerative Approach (AA) is proposed to construct balanced bushy trees with respect to execution time. By the deterministic nature of this approach, it generates local optimal solutions. This local minima problem will be severe in the case of graph queries, i.e., queries that are represented with a graph structure. A Simulated annealing Approach (SA) is employed to obtain a (near) optimal solution. These approaches (AA and SA) are suitable for handling on-line and off-line queries respectively. A Hybrid Approach (HA), that is an integration of AA and SA, is proposed to optimize queries for which the estimated time to be spent on optimization is known a priori. Results obtained with AA and SA on both tree and graph structured queries are presented.  相似文献   

20.
目前主流的RDF存储系统都是基于关系数据库的,其查询引擎都是将SPARQL转换为SQL,然后由数据库的查询引擎来执行查询.但是,目前的数据库查询优化器对于连接查询的选择度估计都是基于属性独立假设的,这往往导致估计错误而选择了效率低的执行计划,所以属性相关性信息对于SPARQL查询优化器能否找到效率高的执行计划是非常重要的.针对SPARQL转换为SQL后,因连接操作没有优化导致查询效率不高的问题,提出了利用本体信息自动计算属性相关性的方法,从而调整连接操作的选择度估计值,调整连接顺序,提高SPARQL查询中基本图模式的连接查询效率.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号