Similar Literature
 Found 20 similar documents (search time: 78 ms)
1.
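Query Optimization Techniques for Shared-Nothing Parallel Database Systems   Total citations: 15, self-citations: 0, citations by others: 15
Query optimization is the core technology of parallel database systems. This paper describes the distinctive two-phase optimization strategy of PBASE/2, a shared-nothing parallel database system developed by the authors. To shrink the huge search space of parallel query optimization, PBASE/2 divides parallel query optimization into two phases: sequential optimization and parallelization. In the sequential optimization phase, the communication cost incurred after parallelization is estimated in advance and added to the cost model of sequential optimization. The dynamic-programming search algorithm is also modified and extended, which ensures that the minimum-cost plan obtained in the sequential optimization phase ...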
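The idea of folding an estimated communication cost into a sequential, dynamic-programming join-order search can be sketched as follows. This is a minimal illustration and not the PBASE/2 algorithm: the function name `best_left_deep_plan`, the cost formula (output rows plus a weighted count of input rows that would be shipped), and the sample cardinalities and selectivities are all assumptions made for the example.

```python
from itertools import combinations


def best_left_deep_plan(cards, sel, comm_weight=0.0):
    """Dynamic-programming search over left-deep join orders.

    cards: dict mapping relation name -> row count.
    sel: dict mapping frozenset({a, b}) -> join selectivity (1.0 if absent).
    comm_weight: hypothetical weight for the rows assumed to be
        repartitioned across nodes at each join (an assumption of
        this sketch, not a published cost model).
    Returns (cost, estimated_rows, join_order).
    """
    names = sorted(cards)
    # best[subset] = cheapest (cost, rows, order) found for that subset
    best = {frozenset([n]): (0.0, float(cards[n]), (n,)) for n in names}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            subset = frozenset(combo)
            for last in combo:
                cost, rows, order = best[subset - {last}]
                s = 1.0
                for other in order:
                    s *= sel.get(frozenset((other, last)), 1.0)
                out_rows = rows * cards[last] * s
                # local work ~ output size; communication ~ both inputs
                total = cost + out_rows + comm_weight * (rows + cards[last])
                if subset not in best or total < best[subset][0]:
                    best[subset] = (total, out_rows, order + (last,))
    return best[frozenset(names)]


# Hypothetical statistics for three relations.
cards = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}

cost_seq, rows_seq, order_seq = best_left_deep_plan(cards, sel)
cost_comm, rows_comm, order_comm = best_left_deep_plan(cards, sel, comm_weight=1.0)
```

Setting `comm_weight` to zero gives a purely sequential cost model; a positive value makes plans that move many rows between nodes look more expensive during the same search.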

2.
A consensus on parallel architecture for very large database management has emerged. This architecture is based on a shared-nothing hardware organization. The computation model is very sensitive to skew in tuple distribution, however. Recently, several parallel join algorithms with dynamic load balancing capabilities have been proposed to address this issue, but none of them consider multi-way join problems. In this article we propose a dynamic load balancing technique for multi-way joins, and investigate the effect of load balancing on query optimization. In particular, we present a join-ordering strategy that takes load-balancing issues into consideration. Our performance study indicates that the proposed query optimization technique can provide very impressive performance improvement over conventional approaches. An earlier version of this article was presented at the 1993 International Conference on Parallel and Distributed Information Systems in San Diego, California, U.S.A.

3.
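Spatial join is the most time-consuming and most important spatial query, and a multi-way spatial join is a join query involving multiple spatial relations. The efficiency of sequential spatial join processing remains unsatisfactory, so using parallelism to speed up spatial joins has become an attractive research direction. Parallel spatial join processing consists of three phases: task creation, task assignment, and parallel task execution. This paper proposes a new plane-sweep method for the task creation phase of multi-way parallel processing, then proposes a dynamic task assignment strategy based on cost estimation, gives the cost model, and extends it to multi-way parallel join query processing to achieve load balance.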

4.
New applications of information systems need to integrate a large number of heterogeneous databases over computer networks. Answering a query in these applications usually involves selecting relevant information sources and generating a query plan to combine the data automatically. As significant progress has been made in source selection and plan generation, the critical issue has been shifting to query optimization. This paper presents a semantic query optimization (SQO) approach to optimizing query plans of heterogeneous multidatabase systems. This approach provides global optimization for query plans as well as local optimization for subqueries that retrieve data from individual database sources. An important feature of our local optimization algorithm is that we prove necessary and sufficient conditions to eliminate an unnecessary join in a conjunctive query of arbitrary join topology. This feature allows our optimizer to utilize more expressive relational rules to provide a wider range of possible optimizations than previous work in SQO. The local optimization algorithm also features a new data structure called AND-OR implication graphs to facilitate the search for optimal queries. These features allow the global optimization to effectively use semantic knowledge to reduce the data transmission cost. We have implemented this approach in the PESTO (Plan Enhancement by SemanTic Optimization) query plan optimizer as a part of the SIMS information mediator. Experimental results demonstrate that PESTO can provide significant savings in query execution cost over query plan execution without optimization.

5.
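Queries are the main workload of a database system, and choosing a suitable execution plan for each query is key to improving database performance and, ultimately, application performance. To address the low accuracy and insufficient adaptivity of the execution plans that current query optimizers choose for concurrent queries, using the temporal properties of long short-term memory (LSTM) networks and fully connected networks (full connected network...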

6.
Scalability is one of the most important quality attributes of software-intensive systems, because it keeps performance effective under large, fluctuating, and sometimes unpredictable workloads. To achieve scalability, the thread pool system (TPS), also known as an executor service, has been used extensively as a middleware service in software-intensive systems. TPS optimization is a challenging problem: determining the optimal thread pool size dynamically at runtime. In a distributed TPS (DTPS), another issue is load balancing between the available TPSs running at backend servers. Existing DTPSs become overloaded either because of an inappropriate TPS optimization strategy at the backend servers or because of an improper load balancing scheme that cannot quickly recover from an overload. Consequently, the performance of the software-intensive system suffers. Thus, in this paper, we propose a new DTPS that follows a collaborative round-robin load balancing scheme, which acts as a double-edged sword. On the one hand, it balances load effectively among the available TPSs in overload situations through a fast overload recovery procedure that decelerates the load on overloaded TPSs to their capacities and shifts the remaining load to other TPSs that are running gracefully. On the other hand, its robust load deceleration technique, applied to an overloaded TPS, sets an appropriate upper bound on the thread pool size, because the pool size in each TPS is kept equal to the request rate on it, and hence dynamically optimizes the TPS. We evaluated the proposed system against state-of-the-art DTPSs using a client-server based simulator and found that our system outperformed them by sustaining smaller response times.

7.
In this paper, we propose an intelligent distributed query processing method considering the characteristics of a distributed ontology environment. We suggest more general models of the distributed ontology query and the semantic mapping among distributed ontologies compared with previous work. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and we can obtain the integrated answer through the execution of these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model considering site load balancing and caching, and a heuristic strategy for scheduling plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective in reducing the response time.

8.
Query optimizers rely on statistical models that succinctly describe the underlying data. Models are used to derive cardinality estimates for intermediate relations, which in turn guide the optimizer to choose the best query execution plan. The quality of the resulting plan is highly dependent on the accuracy of the statistical model that represents the data. It is well known that small errors in the model estimates propagate exponentially through joins, and may result in the choice of a highly sub-optimal query execution plan. Most commercial query optimizers make the attribute value independence assumption: all attributes are assumed to be statistically independent. This reduces the statistical model of the data to a collection of one-dimensional synopses (typically in the form of histograms), and it permits the optimizer to estimate the selectivity of a predicate conjunction as the product of the selectivities of the constituent predicates. However, this independence assumption is more often than not wrong, and is considered to be the most common cause of sub-optimal query execution plans chosen by modern query optimizers. We take a step towards a principled and practical approach to performing cardinality estimation without making the independence assumption. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy. We show how to efficiently construct such a graphical model from the database using only two-way join queries, and we show how to perform selectivity estimation in a highly efficient manner. We integrate our algorithms into the PostgreSQL DBMS. Experimental results indicate that estimation errors can be greatly reduced, leading to orders of magnitude more efficient query execution plans in many cases. Optimization time is kept in the range of tens of milliseconds, making this a practical approach for industrial-strength query optimizers.
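The effect of the attribute value independence assumption described above can be illustrated with a small sketch. It does not reproduce the graphical-model method of the abstract; the `cars` table, its column names, and the helper `selectivity` are hypothetical and chosen only to show how multiplying per-predicate selectivities underestimates a conjunction over correlated columns.

```python
def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


# Hypothetical table in which the two columns are perfectly correlated.
cars = ([{"make": "Honda", "model": "Accord"}] * 10
        + [{"make": "Toyota", "model": "Camry"}] * 90)


def is_honda(row):
    return row["make"] == "Honda"


def is_accord(row):
    return row["model"] == "Accord"


# Independence assumption: multiply the one-dimensional selectivities.
independent_estimate = selectivity(cars, is_honda) * selectivity(cars, is_accord)

# Ground truth: evaluate the conjunction directly.
true_selectivity = selectivity(cars, lambda r: is_honda(r) and is_accord(r))

estimated_rows = independent_estimate * len(cars)
actual_rows = true_selectivity * len(cars)
```

In this toy table the independence estimate predicts about one matching row while ten rows actually match, a tenfold underestimate from a single two-predicate conjunction.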

9.
Cost-based query optimizers need to estimate the selectivity of conjunctive predicates when comparing alternative query execution plans. To this end, advanced optimizers use multivariate statistics to improve information about the joint distribution of attribute values in a table. The joint distribution for all columns is almost always too large to store completely, and the resulting use of partial distribution information raises the possibility that multiple, non-equivalent selectivity estimates may be available for a given predicate. Current optimizers use cumbersome ad hoc methods to ensure that selectivities are estimated in a consistent manner. These methods ignore valuable information and tend to bias the optimizer toward query plans for which the least information is available, often yielding poor results. In this paper we present a novel method for consistent selectivity estimation based on the principle of maximum entropy (ME). Our method exploits all available information and avoids the bias problem. In the absence of detailed knowledge, the ME approach reduces to standard uniformity and independence assumptions. Experiments with our prototype implementation in DB2 UDB show that use of the ME approach can improve the optimizer’s cardinality estimates by orders of magnitude, resulting in better plan quality and significantly reduced query execution times. For almost all queries, these improvements are obtained while adding only tens of milliseconds to the overall time required for query optimization.
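The statement that maximum entropy falls back to independence when only single-predicate selectivities are known can be checked with a small sketch. The code below uses generic iterative scaling over the truth-value combinations ("atoms") of the predicates; it is an illustration under that assumption and is not claimed to be the DB2 prototype's implementation. The names `max_entropy_atoms` and `selectivity` and the sample selectivities are hypothetical, and every known selectivity is assumed to lie strictly between 0 and 1.

```python
from itertools import product


def max_entropy_atoms(n_preds, known, iters=200):
    """Approximate the maximum-entropy distribution over predicate atoms.

    known: dict mapping frozenset of predicate indices -> known
    selectivity of their conjunction (assumed strictly in (0, 1)).
    Starts from the uniform distribution and rescales it repeatedly
    so that each known selectivity is matched (iterative scaling).
    """
    atoms = list(product([0, 1], repeat=n_preds))
    p = {a: 1.0 / len(atoms) for a in atoms}
    for _ in range(iters):
        for preds, target in known.items():
            inside = {a for a in atoms if all(a[i] for i in preds)}
            mass = sum(p[a] for a in inside)
            for a in atoms:
                if a in inside:
                    p[a] *= target / mass
                else:
                    p[a] *= (1.0 - target) / (1.0 - mass)
    return p


def selectivity(p, preds):
    """Selectivity of the conjunction of the given predicate indices."""
    return sum(v for a, v in p.items() if all(a[i] for i in preds))


# Only single-predicate statistics: the estimate equals the product.
single = {frozenset([0]): 0.1, frozenset([1]): 0.2}
s_indep = selectivity(max_entropy_atoms(2, single), [0, 1])

# Adding a known joint statistic: the estimate honors it instead.
with_joint = dict(single)
with_joint[frozenset([0, 1])] = 0.08
s_joint = selectivity(max_entropy_atoms(2, with_joint), [0, 1])
```

With only the two marginals, the estimate for the conjunction comes out at 0.1 × 0.2; once the joint statistic is supplied, the same procedure returns it and stays consistent with the marginals.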

10.
Compilers and optimizers for declarative query languages use some form of intermediate language to represent user-level queries. The advent of compositional query languages for orthogonal type systems (e.g., OQL) calls for internal query representations beyond extensions of relational algebra. This work adopts a view of query processing which is greatly influenced by ideas from the functional programming domain. A uniform formal framework is presented which covers all query translation phases, including user-level query language compilation, query optimization, and execution plan generation. We pursue the type-based design, based on initial algebras, of a core functional language which is then developed into an intermediate representation that fits the needs of advanced query processing. Based on the principle of structural recursion we extend the language by monad comprehensions (which provide us with a calculus-style sublanguage that proves to be useful during the optimization of nested queries) and combinators (abstractions of the query operators implemented by the underlying target query engine). Due to its functional nature, the language is susceptible to program transformation techniques that were developed by the functional programming as well as the functional data model communities. We show how database query processing can substantially benefit from these techniques.

11.
FuzzyCLIPS is a rule-based programming language and it is very suitable for developing fuzzy expert systems. However, it usually requires much longer execution time than algorithmic languages such as C and Java. To address this problem, we propose a parallel version of FuzzyCLIPS to parallelize the execution of a fuzzy expert system with data dependence on a cluster system. We have designed some extended parallel syntax following the original FuzzyCLIPS style. To simplify the programming model of parallel FuzzyCLIPS, we hide, as much as possible, the tasks of parallel processing from programmers and implement them in the inference engine by using MPI, the de facto standard for parallel programming for cluster systems. Furthermore, a load balancing function has been implemented in the inference engine to adapt to the heterogeneity of computing nodes. It will intelligently allocate different amounts of workload to different computing nodes according to the results of dynamic performance monitoring. The programmer only needs to invoke the function in the program for better load balancing. To verify our design and evaluate the performance, we have implemented a human resource website. Experimental results show that the proposed parallel FuzzyCLIPS can garner a superlinear speedup and provide a more reasonable response time.

12.
Scientific workflows can be composed of many fine computational granularity tasks. The runtime of these tasks may be shorter than the duration of system overheads, for example, when using multiple resources of a cloud infrastructure. Task clustering is a runtime optimization technique that merges multiple short running tasks into a single job such that the scheduling overhead is reduced and the overall runtime performance is improved. However, existing task clustering strategies only provide a coarse-grained approach that relies on an over-simplified workflow model. In this work, we examine the reasons that cause Runtime Imbalance and Dependency Imbalance in task clustering. Then, we propose quantitative metrics to evaluate the severity of the two imbalance problems. Furthermore, we propose a series of task balancing methods (horizontal and vertical) to address the load balance problem when performing task clustering for five widely used scientific workflows. Finally, we analyze the relationship between these metric values and the performance of proposed task balancing methods. A trace-based simulation shows that our methods can significantly decrease the runtime of workflow applications when compared to a baseline execution. We also compare the performance of our methods with two algorithms described in the literature.

13.
Agent-based distributed simulations face a load imbalance problem, which significantly affects simulation performance. Dynamic load balancing can be effective in decreasing simulation execution time and improving simulation performance. The characteristics of multi-agent systems and time synchronization mechanisms make traditional dynamic load balancing approaches unsuitable for agent-based distributed simulations. In this paper, an adaptive dynamic load balancing model for agent-based distributed simulations is proposed. Because solving the model is complex and very time-consuming, a distributed approximate optimized scheduling algorithm with partial information (DAOSAPI) is proposed. It integrates the distributed mode, approximate optimization, and an agent set scheduling approach. Finally, experiments are conducted to verify the efficiency of the proposed algorithm and the simulation performance under dynamic agent scheduling. The experiments indicate that DAOSAPI has the advantage of short execution time in large-scale agent scheduling, and that distributed simulation performance under this dynamic agent scheduling exceeds that under static random agent distribution.

14.
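Current mainstream RDF storage systems are all built on relational databases; their query engines translate SPARQL into SQL, which the database's query engine then executes. However, current database query optimizers estimate the selectivity of join queries under the attribute independence assumption, which often leads to estimation errors and to the selection of inefficient execution plans. Attribute correlation information is therefore very important for whether a SPARQL query optimizer can find an efficient execution plan. To address the low query efficiency caused by unoptimized join operations after SPARQL is translated into SQL, a method is proposed that uses ontology information to compute attribute correlations automatically, thereby adjusting the selectivity estimates of join operations, adjusting the join order, and improving the efficiency of join queries over basic graph patterns in SPARQL queries.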

15.
A Grid is a network of computational resources that may potentially span many continents. Load balancing in a Grid is a hot research issue which affects every aspect of the Grid, including service selection and task execution. Thus, it is necessary and significant to solve the load balancing problem in a Grid. In this paper, we propose a dynamic, distributed load balancing scheme for a Grid which provides deadline control for tasks. In our scenario, resources first check their state and make a request to the Grid Broker according to the change of load state. Then, the Grid Broker assigns Gridlets among resources and schedules them for load balancing subject to the deadline requirement. We apply our load balancing strategy to a popular Grid simulation platform, GridSim. Experimental results show that our proposed load balancing mechanism can (1) reduce the makespan, (2) improve the finished rate of the Gridlets, and (3) reduce the resubmission time.

16.
The paper is devoted to the problem of effective query execution in cluster-based systems. An original approach to data placement and replication on the nodes of a cluster system is presented. Based on this approach, a load balancing method for parallel query processing is developed. A method for parallel query execution in cluster systems based on the load balancing method is suggested. Results of computational experiments are presented, and analysis of efficiency of the proposed approaches is performed.

17.
Traditionally, distributed query optimization techniques generate static query plans at compile time. However, the optimality of these plans depends on many parameters (such as the selectivities of operations, the transmission speeds and workloads of servers) that are not only difficult to estimate but are also often unpredictable and fluctuating at runtime. As the query processor cannot dynamically adjust the plans at runtime, the system performance is often less than satisfactory. In this paper, we introduce a new highly adaptive distributed query processing architecture. Our architecture can quickly detect fluctuations in selectivities of operations, as well as transmission speeds and workloads of servers, and accordingly change the operation order of a distributed query plan during execution. We have implemented a prototype based on the Telegraph system [Telegraph project]. Our experimental study shows that our mechanism can adapt itself to changes in the environment and hence approach an optimal plan during execution.

18.
In this paper we describe the design and implementation of OPT++, a tool for extensible database query optimization that uses an object-oriented design to simplify the task of implementing, extending, and modifying an optimizer. Building an optimizer using OPT++ makes it easy to extend the query algebra (to add new query algebra operators and physical implementation algorithms to the system), easy to change the search space, and also easy to change the search strategy. Furthermore, OPT++ comes equipped with a number of search strategies that are available for use by an optimizer-implementor. OPT++ considerably simplifies both the task of implementing an optimizer for a new database system and the task of evaluating alternative optimization techniques and strategies to decide what techniques are best suited for that database system. We present the results of a series of performance studies. These results validate our design and show that, in spite of its flexibility, OPT++ can be used to build efficient optimizers. Received October 1996 / Accepted January 1998

19.
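For the numerical simulation of the stall characteristics of civil aircraft high-lift configurations, we partition a multi-block structured grid using a partitioning tool based on a greedy load-balancing algorithm, port, optimize, and test the solver on a new supercomputer system, and study large-scale parallel computation on a grid with on the order of 200 million cells. Load-balanced grid partitioning at the ten-thousand-core scale was tested, and the stall characteristics of the high-lift configuration were computed in parallel on 4,096 cores with a parallel efficiency above 50%, improving the ability to simulate complex flow phenomena numerically in engineering applications. The simulation results deepen the understanding of the stall flow mechanism of high-lift configurations and can provide a useful reference for the design optimization of high-lift devices.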
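A greedy load-balancing assignment of grid blocks to processes can be sketched as follows. This is a generic largest-first heuristic, not the partitioning tool used in the abstract (which also splits blocks); the function name `greedy_partition` and the block sizes are hypothetical.

```python
import heapq


def greedy_partition(block_sizes, n_procs):
    """Assign blocks to processes, largest block first.

    Each block goes to the process with the smallest load so far.
    Returns (assignment, loads), where assignment[p] lists the block
    indices given to process p and loads[p] is its total cell count.
    """
    heap = [(0, p) for p in range(n_procs)]  # (current load, process id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_procs)]
    by_size = sorted(enumerate(block_sizes), key=lambda item: -item[1])
    for block_id, size in by_size:
        load, p = heapq.heappop(heap)
        assignment[p].append(block_id)
        heapq.heappush(heap, (load + size, p))
    loads = [sum(block_sizes[b] for b in blocks) for blocks in assignment]
    return assignment, loads


# Hypothetical block sizes (cells per block) distributed over 2 processes.
sizes = [8, 7, 6, 5, 4]
assignment, loads = greedy_partition(sizes, 2)

# A rough balance indicator: mean load divided by the maximum load.
balance = (sum(loads) / len(loads)) / max(loads)
```

The heuristic is fast but not optimal: for these sizes it produces loads of 17 and 13, while a 15/15 split exists, which is one reason real partitioners also split large blocks.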

20.
Volcano-an extensible and parallel query evaluation system   Total citations: 2, self-citations: 0, citations by others: 2
To investigate the interactions of extensibility and parallelism in database query processing, we have developed a new dataflow query execution system called Volcano. The Volcano effort provides a rich environment for research and education in database systems design, heuristics for query optimization, parallel query execution, and resource allocation. Volcano uses a standard interface between algebra operators, allowing easy addition of new operators and operator implementations. Operations on individual items, e.g., predicates, are imported into the query processing operators using support functions. The semantics of support functions is not prescribed; any data type including complex objects and any operation can be realized. Thus, Volcano is extensible with new operators, algorithms, data types, and type-specific methods. Volcano includes two novel meta-operators. The choose-plan meta-operator supports dynamic query evaluation plans that allow delaying selected optimization decisions until run-time, e.g., for embedded queries with free variables. The exchange meta-operator supports intra-operator parallelism on partitioned datasets and both vertical and horizontal inter-operator parallelism, translating between demand-driven dataflow within processes and data-driven dataflow between processes. All operators, with the exception of the exchange operator, have been designed and implemented in a single-process environment, and parallelized using the exchange operator. Even operators not yet designed can be parallelized using this new operator if they use and provide the iterator interface. Thus, the issues of data manipulation and parallelism have become orthogonal, making Volcano the first implemented query execution engine that effectively combines extensibility and parallelism.
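The iterator interface and the role of an exchange-style operator can be sketched in a few lines. This is a simplified illustration, not Volcano's code: the class names `Scan`, `Select`, and `Exchange`, the use of a thread and a bounded queue in place of separate processes, and the convention that `next()` returns `None` at end of stream (so rows themselves must not be `None`) are all assumptions of the example.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the producer's stream


class Scan:
    """Leaf operator: iterates over an in-memory sequence of rows."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self._it = iter(self.rows)
    def next(self):
        return next(self._it, None)
    def close(self):
        self._it = None


class Select:
    """Filter operator: the predicate is passed in as a support function."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None
    def close(self):
        self.child.close()


class Exchange:
    """Runs its child in a producer thread; the parent still calls next()."""
    def __init__(self, child, capacity=4):
        self.child, self.capacity = child, capacity
    def open(self):
        self._done = False
        self._queue = queue.Queue(maxsize=self.capacity)
        self._thread = threading.Thread(target=self._produce, daemon=True)
        self._thread.start()
    def _produce(self):
        # Data-driven side: push rows as fast as the queue allows.
        self.child.open()
        while (row := self.child.next()) is not None:
            self._queue.put(row)
        self.child.close()
        self._queue.put(_END)
    def next(self):
        # Demand-driven side: the consumer pulls one row per call.
        if self._done:
            return None
        row = self._queue.get()
        if row is _END:
            self._done = True
            return None
        return row
    def close(self):
        while not self._done:  # drain so the producer can finish
            self.next()
        self._thread.join()


def run(plan):
    """Drive a plan to completion through open/next/close."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


evens = run(Select(Exchange(Scan(range(10))), lambda x: x % 2 == 0))
```

Because `Exchange` exposes the same `open`/`next`/`close` methods as every other operator, `Select` and `Scan` need no changes to run on either side of it, which is the orthogonality the abstract describes.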
