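Queries are the main workload of a database system, and their efficiency determines database performance. A query has multiple possible execution plans, yet current query optimizers can only select a good plan statically, according to the database system's configuration parameters. Queries running in parallel contend for resources in complex and changing ways that configuration parameters can hardly capture accurately, and the same execution plan performs differently in different scenarios. Plan selection under parallel queries must therefore account for the mutual influence among queries, that is, query interaction. On this basis, QIs is proposed as a metric of how strongly a query is affected by query interaction under parallel execution. For plan selection under parallel queries, a method named TRating is also proposed that selects execution plans dynamically: within a query mix, it compares how strongly the query is affected by query interaction under each candidate execution plan and selects the less affected plan as the better plan for that query. Experimental results show that TRating selects the better execution plan with an accuracy of 61%, a 25% improvement over the query optimizer; when selecting the second-best execution plan, its accuracy reaches 69%.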

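宋杰, 李甜甜, 朱志良, 鲍玉斌, 于戈. 《软件学报》 (Journal of Software), 2015, 26(6): 1438-1456
The exponential growth of data poses severe challenges for data management and analysis. The join query is a common operation in data analysis, and MapReduce is a programming model for parallel processing of large-scale datasets, so studying cost estimation and query optimization for MapReduce-based join queries has both academic significance and practical value. The performance of MapReduce join algorithms depends mainly on I/O cost (both local and network I/O), which in turn depends on characteristic parameters of the datasets and of the join operation; estimating the I/O cost of two-way joins makes it possible to optimize the execution plan of multi-way joins. On this basis, an I/O cost model for two-way join queries is first proposed. Existing two-way join algorithms are then formally defined and slightly extended, yielding six MapReduce-based join algorithms whose I/O cost functions are defined through white-box analysis. Finally, an algorithm for selecting the optimal execution plan for multi-way joins is proposed. Experiments show that the I/O cost model is correct and accurately reflects the relative performance of the algorithms.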

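危剑豪, 夏烨峰, 宫学庆. 《软件学报》 (Journal of Software), 2021, 32(10): 3176-3202
Traditional database systems are built around a query-at-a-time model and execute concurrent queries independently. Because of this model, traditional databases cannot optimize multiple queries at once. Multi-query sharing techniques aim to share the common parts among queries in order to improve overall system response time and throughput. The paper divides multi-query execution models into two categories and introduces a prototype system for each: multi-query systems based on a global query plan and operator-centric multi-query systems. It discusses the strengths of both kinds of system and the scenarios they suit. It then classifies multi-query sharing techniques by query stage into techniques for the query compilation stage and techniques for the query execution stage. Following these two directions, it reviews research on multi-query plan representation, multi-query expression merging, multi-query sharing algorithms, and multi-query optimization. It also describes applications of shared-query techniques in relational and non-relational databases. Finally, it analyzes the opportunities and challenges facing shared-query techniques.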

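A New Query Optimization Method for Relational Databases   Total citations: 1, self-citations: 0, citations by others: 1
Modern relational database query optimizers usually evaluate the execution efficiency of different query plans by query cost, and mispredicting the intermediate result sets produced within a query plan is the main cause of poor optimizer performance. To address this problem, this paper introduces a new query optimization method, SPS (Statistics Predict Set), which resolves the problem effectively.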

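The MapReduce distributed computing framework helps improve the efficiency of join queries over large-scale data, but when the join attribute is unevenly distributed, its simple hashing strategy easily leads to load imbalance among compute nodes and degrades overall job performance. To address data skew in join queries, optimization algorithms for large-scale join queries under the MapReduce framework are studied. First, the classic improved repartition join algorithm is analyzed experimentally, the execution flow of join queries under the traditional MapReduce framework is examined, and the performance bottleneck of MapReduce-based join algorithms under uneven data distribution is identified. A combined-split balanced partitioning strategy is then proposed, and an improved join algorithm based on it is designed and implemented. Experimental results show that the proposed strategy effectively mitigates the performance impact of data skew in large-scale join processing and achieves good running time and scalability.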
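The load imbalance described in this abstract can be illustrated with a minimal sketch. It only reproduces the problem (simple hash partitioning of join keys), not the combined-split balanced partitioning strategy, whose details the abstract does not give. The function names `hash_partition` and `skew_ratio` and the sample key sets are assumptions made for illustration.

```python
from collections import Counter


def hash_partition(keys, num_reducers):
    """Assign each join key to a reducer by hashing (hypothetical stand-in
    for a simple hash partitioner) and return the per-reducer record counts."""
    load = Counter()
    for key in keys:
        load[hash(key) % num_reducers] += 1
    return [load[r] for r in range(num_reducers)]


def skew_ratio(loads):
    """Maximum load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Uniform join keys: every key occurs once.
uniform_keys = list(range(1000))
# Skewed join keys: one hot key (7) accounts for 90% of the records.
skewed_keys = [7] * 900 + list(range(100))

uniform_loads = hash_partition(uniform_keys, 4)
skewed_loads = hash_partition(skewed_keys, 4)
print(uniform_loads, skew_ratio(uniform_loads))
print(skewed_loads, skew_ratio(skewed_loads))
```

With the hot key, one reducer receives most of the records while the others sit nearly idle, which is the bottleneck a balanced partitioning strategy would have to remove.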

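To address the long query execution times encountered when querying IT operations and maintenance (O&M) end-user data, a MapReduce-based method for querying such data is proposed. Query keywords for end-user data are set and the features of the end-user data are obtained; an O&M data query algorithm is designed on MapReduce; and an index-based query framework for end-user data is built, completing the query of IT O&M end-user data. Experimental results show that the designed MapReduce query method has shorter query execution times and higher query efficiency, which saves time and gives it practical value for subsequent processing of O&M end-user data.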

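Database Query Optimization Based on Particle Swarm Optimization   Total citations: 1, self-citations: 0, citations by others: 1
This paper studies the application of particle swarm optimization to database query optimization. To address the difficulty of information retrieval and the low query efficiency in large databases, a scheme that optimizes database queries with particle swarm optimization is proposed. A cost model for database query execution plans is presented, covering mainly the ordering of multiple joins and the selection of replicas, and the execution cost of a database query is precisely defined. The proposed particle swarm algorithm is used to optimize and solve this execution-cost problem, resulting in fewer groups and more precise data location. Results on a case study show that any teacher can be accurately located by attribute performance and rule violations, with fewer groups, thereby optimizing database queries.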

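1 Introduction. In research on parallel databases, the scheduling and execution of query execution plans has attracted attention because of its complexity. During query optimization, the optimizer must use effective strategies to prune the search space substantially in order to reduce optimization overhead, but this may well discard better execution plans. Moreover, when system throughput is high, there can be a considerable delay between optimizing a query and executing it, and some important system parameters may have changed significantly by the time the plan executes, making the plan suboptimal or even hard to execute. One current approach to this problem is to separate query optimization from query execution: during execution, an effective scheduling strategy compensates for the deficiencies of the execution plan and further balances the system load.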

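LazyDFA-Based Query Optimization Algorithms for XPath over XML Data Streams   Total citations: 2, self-citations: 0, citations by others: 2
For XPath query processing and query optimization over XML data streams, a solution based on the lazy DFA technique is presented and optimization algorithms are proposed. The shared NFA state table algorithm reduces the memory usage of the lazy DFA by dividing NFA states into a shared set and an exclusive set; the state transition table algorithm adds a state transition table to the lazy DFA state structure to increase query speed. Experimental results show that the proposed methods outperform traditional algorithms in both execution efficiency and space cost.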

Global query execution in a multidatabase system can be done in parallel, as all the local databases are independent. In this paper, a cost model that considers parallel execution of subqueries for a global query is developed. To obtain maximum parallelism in query execution, it is necessary to find a query execution plan that is represented as a bushy tree, and this query tree should be balanced to the maximal possible extent with respect to execution time. A new bottom-up approach called the Agglomerative Approach (AA) is proposed to construct bushy trees that are balanced with respect to execution time. Because this approach is deterministic, it generates locally optimal solutions. This local minima problem is severe in the case of graph queries, i.e., queries that are represented with a graph structure. A Simulated Annealing Approach (SA) is employed to obtain a (near) optimal solution. These approaches (AA and SA) are suitable for handling on-line and off-line queries, respectively. A Hybrid Approach (HA), an integration of AA and SA, is proposed to optimize queries for which the estimated time to be spent on optimization is known a priori. Results obtained with AA and SA on both tree- and graph-structured queries are presented.
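The idea of building a bushy tree bottom-up so that it is balanced in execution time can be sketched as follows. This is not the paper's Agglomerative Approach, whose details the abstract does not give; it is a greedy illustration under an assumed cost model in which two subtrees run in parallel and their join finishes at `max(left, right) + join_cost`. The function name `build_bushy_tree`, the `join_cost` parameter, and the sample timings are all hypothetical.

```python
import heapq
import itertools


def build_bushy_tree(subquery_times, join_cost=1.0):
    """Greedily merge the two subtrees that finish earliest.

    subquery_times: non-empty dict mapping subquery name -> estimated time.
    Assumed cost model: subtrees run in parallel, so a join completes at
    max(left_finish, right_finish) + join_cost.
    Returns (finish_time, tree), where tree is a nested tuple of names.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares trees
    heap = [(t, next(counter), name) for name, t in subquery_times.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        t1, _, left = heapq.heappop(heap)
        t2, _, right = heapq.heappop(heap)
        finish = max(t1, t2) + join_cost
        heapq.heappush(heap, (finish, next(counter), (left, right)))
    finish, _, tree = heap[0]
    return finish, tree


finish, tree = build_bushy_tree({"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0})
print(finish, tree)  # 3.0 (('A', 'B'), ('C', 'D'))
```

Under the same assumed cost model, a left-deep plan `((A, B), C), D` would finish at 4.0, so the balanced bushy shape is what buys the extra parallelism here.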

Complex object-oriented queries generally consist of path expressions and explicit join operations. Since explicit join operations have been acknowledged as the most expensive operations, query executions normally start from the path expressions. Each path expression may form a sub-query. There are two existing strategies for sub-query processing: ‘serial’ and ‘parallel’ execution scheduling. Serial sub-query execution runs the sub-queries one by one, whereas parallel sub-query execution runs them simultaneously. When a sub-query is being processed, parallelization techniques may be applied. In this paper, we focus on the scheduling of the sub-queries rather than the parallelization of the sub-queries themselves. Rules are formulated to guide the parallel query execution process. Our analysis shows that when there is no load skew, the serial scheduling strategy is preferred; otherwise, the parallel scheduling strategy should be used.

This study proposes a method of in-network aggregate query processing to reduce the number of messages incurred in a wireless sensor network. When aggregate queries are issued to a resource-constrained wireless sensor network, it is important to perform these queries efficiently. Given a set of multiple aggregate queries, the proposed approach shares intermediate results among queries to reduce the number of messages. When the sink receives multiple queries, it should propagate these queries to the wireless sensor network via existing routing protocols. The sink can obtain the corresponding topology of the queries and view each query as a query tree. With a set of query trees collected at the sink, it is necessary to determine a set of backbones that share intermediate results with other query trees (called non-backbones). First, it is necessary to formulate the objective cost function for backbones and non-backbones. Using this objective cost function, it is possible to derive a reduction graph that reveals possible cases of sharing intermediate results among query trees. Using the reduction graph, this study first proposes a heuristic algorithm, BM (Backbone Mapping). This study also develops an algorithm, OOB (Obtaining Optimal Backbones), that exploits a branch-and-bound strategy to obtain the optimal solution efficiently. This study tests the performance of these algorithms on both synthetic and real datasets. Experimental results show that by sharing the intermediate results, the BM and OOB algorithms significantly reduce the total number of messages incurred by multiple aggregate queries, thereby extending the lifetime of sensor networks.

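In routine massive data analysis jobs, CPU- and I/O-intensive query statements are usually complex and time-consuming and contain many reusable common parts. How to detect, share, and reuse the common parts among statements in a recurring query set has become a pressing problem. To this end, a feature-value index method is proposed, and LSShare, a multi-query optimization system for cloud computing scenarios, is built. Based on the abstract syntax tree of a query statement, the statement is divided into query layers; for each layer a feature vector is extracted and a feature value is computed. A simple and efficient feature-value index table is built to identify the common parts among multiple query statements, and SQL rewriting is used to reuse them. As the number of run iterations increases, LSShare progressively optimizes the recurring query set in the cloud computing scenario. Experimental results show that the system outperforms a traditional query system in running efficiency and saves nearly one third of the execution time.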
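The idea of indexing parts of a query's syntax tree so that common parts across statements can be found by lookup might be sketched as below. The abstract does not say how LSShare computes its feature vectors or feature values, so this sketch substitutes a SHA-1 hash of each subtree as a stand-in. The nested-tuple AST encoding and the names `fingerprint`, `subtrees`, `build_index`, and `shared_parts` are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict


def fingerprint(node):
    """Stand-in for a feature value: a hash of the subtree's canonical form."""
    return hashlib.sha1(repr(node).encode("utf-8")).hexdigest()


def subtrees(node):
    """Yield a node and, recursively, all of its descendants.
    A tuple is an operator node: (operator_name, child, child, ...)."""
    yield node
    if isinstance(node, tuple):
        for child in node[1:]:
            yield from subtrees(child)


def build_index(queries):
    """Map each operator subtree's fingerprint to the queries containing it."""
    index = defaultdict(set)
    for name, ast in queries.items():
        for sub in subtrees(ast):
            if isinstance(sub, tuple):  # skip leaf strings
                index[fingerprint(sub)].add(name)
    return index


def shared_parts(queries):
    """Return fingerprints of subtrees that occur in more than one query."""
    return {fp: names for fp, names in build_index(queries).items()
            if len(names) > 1}


# Two hypothetical queries that share the same filtered scan of `orders`.
common = ("filter", ("scan", "orders"), "amount > 100")
queries = {
    "q1": ("project", common, "id"),
    "q2": ("aggregate", common, "sum(amount)"),
}
shared = shared_parts(queries)
print(len(shared))  # 2: the filter subtree and the scan beneath it
```

A real system would then rewrite the statements so the shared subtree is evaluated once and its result reused; that rewriting step is outside this sketch.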

Advanced application domains such as computer-aided design, computer-aided software engineering, and office automation are characterized by their need to store, retrieve, and manage large quantities of data having complex structures. A number of object-oriented database management systems (OODBMS) are currently available that can effectively capture and process the complex data. The existing implementations of OODBMS outperform relational systems by maintaining and querying cross-references among related objects. However, the existing OODBMS still do not meet the efficiency requirements of advanced applications that require the execution of complex queries involving the retrieval of a large number of data objects and relationships among them. Parallel execution can significantly improve the performance of complex OO queries. In this paper, we analyze the performance of parallel OO query processing algorithms for various benchmark application domains. The application domains are characterized by specific mixes of queries of different semantic complexities. The performance of the application domains has been analyzed for various system and data parameters by running parallel programs on a 32-node transputer based parallel machine developed at the IBM Research Center at Yorktown Heights. The parallel processing algorithms, data routing techniques, and query management and control strategies have been implemented to obtain accurate estimation of controlling and processing overheads. However, generation of large complex databases for the study was impractical. Hence, the data used in the simulation have been parameterized. The parallel OO query processing algorithms analyzed in this study are based on a query graph approach rather than the traditional query tree approach. Using the query graph approach, a query is processed by simultaneously initiating the execution at several object classes, thereby, improving the parallelism. 
During processing, the algorithms avoid the execution of time-consuming join operations by making use of the object references among the objects. Further, the algorithms do not generate any temporary data, thereby reducing disk accesses. This is accomplished by marking the selected objects and by employing a two-phase query processing strategy.

In many applications, XML documents need to be modelled as graphs. Query processing over graph-structured XML documents brings new challenges. In this paper, we design a method based on a labelling scheme for structural query processing on graph-structured XML documents. We give each node a set of labels, forming the reachability labelling scheme. By extending an interval-based reachability labelling scheme for DAGs by Rakesh et al., we design labelling schemes that support judging reachability relationships in general graphs. Based on the labelling schemes, we design graph structural join algorithms to efficiently answer structural queries that contain only ancestor-descendant relationships. For the processing of subgraph queries, we design a subgraph join algorithm. With an efficient data structure, the subgraph join algorithm can process subgraph queries with various structures efficiently. Experimental results show that our algorithms have good performance and scalability. Supported by the Key Program of the National Natural Science Foundation of China under Grant No. 60533110; the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000; and the National Natural Science Foundation of China under Grants No. 60773068 and No. 60773063.
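Interval-based labelling is easiest to see in the tree case, which the following minimal sketch covers. It is not the paper's scheme for DAGs or general graphs; it only shows the base idea that an ancestor-descendant test reduces to interval containment. The function names `label_tree` and `is_ancestor` and the sample tree are assumptions for illustration.

```python
def label_tree(children, root):
    """Assign each node an interval (start, end) by depth-first traversal.

    children: dict mapping a node to the list of its child nodes.
    A node's interval covers its own number and those of all descendants.
    """
    labels = {}
    counter = [0]  # mutable counter shared by the nested function

    def visit(node):
        start = counter[0]
        counter[0] += 1
        for child in children.get(node, []):
            visit(child)
        labels[node] = (start, counter[0] - 1)

    visit(root)
    return labels


def is_ancestor(labels, u, v):
    """True if u is a proper ancestor of v: v's interval lies inside u's."""
    su, eu = labels[u]
    sv, ev = labels[v]
    return u != v and su <= sv and ev <= eu


# Hypothetical document tree: a has children b and c; b has child d.
tree = {"a": ["b", "c"], "b": ["d"]}
labels = label_tree(tree, "a")
print(labels)
print(is_ancestor(labels, "a", "d"), is_ancestor(labels, "b", "c"))
```

Once the labels are computed, each reachability check is a constant-time comparison, which is what makes label-based structural joins attractive compared with traversing the document for every query.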

In this paper, we study a variant of reachability queries, called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we check the existence of a directed path from u1 to u2 such that the edge labels along the path are a subset of S. We propose the path-label transitive closure method to answer LCR queries. Specifically, we transform an edge-labeled directed graph into an augmented DAG by replacing the maximal strongly connected components with bipartite graphs. We also propose a Dijkstra-like algorithm to compute the path-label transitive closure by redefining the “distance” of a path. Compared with the existing solutions, we prove that our method is optimal in terms of the search space. Furthermore, we propose a simple yet effective partition-based framework (local path-label transitive closure + online traversal) to answer LCR queries in large graphs. We prove that finding the optimal graph partition to minimize query processing cost is an NP-hard problem. Therefore, we propose a sampling-based solution to find a sub-optimal partition. Moreover, we address index maintenance issues so that LCR queries can be answered over dynamic graphs. Extensive experiments confirm the superiority of our method.
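The definition of an LCR query can be made concrete with a plain online traversal: a breadth-first search that follows only edges whose labels are in S. This is the naive baseline, not the path-label transitive closure method proposed in the paper. The function name `lcr_reachable`, the edge-list encoding, and the sample graph are assumptions for illustration.

```python
from collections import deque


def lcr_reachable(edges, source, target, allowed_labels):
    """Return True if target is reachable from source using only edges
    whose label is in allowed_labels.

    edges: iterable of (from_vertex, to_vertex, label) triples.
    """
    # Keep only the edges permitted by the label constraint.
    adjacency = {}
    for u, v, label in edges:
        if label in allowed_labels:
            adjacency.setdefault(u, []).append(v)

    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical graph with two routes from u1 to u2.
edges = [
    ("u1", "x", "a"), ("x", "u2", "b"),  # route labelled a, b
    ("u1", "y", "c"), ("y", "u2", "c"),  # route labelled c, c
]
print(lcr_reachable(edges, "u1", "u2", {"a", "b"}))  # True
print(lcr_reachable(edges, "u1", "u2", {"a"}))       # False
```

Each query costs a full traversal of the filtered graph in the worst case, which is the cost that precomputed closures and partition-based indexes aim to avoid on large graphs.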

Sergio Flesca, Filippo Furfaro, Sergio Greco. 《World Wide Web》, 2002, 5(2): 125-157
In this paper we present a graphical query language for XML. The language, based on a simple form of graph grammars, permits us to extract data and reorganize information in a new structure. As with most of the current query languages for XML, queries consist of two parts: one extracting a subgraph and one constructing the output graph. The semantics of queries is given in terms of graph grammars. The use of graph grammars makes it possible to define, in a simple way, the structural properties of both the subgraph that has to be extracted and the graph that has to be constructed. We provide an example-driven comparison of our language w.r.t. other XML query languages, and show the effectiveness and simplicity of our approach.
