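Multi-join query optimization is one of the key problems in improving database performance. Chiang Lee proposed a heuristic multi-join query optimization algorithm, MVP; analysis shows that this algorithm does not consider reducing the computational cost of the execution plan. Drawing on the characteristics of hash filtering, this paper proposes an improved multi-join query optimization algorithm that, compared with MVP, lowers the computational cost of the execution plan and thereby shortens query response time.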

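Traditional join query optimization algorithms for distributed databases are analyzed and studied. Using data partitioning and a parallel execution strategy, a query optimization algorithm based on partitioning by multiple join attributes is proposed. Experiments show that the algorithm reduces query response time and has practical value for massive-information queries and complex queries in distributed databases.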

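Existing statistical models for predicting query response time suffer from accuracy that cannot be improved further, narrow feature selection, and poor adaptability to dynamic workloads. Taking two factors into account together, query plans and query interactions, this work proposes predicting parallel query response time with a fully connected neural network, an artificial neural network that is structurally simple and easy to build. Query plan and query interaction data are collected as input features, the true query response time serves as the prediction label, and the model is trained and then used for prediction. The method does not require the mathematical model function of the sample data to be known in advance; it builds the model solely by learning from the sample data set, so modeling is simple and prediction quality is good. Experimental results show that the fully connected neural network model reaches an accuracy of 79.99%, about 6% higher than current representative statistical models.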

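Song Jie, Li Tiantian, Zhu Zhiliang, Bao Yubin, Yu Ge. Journal of Software (软件学报), 2015, 26(6): 1438-1456
The exponential growth of data poses severe challenges for data management and analysis. Join queries are a common operation in data analysis, and MapReduce is a programming model for parallel processing of large data sets, so research on cost estimation and optimization of MapReduce-based join queries has both academic significance and practical value. The performance of MapReduce join algorithms depends mainly on I/O cost (both local and network I/O), which in turn depends on characteristic parameters of the data sets and of the join operation; estimating the I/O cost of binary joins makes it possible to optimize the execution plan of a multi-way join. On this basis, the paper first proposes an I/O cost model for binary join queries. It then formally defines and modestly extends existing binary join algorithms, summarizes six MapReduce-based join algorithms, and defines their I/O cost functions through white-box analysis of the algorithms. Finally, it proposes an algorithm for selecting the optimal execution plan for a multi-way join. Experiments show that the I/O cost model is correct and accurately reflects the relative performance of the algorithms.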

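Implementing distributed data queries with join operations is both costly and time-consuming. Replacing join operations with semi-join operations optimizes the query by reducing the size of the operands. A two-pass semi-join matching algorithm reduces the operands even further and is simpler and more efficient to implement. For multi-node data queries, an algorithm is proposed that generates parallel join pairs with a minimum spanning tree algorithm and then optimizes the query with the two-pass semi-join matching algorithm. The algorithm effectively exploits multi-node parallelism to shorten the system's query response time and reduce its total overhead, and it has practical value for massive-information queries.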
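
The abstract above does not give the algorithm itself, so the following is only a minimal sketch of the general idea of semi-join reduction applied in both directions before a join. It assumes two relations held as lists of dictionaries at two sites; the names `semi_join`, `two_pass_semi_join`, and `join` are hypothetical, and the minimum-spanning-tree scheduling step is not modeled.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


def semi_join(rows: List[Row], keys: Set[object], key: str) -> List[Row]:
    """Keep only the rows whose join key appears in `keys`."""
    return [row for row in rows if row[key] in keys]


def two_pass_semi_join(r: List[Row], s: List[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Reduce both operands before joining (hypothetical helper, not the paper's code)."""
    # Pass 1: the site holding R ships only its distinct join keys; S is reduced.
    r_keys = {row[key] for row in r}
    s_reduced = semi_join(s, r_keys, key)
    # Pass 2: the reduced S ships its keys back; R is reduced in turn.
    s_keys = {row[key] for row in s_reduced}
    r_reduced = semi_join(r, s_keys, key)
    return r_reduced, s_reduced


def join(r: List[Row], s: List[Row], key: str) -> List[Row]:
    """Plain hash join, used here to check that the reduction loses no result rows."""
    index: Dict[object, List[Row]] = {}
    for row in s:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]
```

Only the key sets cross the network in each pass, which is where the saving comes from when the join attribute is small relative to the full rows.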

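Semantic caching exploits the semantic correlation between queries and is one of the effective techniques for improving database query performance. Traditional semantic caches are organized by predicate, query trimming is performed serially, and the time complexity of the algorithm is exponential. Based on a conjunctive semantic cache model, a parallel query trimming algorithm is proposed. Compared with existing query trimming algorithms for semantic caches, it not only reduces the complexity from exponential to polynomial but also improves cache utilization and shortens the average query response time.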
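
To make the term "query trimming" concrete, here is a small sketch that splits a query into a part answerable from the cache and a remainder that must go to the server. It assumes the simplest possible case, a single range predicate over half-open intervals `[lo, hi)`; the function name `trim_query` is hypothetical, and the conjunctive model and parallel algorithm from the abstract are not reproduced.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open range [lo, hi) on one attribute


def trim_query(query: Interval, cached: Interval) -> Tuple[Optional[Interval], List[Interval]]:
    """Split `query` into a probe part (served from cache) and remainder parts."""
    q_lo, q_hi = query
    c_lo, c_hi = cached
    lo, hi = max(q_lo, c_lo), min(q_hi, c_hi)
    if lo >= hi:
        # No overlap: nothing can be served from the cache.
        return None, [query]
    remainder: List[Interval] = []
    if q_lo < lo:
        remainder.append((q_lo, lo))   # part of the query below the cached range
    if hi < q_hi:
        remainder.append((hi, q_hi))   # part of the query above the cached range
    return (lo, hi), remainder
```

For example, with a cached range of `(20, 40)`, the query `(10, 50)` yields the probe `(20, 40)` and the remainders `(10, 20)` and `(40, 50)`.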

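History, Current State, and Future of Database Query Optimization Techniques   Cited by: 1 (self-citations: 0, by others: 1)
Traditional query-tree optimization methods, that is, parallel database query optimization methods based on left-linear trees, right-linear trees, bushy trees, and operator forests, each have strengths and weaknesses and have been studied in depth to a mature state. The query optimization method based on multi-weighted trees covers a parallel query plan model, a complexity model for parallel query plans, and a query optimization algorithm. Semantic query optimization transforms a query into one or more semantically equivalent queries and then finds and executes the one with the better implementation strategy. Agent-based parallel database query optimization uses multi-agent technology to automatically find the integrity constraints relevant to a given query, greatly improving the efficiency of join operations across multiple relations. Parallel optimization based on genetic algorithms examines key techniques for cluster-based parallel databases, including the choice of relation storage, multi-join query optimization, and query processing.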

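Parallel query processing, and parallel join processing in particular, is a key technology in parallel databases. Current parallel query processing methods still have limitations: for example, the vast majority of parallel join algorithms rely on hashing to partition the data, so they support only query types such as equi-joins. To solve this problem, a general θ-join query processing algorithm based on pseudo semi-joins is proposed, together with a general query processing method for parallel databases based on query syntax trees and parallel execution plans. On this basis, a prototype parallel distributed database system, PD-DBMS, was implemented. Experimental results show that the method delivers good parallel query processing performance.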

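The goals of query optimization in distributed databases are briefly described, and the Hash partitioning and Partition algorithms among direct-join optimization algorithms are briefly introduced. The analysis identifies the shortcomings of the Partition algorithm, which is then improved. The improved algorithm introduces a query graph partitioning method that shortens the response time of query operations.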

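Liu Yi, Jing Ning, Chen Luo, Xiong Wei. Journal of Software (软件学报), 2013, 24(8): 1836-1851
To support high-performance k-nearest-neighbor (kNN) join query processing over large-scale spatial data, this paper studies kNN join processing based on R-tree indexes under the MapReduce framework. It first abstracts the MapReduce parallel programming model through formal definitions of dependency-free parallel computation and serial synchronous computation. Based on this abstraction, it proposes a fast R-tree index construction algorithm and an R-tree-based parallel kNN join algorithm. During index construction, a sampling algorithm quickly determines the spatial partitioning function, so that index construction fits the dependency-free parallel and serial synchronous abstractions and is easy to express in MapReduce. During kNN join processing, a kNN expanded bounding box is introduced on top of the constructed distributed R-tree index to bound the query range and partition the data, and the R-tree index is then used to perform the kNN join, improving query efficiency. The communication and computation costs of the proposed algorithms are analyzed theoretically. Experiments and analysis show that the algorithm is efficient and scalable on real data sets, supports kNN join processing over large-scale spatial data well, and has good practical value.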

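A Query Optimization Method for Parallel Databases Based on Multi-Weighted Trees   Cited by: 1 (self-citations: 0, by others: 1)
Li Jianzhong. Chinese Journal of Computers (计算机学报), 1998, 21(5): 401-412
This paper proposes a query optimization method based on multi-weighted trees, comprising a multi-weighted-tree parallel query plan model, a complexity model for parallel query plans, and a query optimization algorithm.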

Dataflow query execution in a parallel main-memory environment   Cited by: 2 (self-citations: 0, by others: 2)
In this paper, the performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results of this study are a step in the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others, synchronization issues are identified to limit the performance gain from parallelism. A new hash-join algorithm is introduced that has fewer synchronization constraints than the known hash-join algorithms. Also, the behavior of individual join operations in a join-tree is studied in a simulation experiment. The results show that the introduced Pipelining hash-join algorithm yields a better performance for multi-join queries. The format of the optimal join-tree appears to depend on the size of the operands of the join: a multi-join between small operands performs best with a bushy schedule; larger operands are better off with a linear schedule. The results from the simulation study are confirmed with an analytic model for dataflow query execution.
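
The abstract does not spell out how the Pipelining hash-join works. As an illustration only, the sketch below shows the commonly described symmetric variant, in which both operands build hash tables and every arriving tuple immediately probes the other operand's table, so no operand has to be fully read before output starts. The function name `symmetric_hash_join` and the `(side, key, payload)` event format are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[str, object, object]  # (side, join key, payload); side is "L" or "R"


def symmetric_hash_join(events: Iterable[Event]) -> Iterator[Tuple[object, object, object]]:
    """Yield (key, left_payload, right_payload) as soon as a match is possible."""
    tables: Dict[str, Dict[object, List[object]]] = {
        "L": defaultdict(list),
        "R": defaultdict(list),
    }
    for side, key, payload in events:
        other = "R" if side == "L" else "L"
        # Insert into this side's table so later tuples from the other side can find it.
        tables[side][key].append(payload)
        # Probe the other side's table with the tuple that just arrived.
        for match in tables[other].get(key, []):
            if side == "L":
                yield (key, payload, match)
            else:
                yield (key, match, payload)
```

Because neither input has a separate build phase that must finish first, results flow to the next join in the tree as tuples arrive, which is the property that reduces synchronization between operators.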

Advanced application domains such as computer-aided design, computer-aided software engineering, and office automation are characterized by their need to store, retrieve, and manage large quantities of data having complex structures. A number of object-oriented database management systems (OODBMS) are currently available that can effectively capture and process the complex data. The existing implementations of OODBMS outperform relational systems by maintaining and querying cross-references among related objects. However, the existing OODBMS still do not meet the efficiency requirements of advanced applications that require the execution of complex queries involving the retrieval of a large number of data objects and relationships among them. Parallel execution can significantly improve the performance of complex OO queries. In this paper, we analyze the performance of parallel OO query processing algorithms for various benchmark application domains. The application domains are characterized by specific mixes of queries of different semantic complexities. The performance of the application domains has been analyzed for various system and data parameters by running parallel programs on a 32-node transputer based parallel machine developed at the IBM Research Center at Yorktown Heights. The parallel processing algorithms, data routing techniques, and query management and control strategies have been implemented to obtain accurate estimation of controlling and processing overheads. However, generation of large complex databases for the study was impractical. Hence, the data used in the simulation have been parameterized. The parallel OO query processing algorithms analyzed in this study are based on a query graph approach rather than the traditional query tree approach. Using the query graph approach, a query is processed by simultaneously initiating the execution at several object classes, thereby, improving the parallelism. 
During processing, the algorithms avoid the execution of time-consuming join operations by making use of the object references among the objects. Further, the algorithms do not generate any temporary data, thereby, reducing disk accesses. This is accomplished by marking the selected objects and by employing a two-phase query processing strategy.

A data warehouse can store very large amounts of data that should be processed in parallel in order to achieve reasonable query execution times. The MapReduce programming model is a very convenient way to process large amounts of data in parallel on commodity hardware clusters. A very popular query used in data warehouses is star-join. In this paper, we present a fast and efficient star-join query execution algorithm built on top of a MapReduce framework called Hadoop. By using dynamic filters against dimension tables, the algorithm needs a single scan of the fact table, which means a significant reduction of input/output operations and computational complexity. Also, the algorithm requires only two MapReduce iterations in total: one to build the filters against dimension tables and one to scan the fact table. Our experiments show that the proposed algorithm performs much better than the existing solutions in terms of execution time and input/output. Copyright © 2014 John Wiley & Sons, Ltd.
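
The two-phase structure described above can be sketched in a few lines. This is a single-process illustration under stated assumptions, not the Hadoop implementation: exact Python sets stand in for whatever filter structure the paper uses, every dimension row is assumed to carry an `id` column, and the names `build_filters`, `scan_fact`, and `fk_columns` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set

Row = Dict[str, object]


def build_filters(
    dimensions: Dict[str, List[Row]],
    predicates: Dict[str, Callable[[Row], bool]],
) -> Dict[str, Set[object]]:
    """Phase 1: for each dimension table, collect the keys of rows that pass its predicate."""
    return {
        name: {row["id"] for row in rows if predicates[name](row)}
        for name, rows in dimensions.items()
    }


def scan_fact(
    fact_rows: Iterable[Row],
    filters: Dict[str, Set[object]],
    fk_columns: Dict[str, str],
) -> Iterator[Row]:
    """Phase 2: one pass over the fact table, keeping rows that pass every filter."""
    for row in fact_rows:
        if all(row[fk_columns[name]] in keys for name, keys in filters.items()):
            yield row
```

The fact table is read exactly once, and a row is discarded as soon as any one of its foreign keys misses the corresponding filter.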

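Parallel Query Optimization of Multi-Join Expressions Based on Genetic Algorithms   Cited by: 6 (self-citations: 0, by others: 6)
Cao Yang, Fang Qiang, Wang Guoren, Yu Ge. Journal of Software (软件学报), 2002, 13(2): 250-257
Parallel query optimization of multi-join expressions is one of the key problems in improving database performance. This paper proposes using a genetic algorithm to solve it. To improve the execution efficiency of the query processor, heuristic rules are used in searching for the optimal parallel scheduling and execution plan for a multi-join expression. Detailed test results and performance analysis are given. The experimental results show that a genetic algorithm combined with heuristic knowledge is an effective approach to parallel multi-join query optimization and plays an important role in improving database performance.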
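
As a rough illustration of how a genetic algorithm can search join orders, the sketch below evolves permutations of relation names. Everything specific in it is an assumption made for the example rather than taken from the paper: plans are left-deep, the cost model is a toy (the sum of estimated intermediate result sizes under one uniform selectivity), and the paper's heuristic rules and parallel scheduling are not modeled.

```python
import random
from typing import Dict, List, Sequence


def plan_cost(order: Sequence[str], sizes: Dict[str, int], selectivity: float) -> float:
    """Toy cost: sum of the estimated intermediate result sizes of a left-deep plan."""
    rows = float(sizes[order[0]])
    total = 0.0
    for name in order[1:]:
        rows = rows * sizes[name] * selectivity
        total += rows
    return total


def order_crossover(a: List[str], b: List[str], rng: random.Random) -> List[str]:
    """Keep a slice of parent a in place and fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    middle = a[i:j]
    rest = [x for x in b if x not in middle]
    return rest[:i] + middle + rest[i:]


def optimize_join_order(
    sizes: Dict[str, int],
    selectivity: float = 0.01,
    population: int = 20,
    generations: int = 50,
    seed: int = 0,
) -> List[str]:
    rng = random.Random(seed)
    names = list(sizes)

    def cost(order: Sequence[str]) -> float:
        return plan_cost(order, sizes, selectivity)

    # Start from the given order plus random permutations.
    pool = [names[:]] + [rng.sample(names, len(names)) for _ in range(population - 1)]
    for _ in range(generations):
        pool.sort(key=cost)
        survivors = pool[: population // 2]  # elitism: the cheapest half always survives
        children: List[List[str]] = []
        while len(survivors) + len(children) < population:
            a, b = rng.sample(survivors, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < 0.2:  # occasional swap mutation
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pool = survivors + children
    return min(pool, key=cost)
```

Because the cheapest half of each generation is carried over unchanged, the best plan found never gets worse, and seeding the pool with the input order guarantees the result is at least as cheap as that order under the toy cost.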

A query processing strategy which is based on pipelining and data-flow techniques is presented. Timing equations are developed for calculating the performance of four join algorithms: nested block, hash, sort-merge, and pipelined sort-merge. They are used to execute the join operation in a query in distributed fashion and in pipelined fashion. Based on these equations and similar sets of equations developed for other relational algebraic operations, the performance of query execution was evaluated using the different join algorithms. The effects of varying the values of processing time, I/O time, communication time, buffer size, and join selectivity on the performance of the pipelined join algorithms are investigated. The results are compared to the results obtained by employing the same algorithms for executing queries using the distributed processing approach which does not exploit the vertical concurrency of the pipelining approach. These results establish the benefits of pipelining.

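Near-Optimal Scalability: A Practical Scalability Metric   Cited by: 2 (self-citations: 0, by others: 2)
Chen Jun, Li Xiaomei. Chinese Journal of Computers (计算机学报), 2001, 24(2): 179-182
Good scalability is an important performance goal for designers of parallel algorithms and parallel machines. Earlier scalability models each considered only one aspect of the problem in isolation, such as a particular performance measure or the maximum usable resources, without weighing the whole. Those models meet the needs of computer researchers, who focus on higher efficiency and utilization, but application scientists care more about short execution time. The near-optimal scalability model proposed in this paper takes into account both the efficiency and the execution time of a parallel system. Analysis of two example algorithms on a typical MPP shows that the model not only describes the scalability of a parallel algorithm but also, when the system is scaled along an appropriate scalability curve, brings execution time close to the minimum without low efficiency. This offers guidance for optimally matching algorithms to parallel machines and benefits the design and improvement of parallel algorithms.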

Management of large quantities of complex data is essential in many advanced application areas. Object-oriented (OO) database management systems have been developed to effectively model and process the complex domain knowledge. They have been shown to outperform some existing relational systems. The existing implementations of OO database management systems attempt to improve the efficiency of OO queries by explicitly capturing the relationships among objects. However, the execution of complex queries involving the retrieval of objects from many classes and relationships among them causes the existing systems to operate inefficiently. In this paper, we present parallel algorithms for the processing of queries against a large OO database. The algorithms are based on a closed model of query processing pattern-based access instead of the conventional value-based access. During processing, the algorithms avoid the execution of time-consuming join operations by making use of the explicitly stored object associations. Generation of large quantities of temporary data is avoided by marking objects using their identifiers and by employing a two-phase query processing strategy. A query is processed by concurrent multiple waves, thereby improving parallelism and avoiding the complexities introduced in their sequential implementation. The correctness and the performance of the parallel algorithms have been tested and analyzed by running parallel programs on a 32-node transputer based parallel machine designed and developed at the IBM Research Center at Yorktown Heights, New York. Benchmark queries of different semantic complexities are generated, and their performance is analyzed for various data and query parameters.

Complex object-oriented queries generally consist of path expressions and explicit join operations. Since explicit join operations have been acknowledged as the most expensive operations, query executions normally start from the path expressions. Each path expression may form a sub-query. There are two existing strategies for sub-query processing: 'serial' and 'parallel' execution scheduling strategies. Serial sub-query execution corresponds to an execution of the sub-queries one-by-one, whereas parallel sub-query execution corresponds to simultaneous execution of the sub-queries. When a sub-query is being processed, parallelization techniques may be applied. In this paper, we focus on the scheduling issues of the sub-queries, rather than the parallelization of the sub-queries themselves. Rules are formulated to guide the parallel query execution process. Our analysis shows that when there is no load skew, the serial scheduling strategy is preferred, otherwise the parallel scheduling strategy should be used.

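Optimizing column join strategies is an important problem in querying column-store data. In existing column-store systems, column joins suffer from a single fixed strategy, a lack of optimization, and an inability to handle complex queries. To address these problems, a join strategy selection method is proposed. The method first defines simple rules to filter out query plans with excessive cost and generates candidate query plan trees. It then proposes a dynamic optimization tree algorithm, based on the principle of the dynamic Huffman tree, to improve the query execution order within the candidate plan trees. Given the characteristics of column-store data, the execution strategy of each join node in a candidate plan is classified as one of two kinds: serial join or parallel join. A cost estimation model is built on this basis, focusing cost estimation and strategy selection on these two join strategies, so that an optimized query execution strategy is obtained with low time complexity.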
