Similar literature
1.
A query processing strategy based on pipelining and data-flow techniques is presented. Timing equations are developed for calculating the performance of four join algorithms: nested block, hash, sort-merge, and pipelined sort-merge. They are used to execute the join operation in a query in distributed fashion and in pipelined fashion. Based on these equations and similar sets of equations developed for other relational algebraic operations, the performance of query execution was evaluated using the different join algorithms. The effects of varying the values of processing time, I/O time, communication time, buffer size, and join selectivity on the performance of the pipelined join algorithms are investigated. The results are compared to the results obtained by employing the same algorithms for executing queries using the distributed processing approach, which does not exploit the vertical concurrency of the pipelining approach. These results establish the benefits of pipelining.

2.
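To address spatial join query processing in cloud environments, a Spark-based multi-way spatial join query processing algorithm, BSMWSJ, is proposed. The algorithm uses grid partitioning to divide the whole data space into grid cells of equal size and assigns the spatial objects of each data set to the corresponding cells according to their spatial position; the spatial objects in different grid cells are then joined in parallel. During multi-way spatial join processing, boundary filtering is used to discard useless data: the MBR of the candidate results of the preceding join operations is computed and used to filter the subsequent join data sets, which removes useless join objects and reduces redundant projection and replication of join objects. A duplicate avoidance strategy reduces the output of duplicate results, further lowering the cost of subsequent join computation. Extensive experiments on synthetic and real data sets show that the proposed multi-way spatial join query processing algorithm clearly outperforms existing multi-way join query processing algorithms.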
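The grid partitioning step described above can be sketched in a few lines. This is a minimal single-machine, two-way illustration, not BSMWSJ itself: the function names, the `(xmin, ymin, xmax, ymax)` tuple layout, and the use of a set to drop duplicate pairs are all assumptions made for the example, and the set is only a stand-in for the paper's duplicate avoidance strategy.

```python
from collections import defaultdict
from itertools import product


def grid_cells(mbr, cell_size):
    """Yield the (col, row) index of every grid cell that an MBR overlaps."""
    xmin, ymin, xmax, ymax = mbr
    cols = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    rows = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    yield from product(cols, rows)


def partition(objects, cell_size):
    """Map each grid cell to the ids of the objects whose MBR touches it."""
    cells = defaultdict(list)
    for obj_id, mbr in objects.items():
        for cell in grid_cells(mbr, cell_size):
            cells[cell].append(obj_id)
    return cells


def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_join(r, s, cell_size):
    """Join two data sets cell by cell; each cell could run as its own task."""
    r_cells, s_cells = partition(r, cell_size), partition(s, cell_size)
    result = set()  # a set drops pairs found again in another shared cell
    for cell in r_cells.keys() & s_cells.keys():
        for rid in r_cells[cell]:
            for sid in s_cells[cell]:
                if intersects(r[rid], s[sid]):
                    result.add((rid, sid))
    return result
```

Only cells that hold objects from both data sets are visited, which is where the pruning comes from; in a Spark setting each such cell would presumably become one parallel task.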

3.
Advanced application domains such as computer-aided design, computer-aided software engineering, and office automation are characterized by their need to store, retrieve, and manage large quantities of data having complex structures. A number of object-oriented database management systems (OODBMS) are currently available that can effectively capture and process the complex data. The existing implementations of OODBMS outperform relational systems by maintaining and querying cross-references among related objects. However, the existing OODBMS still do not meet the efficiency requirements of advanced applications that require the execution of complex queries involving the retrieval of a large number of data objects and relationships among them. Parallel execution can significantly improve the performance of complex OO queries. In this paper, we analyze the performance of parallel OO query processing algorithms for various benchmark application domains. The application domains are characterized by specific mixes of queries of different semantic complexities. The performance of the application domains has been analyzed for various system and data parameters by running parallel programs on a 32-node transputer-based parallel machine developed at the IBM Research Center at Yorktown Heights. The parallel processing algorithms, data routing techniques, and query management and control strategies have been implemented to obtain accurate estimates of controlling and processing overheads. However, generation of large complex databases for the study was impractical. Hence, the data used in the simulation have been parameterized. The parallel OO query processing algorithms analyzed in this study are based on a query graph approach rather than the traditional query tree approach. Using the query graph approach, a query is processed by simultaneously initiating the execution at several object classes, thereby improving the parallelism. During processing, the algorithms avoid the execution of time-consuming join operations by making use of the object references among the objects. Further, the algorithms do not generate any temporary data, thereby reducing disk accesses. This is accomplished by marking the selected objects and by employing a two-phase query processing strategy.

4.
This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing site for query execution. This typically introduces high communication overhead. Our observation is that semijoin, effective in reducing communication overhead in distributed database query processing, can also be effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples, as it requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work to address the challenge. Specifically, we first propose a distributed stream join processing model that handles the issue of network delays introduced by the shipment of data streams, and allows for efficient batch processing. Then, based on the model, we propose join algorithms for the multi-way join case: first, one-way join algorithms for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different possible semijoin strategies which incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously in an incremental manner as tuples arrive, and never ship tuples redundantly. The optimization algorithm constructs an efficient multi-way join plan by using a greedy heuristic which adds to the plan one stream with the minimum join execution cost in each step. Through extensive experiments, we conduct comparative studies of the performance of the proposed one-way join algorithms and of the efficiency of the plans generated by the optimization algorithm based on the greedy heuristic versus exhaustive search.
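To make the communication saving of a semijoin concrete, the following is a minimal sketch of the classical semijoin-based join over stored tuples. It is not the paper's streaming algorithm: windows, batching, and network delay are ignored, and the function name, the dict-based rows, and the returned shipped-tuple count are assumptions chosen for illustration.

```python
def semijoin_based_join(local, remote, key):
    """Join `local` (at the processing site) with `remote` (at another site).

    Returns the joined rows and the number of remote rows that had to be
    shipped, which is the quantity a semijoin tries to keep small.
    """
    # Step 1: ship only the distinct join-key values to the remote site.
    local_keys = {row[key] for row in local}

    # Step 2: the remote site keeps only rows that can possibly join.
    reduced = [row for row in remote if row[key] in local_keys]

    # Step 3: only the reduced rows travel back and are joined in full.
    joined = [
        {**l_row, **r_row}
        for l_row in local
        for r_row in reduced
        if l_row[key] == r_row[key]
    ]
    return joined, len(reduced)
```

A simple join would ship every remote row; here the shipped count equals the number of remote rows with a matching key, at the price of first sending the key set in the other direction.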

5.
Management of large quantities of complex data is essential in many advanced application areas. Object-oriented (OO) database management systems have been developed to effectively model and process the complex domain knowledge. They have been shown to outperform some existing relational systems. The existing implementations of OO database management systems attempt to improve the efficiency of OO queries by explicitly capturing the relationships among objects. However, the execution of complex queries involving the retrieval of objects from many classes and relationships among them causes the existing systems to operate inefficiently. In this paper, we present parallel algorithms for the processing of queries against a large OO database. The algorithms are based on a closed model of query processing with pattern-based access instead of the conventional value-based access. During processing, the algorithms avoid the execution of time-consuming join operations by making use of the explicitly stored object associations. Generation of large quantities of temporary data is avoided by marking objects using their identifiers and by employing a two-phase query processing strategy. A query is processed by concurrent multiple waves, thereby improving parallelism and avoiding the complexities introduced by a sequential implementation. The correctness and the performance of the parallel algorithms have been tested and analyzed by running parallel programs on a 32-node transputer-based parallel machine designed and developed at the IBM Research Center at Yorktown Heights, New York. Benchmark queries of different semantic complexities are generated, and their performance is analyzed for various data and query parameters.

6.
A multidatabase system (MDBS) allows users to simultaneously access heterogeneous and autonomous databases using an integrated schema and a single global query language. The query optimization problem in MDBSs is quite different from the query optimization problem in distributed homogeneous databases due to schema heterogeneity and autonomy of local database systems. In this work, we consider the optimization of query distribution in case of data replication and the optimization of intersite joins, that is, the join of the results returned by the local sites in response to the global subqueries. The algorithms presented for the optimization of intersite joins try to maximize the parallelism in execution and take the federated nature of the problem into account. It has also been shown through a comparative performance study that the proposed intersite join optimization algorithms are efficient. The approach presented can easily be generalized to any operation required for intersite query processing. The query optimization scheme presented in this paper is being implemented within the scope of a multidatabase system which is based on OMG's object management architecture.

7.
Massive XML data are increasingly generated for the representation, storage and exchange of web information. Twig query processing over massive XML data has become a research focus. However, most traditional algorithms cannot be directly implemented in a distributed manner. Some of the existing distributed algorithms generate many useless intermediate results and execute many join operations on partial results in most cases; others require a priori knowledge of the query pattern before XML partitioning, storage and query processing, which is impractical for large-scale data or frequently arriving new queries. To improve efficiency and scalability, in this paper, we propose a 3-phase distributed algorithm DisT3 based on a node distribution mechanism to avoid unnecessary intermediate results. Furthermore, we propose a lightweight local index ReP with an enhanced XML partitioning approach using an arbitrary partitioning strategy, and based on ReP we propose an improved 2-phase distributed algorithm DisT2ReP to further reduce the communication cost. After the performance guarantees are analyzed, extensive experiments are conducted to verify the efficiency and scalability of our proposed algorithms in distributed twig query applications.

8.
Current data repositories include a variety of data types, including audio, images, and time series. State-of-the-art techniques for indexing such data and doing query processing rely on a transformation of data elements into points in a multidimensional feature space. Indexing and query processing then take place in the feature space. We study algorithms for finding relationships among points in multidimensional feature spaces, specifically algorithms for multidimensional joins. Like joins of conventional relations, correlations between multidimensional feature spaces can offer valuable information about the data sets involved. We present several algorithmic paradigms for solving the multidimensional join problem and we discuss their features and limitations. We propose a generalization of the size separation spatial join algorithm, named multidimensional spatial join (MSJ), to solve the multidimensional join problem. We evaluate MSJ along with several other specific algorithms, comparing their performance for various dimensionalities on both real and synthetic multidimensional data sets. Our experimental results indicate that MSJ, which is based on space filling curves, consistently yields good performance across a wide range of dimensionalities.
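A space filling curve maps a multidimensional point to a single sortable key so that nearby points tend to receive nearby keys. The abstract does not say which curve MSJ uses, so the sketch below assumes the Z-order (Morton) curve purely as an illustration; the function name `z_order` and the `bits` parameter are hypothetical.

```python
def z_order(coords, bits=8):
    """Interleave the bits of non-negative integer coordinates.

    Bit `b` of dimension `d` lands at position `b * dims + d` of the key,
    so the key works for any number of dimensions.
    """
    dims = len(coords)
    key = 0
    for bit in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> bit) & 1) << (bit * dims + d)
    return key
```

Sorting points by such a key gives a one-dimensional order over which a join can proceed in a merge-like fashion, which is the general appeal of curve-based methods in higher dimensions.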

9.
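Target-node-oriented XML path query processing   Total citations: 14, self-citations: 4, citations by others: 14
Wang Jing, Meng Xiaofeng, Wang Yu, Wang Shan. 《软件学报》 (Journal of Software), 2005, 16(5): 827-837
XML query languages take complex path expressions as their core. To speed up path expression processing, strategies based on path decomposition and structural join operations need deeper study. A target-node-oriented XML path query processing framework is proposed. The method uses extended basic operations to reduce the number of join operations. During path decomposition and query plan selection, it uses the target nodes in the query tree to avoid passing intermediate results along. Besides decomposition rules and strategies, a set of extended basic operations and their implementation algorithms are proposed. Preliminary experimental results show that the method performs well and offers more choices for path query processing.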

10.
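A new partition-based structural join algorithm   Total citations: 2, self-citations: 0, citations by others: 2
Efficient structural joins are the key to XML query processing. At present, most structural join algorithms are considerably slowed down because they need on-the-fly sorting or index construction, or suffer from data replication and I/O problems. Based on an analysis and comparison of existing structural join algorithms, this paper proposes a new partition-based structural join algorithm. The algorithm needs neither sorting nor indexing, solves the data replication problem with a stack mechanism, and improves I/O performance by taking memory buffering fully into account. Experimental analysis shows that the algorithm has good query performance.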

11.
It is shown that the fragment and replicate (FR) distributed join algorithm is a special case of the symmetric fragment and replicate (SFR) algorithm, which improves the FR algorithm by reducing its communication. The SFR algorithm, like the FR algorithm, is applicable to N-way joins and nonequijoins and does tuple balancing automatically. The authors derive formulae that show how to minimize the communication in the SFR algorithm, discuss its performance on a parallel database prototype, and evaluate its practicality under various conditions. It is claimed that SFR improves the worst-case cost for a distributed join, but it will not displace specialized distributed join algorithms when the latter are applicable.

12.
Adaptive and incremental processing for distance join queries   Total citations: 1, self-citations: 0, citations by others: 1
A spatial distance join is a relatively new type of operation introduced for spatial and multimedia database applications. Additional requirements for ranking and stopping cardinality are often combined with the spatial distance join in online query processing or Internet search environments. These requirements pose new challenges as well as opportunities for more efficient processing of spatial distance join queries. In this paper, we first present an efficient k-distance join algorithm that uses spatial indexes such as R-trees. Bidirectional node expansion and plane-sweeping techniques are used for fast pruning of distant pairs, and the plane-sweeping is further optimized by novel strategies for selecting a sweeping axis and direction. Furthermore, we propose adaptive multistage algorithms for k-distance join and incremental distance join operations. Our performance study shows that the proposed adaptive multistage algorithms outperform previous work by up to an order of magnitude for both k-distance join and incremental distance join queries, under various operational conditions.
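It may help to pin down what a k-distance join returns before thinking about indexes. The sketch below is a brute-force baseline that enumerates every pair, so it uses none of the R-tree, node expansion, or plane-sweeping techniques from the abstract; the function name and the point-tuple representation are assumptions for the example.

```python
import heapq
from itertools import product
from math import dist


def k_distance_join(r, s, k):
    """Return the k closest (distance, p, q) pairs with p from r and q from s.

    Every pair is examined, so the cost is O(|r| * |s|); an index-based
    algorithm exists precisely to prune most of these pairs.
    """
    pairs = ((dist(p, q), p, q) for p, q in product(r, s))
    return heapq.nsmallest(k, pairs)
```

The result is ordered by increasing distance, which matches the ranking requirement; the stopping cardinality is the argument `k`.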

13.
On-line analytical processing (OLAP) refers to the technologies that allow users to efficiently retrieve data from the data warehouse for decision-support purposes. Data warehouses tend to be extremely large; it is quite possible for a data warehouse to be hundreds of gigabytes to terabytes in size (Chaudhuri and Dayal, 1997). Queries tend to be complex and ad hoc, often requiring computationally expensive operations such as joins and aggregation. Given this, we are interested in developing strategies for improving query processing in data warehouses by exploring the applicability of parallel processing techniques. In particular, we exploit the natural partitionability of a star schema and render it even more efficient by applying DataIndexes, a storage structure that serves as both an index and data and lends itself naturally to vertical partitioning of the data. DataIndexes are derived from the various special-purpose access mechanisms currently supported in commercial OLAP products. Specifically, we propose a declustering strategy which incorporates both task and data partitioning and present the Parallel Star Join (PSJ) Algorithm, which provides a means to perform a star join in parallel using efficient operations involving only rowsets and projection columns. We compare the performance of the PSJ Algorithm with two parallel query processing strategies. The first is a parallel join strategy utilizing the Bitmap Join Index (BJI), arguably the state-of-the-art OLAP join structure in use today. For the second strategy we choose a well-known parallel join algorithm, namely the pipelined hash algorithm. To assist in the performance comparison, we first develop a cost model of the disk access and transmission costs for all three approaches.

14.
The performance of several algorithms suitable for processing an important class of recursive queries called instantiated transitive closure (TC) queries is studied and compared. These algorithms are the wavefront, δ-wavefront, and a generic algorithm called super-TC. During the evaluation of a TC query, the first two algorithms may read a given disk page more than once, whereas super-TC reads the disk page at most once. A comprehensive performance evaluation of these three algorithms using rigorous analytical and simulation models is presented. The study reveals that the relative performance of the algorithms is a strong function of the parameters which characterize the processed TC query and the relation referenced by that query. The superiority of one of the super-TC variants over all of the other presented algorithms is shown.
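An instantiated TC query asks for everything reachable from one given value. The abstract does not define the three algorithms, so the sketch below only illustrates the general frontier idea under an assumption: each round expands just the nodes discovered in the previous round, a differential style that the δ-wavefront name suggests. The function name and the edge-list input are hypothetical, and no disk paging is modeled.

```python
def reachable_from(edges, source):
    """Return all nodes reachable from `source` by one or more edges."""
    # Build an adjacency map once so each round is a set of lookups.
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    reached = set()
    delta = {source}  # nodes discovered in the previous round
    while delta:
        new = set()
        for node in delta:
            new |= successors.get(node, set())
        delta = new - reached  # keep only nodes not seen before
        reached |= delta
    return reached
```

Because only the newly found nodes are expanded, each node's successor list is looked up at most once after it is reached, which is the in-memory analogue of avoiding repeated page reads.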

15.
To address top-k spatial join query processing in cloud computing systems, a Spark-based top-k spatial join (STKSJ) query processing algorithm is proposed. In this algorithm, the whole data space is divided into grid cells of the same size by a grid partitioning method, and each spatial object in one data set is projected into a grid cell. The Minimum Bounding Rectangle (MBR) of all spatial objects in each grid cell is computed. The spatial objects in the other data set that overlap with these MBRs are replicated to the corresponding grid cells, thereby filtering out spatial objects that cannot produce join results and reducing the cost of subsequent spatial join processing. An improved plane sweeping algorithm is also proposed that speeds up the scanning mode and applies threshold filtering, thus greatly reducing the communication and computation costs of intermediate join results in subsequent top-k aggregation operations. Experimental results on synthetic and real data sets show that the proposed algorithm performs better than existing top-k spatial join query processing algorithms.
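The MBR-based filtering step can be illustrated for a single grid cell. This is a minimal sketch under assumptions: rectangles are `(xmin, ymin, xmax, ymax)` tuples, the helper names are hypothetical, and neither Spark nor the plane sweep and top-k threshold are shown.

```python
def mbr_of(rects):
    """Smallest rectangle enclosing every rectangle in a non-empty list."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


def overlaps(a, b):
    """True if two (xmin, ymin, xmax, ymax) rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_by_cell_mbr(cell_objects, other_objects):
    """Keep only the objects of the other data set that overlap the cell's MBR.

    Objects that miss the MBR cannot join with anything in the cell,
    so they need not be replicated to it.
    """
    box = mbr_of(cell_objects)
    return [obj for obj in other_objects if overlaps(box, obj)]
```

The MBR of the cell's contents is usually tighter than the cell itself, so filtering against it can discard more candidates than filtering against the cell boundary.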

16.
Semijoin has traditionally been relied upon to reduce the cost of data transmission for distributed query processing. However, judiciously applying join operations as reducers can lead to further reduction in the amount of data transmission required. In view of this fact, we explore the approach of using join operations as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query can be transformed to that of finding a specific type of set of cuts to the corresponding query graph, where a cut to a graph is a partition of nodes in that graph. Then, in light of this concept, we prove that the problem of determining the optimal sequence of join operations for a given query graph is of exponential complexity, thus justifying the necessity of applying heuristic approaches to solve this problem. By mapping the problem of determining a sequence of join reducers into the one of finding a set of cuts, we develop (for tree and general query graphs, respectively) efficient heuristic algorithms to determine a join reducer sequence for distributed query processing. The algorithms developed are based on the concept of divide and conquer and are of polynomial time complexity. Simulation is performed to evaluate these algorithms.

17.
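Chen Gang, Gu Jinguang, Li Sichuan. 《计算机科学》 (Computer Science), 2010, 37(12): 143-144
Relational query processing over data streams is a hot topic in database research. The key to optimizing non-blocking join algorithms is to improve the efficiency of the in-memory join phase. When memory is full, in-memory data must be flushed to the corresponding partitions on external storage, and a good flushing strategy is crucial to the algorithm's performance. Exploiting the characteristics of the data distribution, a statistics-based method is applied to the output stream of the relational join to find the least frequently used tuples and flush them to external storage, thereby making better use of the data kept in memory. The statistics-based strategy improves the accuracy and efficiency of flushing and broadens the applicability of the algorithm.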

18.
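XML structural joins based on region partitioning   Total citations: 22, self-citations: 7, citations by others: 22
Wang Jing, Meng Xiaofeng, Wang Shan. 《软件学报》 (Journal of Software), 2004, 15(5): 720-729
Structural join is the core operation of XML query processing and has attracted attention from the research community. Efficient algorithms are the key to efficient query processing. Many structural join algorithms have been proposed, most of which rely on one of two preconditions: the input element sets are indexed, or they are sorted. When these conditions do not hold, performance degrades considerably because of the cost of sorting or indexing the input on the fly. Based on this observation, a structural join algorithm based on region partitioning is proposed. Following the idea of task decomposition, the algorithm partitions the input sets by exploiting the properties of region encoding. A detailed algorithm design is given and its I/O complexity is analyzed. Extensive experiments show that the algorithm performs well, outperforms the existing sort-merge algorithm when the input is unsorted or unindexed, and gives query plans more choices.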
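Region encoding assigns each XML element a `(start, end)` interval so that an element is an ancestor of another exactly when its interval strictly contains the other's. The sketch below shows the kind of stack-based merge that depends on sorted input, which is the precondition the abstract says the proposed algorithm removes; it is not the region-partitioning algorithm itself. The function name and the tuple layout are assumptions, and both input lists are assumed to be sorted by `start` and to come from one well-nested document.

```python
def structural_join(ancestors, descendants):
    """Return all (ancestor, descendant) pairs under region encoding.

    Each element is a (start, end) tuple; both lists must be sorted by start.
    """
    result = []
    stack = []  # chain of nested ancestor candidates, outermost first
    a = 0
    for d in descendants:
        # Bring in every ancestor candidate that starts before d.
        while a < len(ancestors) and ancestors[a][0] < d[0]:
            # Drop candidates that closed before this one opens.
            while stack and stack[-1][1] < ancestors[a][0]:
                stack.pop()
            stack.append(ancestors[a])
            a += 1
        # Drop candidates that closed before d opens.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Everything left on the stack contains d.
        result.extend((anc, d) for anc in stack)
    return result
```

Each input list is scanned once, which is why such merges are fast on sorted data and why the sorting cost dominates when the input arrives unordered.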

19.
Information Sciences, 2007, 177(12): 2493-2521
The query optimization phase in query processing plays a crucial role in choosing the most efficient strategy for executing a query. In this paper, we study an optimization technique for nested SQL queries using Hints. Hints are additional comments that are inserted into an SQL statement for the purpose of instructing the optimizer to perform the specified operations. We utilize various Hints including Optimizer Hints, Table join and anti-join Hints, and Access method Hints. We analyse the performance of various nested queries using the TRACE and TKPROF utilities, which provide query execution statistics and execution plans.

20.
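Join is one of the most time-consuming and most frequently used operations in data query processing, so speeding it up is important. Array many-core processors are an important class of many-core processors with strong parallel capability that can be used to accelerate parallel computation. Based on the architecture of array many-core processors, an efficient multi-level partitioned hash join algorithm is designed and optimized. The algorithm greatly reduces the number of main memory accesses through a multi-level partitioning strategy and effectively eliminates the effect of data skew through partition reordering, achieving high performance. Experiments on a prototype of the heterogeneous fused array many-core processor DFMC (Deeply-Fused Many Core) show that the multi-level partitioned hash join on DFMC is 8.0 times as fast as the fastest join algorithm on a coupled CPU-GPU architecture, which indicates that array many-core processors are well suited to accelerating data query applications.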
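The core of a partitioned hash join can be shown with a single level of partitioning. This sketch is an illustration under assumptions, not the DFMC algorithm: it runs sequentially, uses one partitioning level instead of several, does no partition reordering for skew, and the function name, tuple rows, and `partitions` parameter are hypothetical.

```python
from collections import defaultdict


def partitioned_hash_join(r, s, key_r, key_s, partitions=4):
    """Equi-join two lists of tuples on r[key_r] == s[key_s]."""
    # Partition phase: rows with equal keys always land in the same bucket.
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for row in r:
        r_parts[hash(row[key_r]) % partitions].append(row)
    for row in s:
        s_parts[hash(row[key_s]) % partitions].append(row)

    # Join phase: each partition pair is independent of all the others.
    out = []
    for p, r_rows in r_parts.items():
        table = defaultdict(list)  # build side: small per-partition hash table
        for row in r_rows:
            table[row[key_r]].append(row)
        for row in s_parts.get(p, []):  # probe side
            for match in table.get(row[key_s], []):
                out.append((match, row))
    return out
```

Partitioning keeps each build table small enough to stay in fast local memory, and because partition pairs never interact they can be handed to separate cores; repeating the partition phase on oversized buckets is one way a multi-level scheme could arise.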
