Similar documents
 20 similar documents found (search time: 15 ms)
1.
The coterie join operation proposed by M.L. Neilsen and M. Mizuno (1994) produces, from a k-coterie and a coterie, a new k-coterie. For the coterie join operation, this paper first shows 1) a necessary and sufficient condition to produce a nondominated k-coterie (more accurately, a nondominated k-semicoterie satisfying nonintersection property) and 2) a sufficient condition to produce a k-coterie with higher availability. By recursively applying the coterie join operation in such a way that the above conditions hold, we define nondominated k-coteries, called tree structured k-coteries, the availabilities of which are thus expected to be very high. This paper then proposes a new k-mutual exclusion algorithm that effectively uses a tree structured k-coterie, by extending Agrawal and El Abbadi's tree algorithm. The number of messages necessary for k processes obeying the algorithm to simultaneously enter the critical section is approximately bounded by k log(n/k) in the best case, where n is the number of processes in the system.
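For concreteness, here is a minimal sketch of the plain coterie join operation that the paper builds on, as it is usually defined: a designated element x of the first coterie's universe is replaced, in every quorum that contains it, by a quorum of the second coterie. The example coteries, the function name, and the final minimization step are illustrative assumptions; the paper's k-coterie version and its nondomination and availability conditions are not reproduced here.

```python
def coterie_join(c1, x, c2):
    """Join coterie c1 (defined over a universe containing x) with coterie c2.

    Quorums of c1 that avoid x are kept unchanged; in every quorum that does
    contain x, x is replaced in turn by each quorum of c2.  This sketches the
    classical coterie join; the paper's k-coterie variant adds extra conditions.
    """
    joined = []
    for q in c1:
        if x not in q:
            joined.append(frozenset(q))
        else:
            for r in c2:
                joined.append(frozenset(q - {x}) | frozenset(r))
    # keep only minimal quorums so that no quorum contains another
    return [q for q in joined if not any(p < q for p in joined)]


if __name__ == "__main__":
    # toy example: a majority coterie over {1, 2, "x"} joined with one over {4, 5, 6}
    c1 = [frozenset(s) for s in ({1, 2}, {1, "x"}, {2, "x"})]
    c2 = [frozenset(s) for s in ({4, 5}, {4, 6}, {5, 6})]
    for quorum in coterie_join(c1, "x", c2):
        print(sorted(quorum, key=str))
```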

2.
Nested-loop join incurs heavy cache misses, which severely degrade join query performance. This paper proposes a buffered, cache-oblivious parallel nested-loop join algorithm. The combination of cache-oblivious design and buffering improves both the spatial and the temporal locality of the join. Theoretical analysis and experimental results show that the cache-optimized serial join runs about twice as fast as the original, and that the parallel version achieves near-linear speedup.
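The abstract above does not give the algorithm itself; the sketch below only illustrates the buffering idea it relies on: process the relations in blocks so that a buffered block of the inner relation is reused against a whole block of the outer relation, improving temporal and spatial locality. The block size, row format, and function name are assumptions, and the sketch is serial rather than parallel.

```python
def blocked_nested_loop_join(outer, inner, key_outer, key_inner, block=1024):
    """Equijoin by blocked (buffered) nested loops over lists of dict rows.

    Processing the inner relation in cache-sized blocks improves temporal
    locality: each buffered inner block is scanned once per outer block
    rather than once per outer row.
    """
    result = []
    for i in range(0, len(outer), block):
        outer_block = outer[i:i + block]
        for j in range(0, len(inner), block):
            inner_block = inner[j:j + block]   # the "buffer"
            for o in outer_block:
                for r in inner_block:
                    if o[key_outer] == r[key_inner]:
                        result.append({**o, **r})
    return result


if __name__ == "__main__":
    R = [{"rid": i, "k": i % 5} for i in range(20)]
    S = [{"sid": i, "k": i % 5} for i in range(20)]
    print(len(blocked_nested_loop_join(R, S, "k", "k")))  # 20 * 20 / 5 = 80 matches
```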

3.
This paper presents a parallel distributive join algorithm for cube-connected multiprocessors. The performance analysis shows that the proposed algorithm has an almost linear speedup over the sequential distributive join algorithm as the number of processors increases, and its performance is comparable to that of the parallel hybrid-hash join algorithm. A big advantage of the proposed algorithm over hash-based join algorithms is that it does not have the bucket overflow problem caused by nonuniform hashing of the smaller operand relation. Moreover, the proposed algorithm can easily support the nonequijoin operation, which is very hard to implement by using hash-based join algorithms.

4.
The cube adaptive-hash join algorithm, which combines the merits of nested-loop and hybrid-hash, is presented. The performance of these algorithms is compared through analytical cost modeling. The nonuniform data value distribution of the inner relation is shown to have a greater impact than that of the outer relation. The cube adaptive-hash algorithm outperforms the cube hybrid-hash algorithm when bucket overflow occurs. In the worst case, this algorithm converges to the cube nested-loop-hash algorithm. When there is no hash table overflow, the cube adaptive-hash algorithm converges to the cube hybrid-hash algorithm. Since the cube adaptive-hash algorithm adapts itself depending on the characteristics of the relations, it is relatively immune to the data distribution.
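As a rough illustration of the adaptive behaviour described above, the following single-node sketch builds an in-memory hash table on the build relation while space lasts and falls back to nested-loop processing for the overflow. The hypercube distribution and the cost model of the original algorithm are not modeled, and the memory limit and row format are assumptions.

```python
from collections import defaultdict


def adaptive_hash_join(build, probe, key, max_build_rows=10_000):
    """Hash join that degrades gracefully when the build side overflows.

    Rows of `build` are hashed while they fit under `max_build_rows`; the
    remainder is kept aside and joined by a nested-loop pass, which is the
    spirit of an adaptive nested-loop/hybrid-hash combination.
    """
    table = defaultdict(list)
    overflow = []
    for i, row in enumerate(build):
        if i < max_build_rows:
            table[row[key]].append(row)
        else:
            overflow.append(row)

    out = []
    for p in probe:
        for b in table.get(p[key], ()):      # hash path
            out.append({**b, **p})
        for b in overflow:                   # nested-loop path for the overflow
            if b[key] == p[key]:
                out.append({**b, **p})
    return out


if __name__ == "__main__":
    R = [{"k": i % 3, "r": i} for i in range(8)]
    S = [{"k": i % 3, "s": i} for i in range(8)]
    print(len(adaptive_hash_join(R, S, "k", max_build_rows=5)))
```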

5.
Presents a parallel hash join algorithm that is based on the concept of hierarchical hashing, to address the problem of data skew. The proposed algorithm splits the usual hash phase into a hash phase and an explicit transfer phase, and adds an extra scheduling phase between these two. During the scheduling phase, a heuristic optimization algorithm, using the output of the hash phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the hash partitions with the largest skew values and splits them as necessary, assigning each of them to an optimal number of processors. Assuming for concreteness a Zipf-like distribution of the values in the join column, a join phase which is CPU-bound, and a shared nothing environment, the algorithm is shown to achieve good join phase load balancing, and to be robust relative to the degree of data skew and the total number of processors. The overall speedup due to this algorithm is compared to some existing parallel hash join methods. The proposed method does considerably better in high skew situations.
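A small sketch of the scheduling idea (not the paper's heuristic itself): given per-partition tuple counts from the hash phase, partitions larger than a threshold are split into fragments, and fragments are assigned largest-first to the currently least loaded processor. The threshold, the split rule, and the greedy placement are illustrative assumptions.

```python
import heapq


def schedule_partitions(sizes, n_procs, split_factor=1.0):
    """Assign hash partitions to processors, splitting skewed ones first.

    `sizes` maps partition id -> tuple count (the output of the hash phase).
    A partition bigger than split_factor * (total / n_procs) is split into
    enough equal fragments to fall under that threshold; all fragments are
    then placed, largest first, on the currently least loaded processor.
    """
    threshold = max(1, int(split_factor * sum(sizes.values()) / n_procs))
    fragments = []
    for pid, size in sizes.items():
        pieces = -(-size // threshold)                   # ceiling division
        fragments += [(size / pieces, pid)] * pieces

    heap = [(0.0, proc) for proc in range(n_procs)]      # (load, processor)
    heapq.heapify(heap)
    assignment = {proc: [] for proc in range(n_procs)}
    for frag_size, pid in sorted(fragments, reverse=True):
        load, proc = heapq.heappop(heap)
        assignment[proc].append((pid, frag_size))
        heapq.heappush(heap, (load + frag_size, proc))
    return assignment


if __name__ == "__main__":
    sizes = {0: 50_000, 1: 900, 2: 1_100, 3: 1_000, 4: 950}   # partition 0 is skewed
    for proc, frags in schedule_partitions(sizes, n_procs=4).items():
        print(proc, round(sum(s for _, s in frags)), [pid for pid, _ in frags])
```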

6.
This paper analyzes the factors that affect coal mine production safety and applies big data techniques to the analysis of coal mine safety data. It proposes a Bloom-filter-based star join algorithm to handle the multi-table joins that arise in big data analysis. Experimental results show that, compared with the traditional algorithm, the proposed algorithm improves both the space and the time efficiency of analyzing coal mine safety data with big data techniques.
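The paper's algorithm is not spelled out in this abstract; the sketch below only illustrates the generic Bloom-filter star-join idea: build a small Bloom filter over the keys of each dimension table, then discard fact-table rows whose foreign keys cannot possibly match before performing the actual joins. The filter size, the hash construction, and the toy tables are assumptions.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: k hash probes into a bit array, no deletions."""

    def __init__(self, n_bits=8192, n_hashes=3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _probes(self, item):
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._probes(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(item))


def star_join(fact, dims):
    """dims: {fk_column: {key: dim_row}}.  Bloom filters prune fact rows early."""
    blooms = {}
    for col, table in dims.items():
        bf = BloomFilter()
        for key in table:
            bf.add(key)
        blooms[col] = bf

    for row in fact:
        # cheap pre-filter: skip rows that certainly have no matching dimension key
        if all(blooms[c].might_contain(row[c]) for c in dims):
            joined, ok = dict(row), True
            for c, table in dims.items():
                if row[c] in table:
                    joined.update(table[row[c]])
                else:
                    ok = False          # Bloom false positive
                    break
            if ok:
                yield joined


if __name__ == "__main__":
    fact = [{"mine_id": i % 4, "sensor_id": i % 6, "value": i} for i in range(12)]
    dims = {"mine_id": {0: {"mine": "A"}, 1: {"mine": "B"}},
            "sensor_id": {0: {"sensor": "gas"}, 3: {"sensor": "dust"}}}
    print(list(star_join(fact, dims)))
```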

7.
To support XML document queries efficiently and reduce the scanning cost of structural joins, this paper analyzes why merge-based structural join algorithms perform poorly, exploits the structural characteristics of XML data, and proposes an extended Dewey encoding that can determine the structural relationship between two nodes directly, together with an improved Stack-Tree-Desc structural join algorithm based on this encoding. The extended Dewey encoding shortens the labels and lowers the space cost. The improved Stack-Tree-Desc algorithm uses binary search to quickly skip nodes that do not need to participate in the join, reducing the number of nodes scanned in the AList and DList and improving query efficiency. Theoretical analysis and experimental results confirm the correctness and effectiveness of both the encoding scheme and the structural join algorithm.
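The extended encoding is not given in the abstract, so the sketch below uses plain Dewey labels only to show the two mechanisms it relies on: an ancestor/descendant test done directly by prefix comparison, and a binary search that skips descendant nodes which cannot match the current ancestor. The list layout and label representation are assumptions.

```python
from bisect import bisect_left


def is_ancestor(a, d):
    """Dewey labels as tuples, e.g. (1, 2) is an ancestor of (1, 2, 5)."""
    return len(a) < len(d) and d[:len(a)] == a


def structural_join(alist, dlist):
    """Ancestor/descendant join over document-ordered Dewey label lists.

    For each ancestor, binary search jumps straight to the first descendant
    label >= the ancestor's label instead of rescanning dlist from the start.
    """
    dlist = sorted(dlist)
    out = []
    for a in sorted(alist):
        i = bisect_left(dlist, a)            # skip nodes before this subtree
        while i < len(dlist) and dlist[i][:len(a)] == a:
            if is_ancestor(a, dlist[i]):
                out.append((a, dlist[i]))
            i += 1
    return out


if __name__ == "__main__":
    alist = [(1,), (1, 2), (2, 3)]
    dlist = [(1, 1), (1, 2, 1), (1, 2, 5, 1), (2, 3, 4), (3, 1)]
    for pair in structural_join(alist, dlist):
        print(pair)
```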

8.
An Effective Dynamic Load-Balancing Join Algorithm for Parallel Databases (cited 1 time: 0 self-citations, 1 by others)
In shared-nothing parallel databases, load balancing has long been a key factor in query processing performance. The frequently used join operation can degrade overall performance through load skew caused by various factors and through extra communication overhead. This paper proposes a dynamic load-balancing join algorithm based on the RCMD distribution method, which adjusts the load on each node while the join is executing. Theoretical analysis and experimental results show that the proposed algorithm balances load effectively and improves the execution efficiency of parallel databases.
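The RCMD distribution method is not described here, so the sketch below does not attempt to reproduce it; it only illustrates the general idea of dynamic load balancing during a join: when the load imbalance between nodes exceeds a threshold, a bucket is migrated from the most loaded node to the least loaded one. The threshold, the bucket-selection rule, and the data layout are assumptions.

```python
def rebalance(node_buckets, threshold=1.3):
    """One rebalancing step over {node: {bucket: remaining_tuples}}.

    If the most loaded node exceeds `threshold` times the average load, the
    bucket whose size best closes half of the gap to the least loaded node is
    migrated there.  A generic illustration of dynamic load balancing during a
    join, not the RCMD method itself.
    """
    loads = {n: sum(b.values()) for n, b in node_buckets.items()}
    avg = sum(loads.values()) / len(loads)
    hot = max(loads, key=loads.get)
    cold = min(loads, key=loads.get)
    if avg == 0 or loads[hot] <= threshold * avg or not node_buckets[hot]:
        return False                                   # balanced enough
    gap = (loads[hot] - loads[cold]) / 2
    bucket = min(node_buckets[hot], key=lambda b: abs(node_buckets[hot][b] - gap))
    node_buckets[cold][bucket] = node_buckets[hot].pop(bucket)
    return True


if __name__ == "__main__":
    state = {"n0": {"b0": 800, "b1": 700, "b2": 100},
             "n1": {"b3": 200},
             "n2": {"b4": 250}}
    for _ in range(10):                 # bounded number of rebalancing rounds
        if not rebalance(state):
            break
    print({n: sum(b.values()) for n, b in state.items()})
```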

9.
A novel algorithm for relational equijoin is presented. The algorithm is a modification of merge join, but promises superior performance for medium-size inputs. In many cases it even compares favorably with both merge join and hybrid hash join, which is shown using analytic cost functions.
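The paper's modification is not described in this snippet; for reference, the sketch below is the standard sort-merge equijoin it starts from, including the usual handling of duplicate keys on both sides.

```python
def merge_join(left, right, key):
    """Classic sort-merge equijoin over lists of dict rows (duplicates handled)."""
    L = sorted(left, key=lambda r: r[key])
    R = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(L) and j < len(R):
        if L[i][key] < R[j][key]:
            i += 1
        elif L[i][key] > R[j][key]:
            j += 1
        else:
            k = L[i][key]
            i2 = i
            while i2 < len(L) and L[i2][key] == k:
                i2 += 1
            j2 = j
            while j2 < len(R) and R[j2][key] == k:
                j2 += 1
            for a in L[i:i2]:                 # cross product of the two key groups
                for b in R[j:j2]:
                    out.append({**a, **b})
            i, j = i2, j2
    return out


if __name__ == "__main__":
    L = [{"k": 1, "l": "a"}, {"k": 2, "l": "b"}, {"k": 2, "l": "c"}]
    R = [{"k": 2, "r": "x"}, {"k": 2, "r": "y"}, {"k": 3, "r": "z"}]
    print(merge_join(L, R, "k"))   # four result rows for k == 2
```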

10.
11.
12.
String similarity join (SSJ) is essential for many applications where near-duplicate objects need to be found. This paper targets SSJ with edit distance constraints. The existing algorithms usually adopt the filter-and-refine framework. They cannot catch the dissimilarity between string subsets, and do not fully exploit statistics such as the frequencies of characters. We develop a partition-based algorithm that uses such statistics. The frequency vectors are used to partition datasets into data chunks whose mutual dissimilarity can be caught easily. A novel algorithm is designed to accelerate SSJ via the partitioned data. A new filter is proposed to leverage the statistics to avoid computing edit distances for a noticeable proportion of candidate pairs that survive the existing filters. Our algorithm outperforms alternative methods notably on real datasets.
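The paper's partitioning scheme and filters are not reproduced here; the sketch only shows the kind of frequency-based pruning the abstract alludes to: character counts give a cheap lower bound on the edit distance (each insertion or deletion moves one count, each substitution moves at most two), so many candidate pairs can be rejected without running the dynamic program. The threshold and sample data are assumptions.

```python
from collections import Counter


def freq_lower_bound(s, t):
    """Lower bound on edit distance from character-frequency vectors.

    Each insert/delete changes one character count by 1 and each substitution
    changes two counts by 1, so ed(s, t) >= ceil(L1(freq(s), freq(t)) / 2);
    the length difference is another trivial lower bound.
    """
    cs, ct = Counter(s), Counter(t)
    l1 = sum(abs(cs[c] - ct[c]) for c in cs.keys() | ct.keys())
    return max(abs(len(s) - len(t)), (l1 + 1) // 2)


def edit_distance(s, t):
    """Plain dynamic-programming Levenshtein distance (run only after the filter)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def similarity_join(strings, tau):
    """All pairs within edit distance tau; the frequency filter prunes first."""
    out = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            s, t = strings[i], strings[j]
            if freq_lower_bound(s, t) <= tau and edit_distance(s, t) <= tau:
                out.append((s, t))
    return out


if __name__ == "__main__":
    data = ["kitten", "sitting", "mitten", "knitting", "flask"]
    print(similarity_join(data, tau=2))
```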

13.
A parallel sort-merge-join algorithm which uses a divide-and-conquer approach to address the data skew problem is proposed. The proposed algorithm adds an extra, low-cost scheduling phase to the usual sort, transfer, and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements, and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution of data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase, and is shown to be very robust relative, among other things, to the degree of data skew and the total number of processors.
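As with the hash-based variant above, the optimization algorithm itself is not given in the abstract; the sketch below only illustrates one ingredient, assigning each of the largest skew values a number of processors roughly proportional to its frequency. The proportional rule and the heaviness threshold are assumptions, not the paper's method.

```python
def allocate_processors(value_counts, n_procs, heavy_fraction=0.05):
    """Give each heavy join-key value a processor share proportional to its count.

    Values holding more than `heavy_fraction` of the tuples are treated as
    skew elements and may span several processors; all remaining values are
    left to share the processors as a single pool.
    """
    total = sum(value_counts.values())
    allocation = {}
    for value, count in value_counts.items():
        if count / total > heavy_fraction:
            allocation[value] = max(1, round(n_procs * count / total))
    return allocation


if __name__ == "__main__":
    # Zipf-like counts: one very frequent join value and a long tail
    counts = {"v0": 50_000, "v1": 6_000, "v2": 900, "v3": 800, "v4": 700}
    print(allocate_processors(counts, n_procs=16))
```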

14.
15.
Starting from the logical and storage organization of clustered indexes in relational databases, this paper proposes a join algorithm that caches intermediate results while traversing associated objects in an object-relational model, and gives formulas for performance analysis. Calculations and simulation experiments show that the method reduces the number of disk I/O operations and the disk access time when querying associated objects.
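The paper's cost formulas and the clustered-index specifics are not in this snippet; the sketch below only illustrates the caching idea: when traversing associations from a set of objects, each referenced page is fetched once and reused from an in-memory cache, so repeated references do not trigger repeated disk reads. The page granularity, the reference format, and the counters are assumptions.

```python
class CachedAssociationTraversal:
    """Traverse object associations, caching fetched pages to cut disk I/O."""

    def __init__(self, fetch_page):
        self.fetch_page = fetch_page     # function: page_id -> {oid: object}
        self.cache = {}                  # page_id -> page contents
        self.page_reads = 0

    def get(self, page_id, oid):
        if page_id not in self.cache:    # only a miss costs a (simulated) disk read
            self.cache[page_id] = self.fetch_page(page_id)
            self.page_reads += 1
        return self.cache[page_id][oid]

    def traverse(self, objects, assoc):
        """Return the associated object for each input object via (page, oid) refs."""
        return [self.get(*obj[assoc]) for obj in objects]


if __name__ == "__main__":
    # clustered storage: associated objects that share a key live on the same page
    pages = {p: {oid: {"page": p, "oid": oid} for oid in range(3)} for p in range(2)}
    orders = [{"id": i, "customer_ref": (i % 2, i % 3)} for i in range(12)]
    t = CachedAssociationTraversal(lambda p: dict(pages[p]))
    t.traverse(orders, "customer_ref")
    print("page reads:", t.page_reads)   # 2 page reads instead of 12 disk lookups
```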

16.
Distributed database systems provide a new data processing and storage technology for decentralized organizations of today. Query optimization, the process to generate an optimal execution plan for the posed query, is more challenging in such systems due to the huge search space of alternative plans incurred by distribution. As finding an optimal execution plan is computationally intractable, using stochastic-based algorithms has drawn the attention of most researchers. In this paper, for the first time, a multi-colony ant algorithm is proposed for optimizing join queries in a distributed environment where relations can be replicated but not fragmented. In the proposed algorithm, four types of ants collaborate to create an execution plan. Hence, there are four ant colonies in each iteration. Each type of ant makes an important decision to find the optimal plan. In order to evaluate the quality of the generated plan, two cost models are used: one based on the total time and the other on the response time. The proposed algorithm is compared with two previous genetic-based algorithms on chain, tree and cyclic queries. The experimental results show that the proposed algorithm saves up to about 80% of optimization time with no significant difference in the quality of generated plans compared with the best existing genetic-based algorithm.
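The four cooperating ant types and the two distributed cost models are specific to the paper and are not reproduced; the sketch below is a deliberately simple single-colony ant-colony optimizer for join order, meant only to show the basic mechanics such approaches share: ants build orders guided by pheromone, orders are scored by a toy cost model, and pheromone evaporates and is reinforced along the best order. All parameters and the cost model are assumptions.

```python
import random


def join_cost(order, card, sel):
    """Toy left-deep join cost: sum of estimated intermediate result sizes."""
    size, cost = card[order[0]], 0.0
    for rel in order[1:]:
        size *= card[rel] * sel
        cost += size
    return cost


def aco_join_order(relations, card, sel=0.01, n_ants=20, n_iters=50,
                   evaporation=0.1, seed=0):
    """Single-colony ACO over join orders: a simplified stand-in, not the paper's algorithm."""
    rng = random.Random(seed)
    # pheromone[pos][rel]: desirability of placing `rel` at position `pos`
    pheromone = [{r: 1.0 for r in relations} for _ in relations]
    best_order, best_cost = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            remaining, order = list(relations), []
            for pos in range(len(relations)):
                weights = [pheromone[pos][r] for r in remaining]
                choice = rng.choices(remaining, weights)[0]
                order.append(choice)
                remaining.remove(choice)
            cost = join_cost(order, card, sel)
            if cost < best_cost:
                best_order, best_cost = order, cost
        # evaporate everywhere, then reinforce the best-so-far order
        for pos in range(len(relations)):
            for r in relations:
                pheromone[pos][r] *= 1.0 - evaporation
            pheromone[pos][best_order[pos]] += 1.0
    return best_order, best_cost


if __name__ == "__main__":
    card = {"A": 10_000, "B": 500, "C": 50, "D": 2_000}
    print(aco_join_order(list(card), card))
```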

17.
This paper analyzes the MapReduce distributed programming model for big data platforms and the join algorithms used to answer queries on them, and proposes, for the SSB data model, a multi-table star join optimization technique based on a distributed cache. Predicate vectors turn the data dependencies of intermediate joins with dimension tables into bitmap-index filtering on the fact table, which removes the large network overhead those dependencies would otherwise cause; the distributed cache makes full use of the memory of the processing nodes, optimizing network transfer and reducing query cost.
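This is not the paper's MapReduce implementation; the sketch only illustrates the predicate-vector idea in plain Python: evaluate each dimension predicate once, keep only the set of qualifying keys (playing the role of a bitmap index), ship those small sets to every map task via the distributed cache, and filter fact rows locally in the map phase so that non-matching rows never cross the network. The SSB-like tables and predicates are assumptions.

```python
def build_predicate_vectors(dimensions, predicates):
    """For each dimension, keep only the keys whose row satisfies the predicate.

    The resulting key sets play the role of bitmap-index filters ("predicate
    vectors"): they are small enough to broadcast to every map task.
    """
    return {name: {k for k, row in table.items() if predicates[name](row)}
            for name, table in dimensions.items()}


def map_side_filter(fact_split, vectors):
    """Simulates one map task: emit only fact rows matching every predicate vector."""
    for row in fact_split:
        if all(row[fk] in keys for fk, keys in vectors.items()):
            yield row


if __name__ == "__main__":
    # SSB-like toy data: a fact table and two dimension tables
    date_dim = {d: {"year": 1995 + d % 3} for d in range(6)}
    part_dim = {p: {"brand": "MFGR#12" if p % 2 else "MFGR#21"} for p in range(4)}
    fact = [{"d_key": i % 6, "p_key": i % 4, "revenue": 100 * i} for i in range(24)]

    vectors = build_predicate_vectors(
        {"d_key": date_dim, "p_key": part_dim},
        {"d_key": lambda r: r["year"] == 1996,
         "p_key": lambda r: r["brand"] == "MFGR#12"},
    )
    print(sum(r["revenue"] for r in map_side_filter(fact, vectors)))
```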

18.
To evaluate structural queries over XML documents more efficiently and improve the performance of structural joins, this paper proposes a new structural join algorithm. It uses an extended prefix encoding scheme that adds fields such as type and index to help locate a node's position in the ancestor list or descendant list. The algorithm converts the XML document tree into a left-child right-sibling tree and finds an ancestor element's descendant list by locating the ancestor's start index and end index in that tree. Time complexity analysis shows that the algorithm performs better than existing algorithms.
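The type and index fields and the left-child right-sibling transformation are not reproduced here; the sketch below only shows the underlying mechanism of finding an ancestor's descendants as a contiguous slice of a document-ordered node list by way of the ancestor's start and end numbers. The tree representation and the numbering scheme are assumptions.

```python
from bisect import bisect_right


def number_tree(tree, counter=None, out=None):
    """Assign (start, end) numbers to a nested-dict tree by pre-order traversal."""
    if counter is None:
        counter, out = [0], {}
    for name, children in tree.items():
        counter[0] += 1
        start = counter[0]
        number_tree(children, counter, out)
        out[name] = (start, counter[0])    # descendants have start in (start, end]
    return out


def descendants_of(ancestor, numbering, dlist_starts, dlist_nodes):
    """Slice an ancestor's descendants out of a start-ordered descendant list."""
    start, end = numbering[ancestor]
    lo = bisect_right(dlist_starts, start)     # first node strictly inside the subtree
    hi = bisect_right(dlist_starts, end)
    return dlist_nodes[lo:hi]


if __name__ == "__main__":
    doc = {"book": {"title": {}, "chapter": {"section": {}, "figure": {}}}}
    num = number_tree(doc)
    # descendant candidate list, kept sorted by start number (document order)
    cands = sorted(num.items(), key=lambda kv: kv[1][0])
    starts = [kv[1][0] for kv in cands]
    nodes = [kv[0] for kv in cands]
    print(descendants_of("chapter", num, starts, nodes))   # ['section', 'figure']
```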

19.
Given two sets of moving objects with nonzero extents, the continuous intersection join query reports every pair of intersecting objects, one from each of the two moving object sets, for every timestamp. This type of query is important for a number of applications, e.g., in the multi-billion dollar computer game industry, massively multiplayer online games like World of Warcraft need to monitor the intersection among players’ attack ranges and render players’ interaction in real time. The computational cost of a straightforward algorithm or an algorithm adapted from another query type is prohibitive, and answering the query in real time poses a great challenge. Those algorithms compute the query answer for either too long or too short a time interval, which results in either a very large computation cost per answer update or too frequent answer updates, respectively. This observation motivates us to optimize the query processing in the time dimension. In this study, we achieve this optimization by introducing the new concept of time-constrained (TC) processing. Further, TC processing enables a set of effective improvement techniques on traditional intersection join algorithms. Finally, we provide a method to find the optimal value for an important parameter required in our technique, the maximum update interval. As a result, we achieve a highly optimized algorithm for processing continuous intersection join queries on moving objects. With a thorough experimental study, we show that our algorithm outperforms the best adapted existing solution by several orders of magnitude. We also validate the accuracy of our cost model and its effectiveness in optimizing the performance.
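The paper's framework is far richer than this; the sketch below only illustrates time-constrained processing for objects moving linearly in one dimension: the intersection period of two moving segments is computed analytically but clipped to [now, now + maximum update interval], since anything beyond the next guaranteed update would be recomputed anyway. Extending to rectangles amounts to intersecting the per-axis time intervals; the object format is an assumption.

```python
def solve_leq(c, d):
    """Times t satisfying c + d*t <= 0, returned as an interval (possibly unbounded)."""
    if d == 0:
        return (float("-inf"), float("inf")) if c <= 0 else None
    bound = -c / d
    return (float("-inf"), bound) if d > 0 else (bound, float("inf"))


def intersect_intervals(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def tc_intersection(obj_a, obj_b, now, max_update_interval):
    """Time-constrained intersection of two 1-D moving segments.

    Each object is (lo, hi, velocity), occupying [lo + v*t, hi + v*t].
    Only the portion of the intersection period inside
    [now, now + max_update_interval] is reported; later intersections would be
    recomputed after the next update anyway.
    """
    (alo, ahi, va), (blo, bhi, vb) = obj_a, obj_b
    period = intersect_intervals(
        solve_leq(alo - bhi, va - vb),     # a_lo + va*t <= b_hi + vb*t
        solve_leq(blo - ahi, vb - va),     # b_lo + vb*t <= a_hi + va*t
    )
    return intersect_intervals(period, (now, now + max_update_interval))


if __name__ == "__main__":
    a = (0.0, 1.0, 1.0)     # segment [0, 1] moving right at speed 1
    b = (10.0, 11.0, -1.0)  # segment [10, 11] moving left at speed 1
    print(tc_intersection(a, b, now=0.0, max_update_interval=3.0))   # None: no hit yet
    print(tc_intersection(a, b, now=4.0, max_update_interval=3.0))   # (4.5, 5.5)
```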

20.
When two or more literals in the body of a Prolog clause are solved in (AND) parallel, their solutions need to be joined to compute solutions for the clause. This is often a difficult problem in parallel Prolog systems that exploit OR and independent AND parallelism in Prolog programs. In several AND/OR parallel systems proposed recently, this problem is side-stepped at the cost of unexploited OR parallelism in the program, in part due to the complexity of the backtracking algorithm beneath AND parallel branches. In some cases, the data dependency graphs used by these systems cannot represent all the exploitable independent AND parallelism known at compile time. In this paper, we describe the compile-time analysis for an optimized join algorithm for supporting independent AND parallelism in logic programs efficiently without leaving any OR parallelism unexploited. We then discuss how this analysis can be used to yield very efficient runtime behavior. We also discuss problems associated with a tree representation of the search space when arbitrarily complex data dependency graphs are permitted. We describe how these problems can be resolved by mapping the search space onto the data dependency graphs themselves. The algorithm has been implemented in a compiler for parallel Prolog based on the Reduce-OR process model. The algorithm is suitable for the implementation of AND/OR systems on both shared and nonshared memory machines. Performance on benchmark programs exhibiting AND and OR parallelism on one shared memory machine and one message passing machine is presented. This work was supported in part by NSF Grants CCR-87-00988 and CCR-89-02496. A shorter version of this paper appears in the Proceedings of NACLP 1990.
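The compile-time analysis and the Reduce-OR process model cannot be captured in a snippet; the sketch below only shows the core join problem being optimized: each AND-parallel literal yields a set of variable bindings, and combining them means taking their cross product while keeping only combinations that agree on any shared variables. Representing bindings as Python dicts is an assumption.

```python
def consistent(b1, b2):
    """Two binding environments agree if shared variables have equal values."""
    return all(b2[v] == val for v, val in b1.items() if v in b2)


def join_solutions(solution_sets):
    """Join solution sets of AND-parallel literals into clause-level solutions."""
    joined = [{}]
    for solutions in solution_sets:
        joined = [{**acc, **s}
                  for acc in joined
                  for s in solutions
                  if consistent(acc, s)]
    return joined


if __name__ == "__main__":
    # p(X, Y) :- q(X, Z), r(Z, Y).  q and r solved in parallel, then joined on Z.
    q_solutions = [{"X": 1, "Z": "a"}, {"X": 2, "Z": "b"}]
    r_solutions = [{"Z": "a", "Y": 10}, {"Z": "b", "Y": 20}, {"Z": "c", "Y": 30}]
    print(join_solutions([q_solutions, r_solutions]))
```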
