19 similar documents found; search took 156 ms
1.
Building on the parallel join algorithm ABJ+, this paper proposes SBABJ+, an improved algorithm based on semijoins. We implemented the algorithm on several Sun workstations and ran performance tests on both ABJ+ and SBABJ+. Algorithm analysis and the experimental results show that the parallel join algorithm SBABJ+ outperforms ABJ+.
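The abstract above gives no pseudocode for SBABJ+; the following is a minimal sketch of the semijoin idea it builds on, with illustrative relation and field names. In a distributed join, only the join-key column of one relation is shipped, and the semijoin-reduced relation is what travels for the final join.

```python
def semijoin_reduce(r, s, key):
    """Return the tuples of r that have a join partner in s.
    Only s's join-key column needs to be shipped to r's site,
    and the (often much smaller) reduced r travels for the join."""
    s_keys = {t[key] for t in s}              # project s on the join key
    return [t for t in r if t[key] in s_keys]

def hash_join(r, s, key):
    """Plain in-memory hash join of two lists of dicts on `key`."""
    index = {}
    for t in s:
        index.setdefault(t[key], []).append(t)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]

r = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
s = [{"id": 2, "y": "u"}, {"id": 3, "y": "v"}]
reduced = semijoin_reduce(r, s, "id")   # id=1 is dropped before shipping
result = hash_join(reduced, s, "id")
```

The payoff in a parallel setting is communication volume: tuples with no join partner never leave their site.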
2.
A Parallel CMD-Join Algorithm on Parallel Databases   Cited by: 1 (self: 1, others: 1)
The way a parallel database is distributed across multiple processors strongly affects the performance of parallel data-operation algorithms; if the characteristics of the data-distribution method are fully exploited when designing such algorithms, highly efficient parallel algorithms can be obtained. This paper studies how to fully exploit the data-distribution method when designing parallel data-operation algorithms, and proposes a parallel CMD-Join algorithm based on the CMD multidimensional data-distribution method. Theoretical analysis and experimental results show that the parallel CMD-Join algorithm is more efficient than other parallel join algorithms.
3.
Query optimization is an important component not only of sequential database systems but also of parallel databases, and the optimization of complex relational queries involving multiple join operations is currently a major research topic.
4.
An Improved Hash Partitioning Method and Parallel Join Algorithms for Parallel Databases   Cited by: 3 (self: 0, others: 3)
This paper proposes an improved hash partitioning method, IH partitioning, which makes it convenient to repartition data when nodes are added. Building on the discussion of IH partitioning, a parallel join algorithm based on this data-partitioning scheme is given; by exploiting the existing data distribution, the proposed parallel join algorithm improves efficiency. Finally, the computational complexity of these parallel algorithms is analyzed theoretically.
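The abstract does not specify the IH refinement, so the sketch below shows only the baseline it improves on: plain hash partitioning, and the property that makes it attractive for parallel joins, namely that two relations partitioned by the same function can be joined node-locally with no redistribution. Names and the node count are illustrative.

```python
def hash_partition(tuples, key, n_nodes):
    """Baseline hash partitioning: route each tuple to a node by
    hashing its join key. (IH's extension for node expansion is not
    described in the abstract; this is only the baseline scheme.)"""
    parts = [[] for _ in range(n_nodes)]
    for t in tuples:
        parts[hash(t[key]) % n_nodes].append(t)
    return parts

def partitioned_join(r, s, key, n_nodes):
    """When both relations are partitioned by the same hash function,
    matching keys land on the same node, so each node joins only its
    local fragments -- no data movement during the join itself."""
    out = []
    for pr, ps in zip(hash_partition(r, key, n_nodes),
                      hash_partition(s, key, n_nodes)):
        out.extend((a, b) for a in pr for b in ps if a[key] == b[key])
    return out

r = [{"k": i} for i in range(6)]
s = [{"k": i} for i in range(0, 6, 2)]
matches = partitioned_join(r, s, "k", 3)
```

The pain point IH targets follows directly: with `hash % n_nodes`, changing `n_nodes` reshuffles almost every tuple, which is why repartitioning on node expansion needs a smarter scheme.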
5.
Physical design is a core issue in the research and implementation of parallel database systems. This paper presents the physical database design method adopted in a parallel database prototype system, describing its data partitioning, its parallel B-tree index structure, and several complex parallel data organizations.
6.
This paper is a sequel to "Query-Processing Parallelization Techniques and Physical Design Methods for Parallel Databases". It continues the survey with two further important research areas of parallel database systems: parallel data-operation algorithms and parallel query optimization techniques. Finally, concluding this survey of research and progress on parallel database systems, the paper discusses future research directions and open problems.
7.
A new implementation model for parallel database systems, called the "semi-rewriting transformation" model, is proposed. Based on this model, an architecture for parallel database systems is presented, consisting of multiple DBMS instances and a parallel query server (PQS). The paper first describes the semi-rewriting transformation model, then describes ParaBase, a parallel query prototype system implemented on this model, and finally reports a set of performance results on the Wisconsin Benchmark.
8.
This paper describes in detail the implementation of a parallel nested-loop join (PNLJ) algorithm on a shared-nothing parallel architecture, with the aim of exploring implementation techniques for parallel database systems.
9.
10.
11.
Join is one of the most time-consuming and most frequently used operations in query processing, so speeding it up is of real importance. Array many-core processors are an important class of many-core processors with strong parallel capability that can be used to accelerate parallel computation. Based on the architecture of array many-core processors, this work designs and optimizes an efficient multi-level partitioned hash join algorithm. The multi-level partitioning strategy greatly reduces the number of main-memory accesses, and a partition-reordering method effectively eliminates the impact of data skew, yielding high performance. Experimental results on a prototype of the heterogeneous fused array many-core processor DFMC (Deeply-Fused Many Core) show that the multi-level partitioned hash join on DFMC is 8.0 times faster than the fastest join algorithm on a coupled CPU-GPU architecture, demonstrating the advantage of array many-core processors for accelerating data-query applications.
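The DFMC-specific optimizations are not given in the abstract, but the core multi-level partitioning idea can be sketched: co-partition both inputs on a few hash bits per level, recurse until the pieces are small (on real hardware, until they fit fast memory), then hash-join only the matching leaf partitions. The fan-out and depth below are illustrative.

```python
def partition(tuples, key, bits, level):
    """Radix-style split on `bits` hash bits taken at `level`."""
    fanout = 1 << bits
    parts = [[] for _ in range(fanout)]
    for t in tuples:
        parts[(hash(t[key]) >> (level * bits)) & (fanout - 1)].append(t)
    return parts

def hash_join(r, s, key):
    """In-memory hash join of two small leaf partitions."""
    index = {}
    for t in s:
        index.setdefault(t[key], []).append(t)
    return [(a, b) for a in r for b in index.get(a[key], [])]

def multilevel_join(r, s, key, bits=2, levels=2, level=0):
    """Recursively co-partition both inputs, then join matching
    leaves; each level touches cache-sized pieces, which is what
    cuts main-memory accesses in the paper's setting."""
    if level == levels:
        return hash_join(r, s, key)
    out = []
    for pr, ps in zip(partition(r, key, bits, level),
                      partition(s, key, bits, level)):
        if pr and ps:   # skip partition pairs that cannot match
            out.extend(multilevel_join(pr, ps, key, bits, levels, level + 1))
    return out

r = [{"k": i} for i in range(16)]
s = [{"k": i} for i in range(8, 24)]
out = multilevel_join(r, s, "k")
```

Because both sides use the same hash bits at every level, matching keys always end up in the same sub-partition, so correctness does not depend on the partitioning depth.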
12.
13.
An improved B+-tree structure and a new data-mining algorithm, HB-Mine, are proposed. The algorithm obtains the B+-tree keys by constructing a hash function, and builds linked lists at the B+-tree leaf nodes to record the itemsets and frequencies associated with each key, so that frequent patterns can be mined without generating a huge set of candidate itemsets, with high time efficiency.
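HB-Mine's exact tree layout is not given in the abstract; the sketch below substitutes a flat array of buckets for the B+-tree, keeping the described idea of hashing a key and chaining (itemset, count) pairs at the leaf. It counts 2-itemsets only, and all names are illustrative.

```python
from itertools import combinations

def frequent_pairs(transactions, min_support, n_buckets=8):
    """Count 2-itemsets by hashing each into a bucket whose chain
    stores [itemset, count] pairs -- a flat stand-in for a B+-tree
    whose leaf chains record itemsets and frequencies."""
    buckets = [[] for _ in range(n_buckets)]
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            chain = buckets[hash(pair) % n_buckets]
            for entry in chain:          # walk the leaf chain
                if entry[0] == pair:
                    entry[1] += 1
                    break
            else:                        # pair not seen before
                chain.append([pair, 1])
    return {p: c for chain in buckets for p, c in chain
            if c >= min_support}

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                {"b", "c"}, {"a", "b", "c"}]
frequent = frequent_pairs(transactions, min_support=3)
```

No candidate set is materialized separately: counts accumulate directly in the chained structure, which is the property the abstract highlights.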
14.
By analyzing the ABJ+ algorithm and the hybrid hash join algorithm, and combining and improving the two, a parallel binary join algorithm is proposed that can cope with various kinds of data skew: under different skew conditions it activates different modules, overcoming the load imbalance that data skew causes.
15.
By analyzing the ABJ+ algorithm and the hybrid hash join algorithm, and combining and improving the two, a parallel binary join algorithm is proposed that can cope with various kinds of data skew: under different skew conditions it activates different modules, overcoming the load imbalance that data skew causes.
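The abstract names the strategy (switch modules depending on skew) but not its mechanics. A minimal sketch of the first step, detecting which join-key values are skewed so each group can be routed to a different join module, might look like this; the threshold and downstream strategies are assumptions, not the paper's parameters.

```python
from collections import Counter

def split_by_skew(tuples, key, threshold):
    """Separate tuples whose join-key frequency exceeds `threshold`
    (the 'hot' keys that would overload one node under plain hash
    partitioning) from the rest, so each group can use a different
    parallel join strategy, e.g. replicating the other side for hot
    keys and hash-partitioning the rest."""
    freq = Counter(t[key] for t in tuples)
    hot = [t for t in tuples if freq[t[key]] > threshold]
    rest = [t for t in tuples if freq[t[key]] <= threshold]
    return hot, rest

r = [{"k": 1} for _ in range(5)] + [{"k": 2}, {"k": 3}]
hot, rest = split_by_skew(r, "k", threshold=2)
```

Without such a split, every tuple with the dominant key hashes to the same node, which is exactly the load imbalance the abstract describes.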
16.
Spatial join processing using corner transformation   Cited by: 1 (self: 0, others: 1)
Ju-Won Song, Kyu-Young Whang, Young-Koo Lee, Min-Jae Lee, Sang-Wook Kim. IEEE Transactions on Knowledge and Data Engineering, 1999, 11(4): 688-695
Spatial join finds pairs of spatial objects having a specific spatial relationship in spatial database systems. Since spatial join is a fairly expensive operation, we need an efficient algorithm that takes advantage of the characteristics of available spatial access methods. In this paper, we propose a spatial join algorithm using corner transformation and show its excellence through experiments. To the best of the authors' knowledge, spatial join processing using corner transformation is new. In corner transformation, two regions in one file joined with two adjacent regions in the other file share a large common area. The proposed algorithm utilizes this property to reduce the number of disk accesses for spatial join. Experimental results show that the performance of the algorithm is generally better than that of the R*-tree based algorithm proposed by Brinkhoff et al. (1993, 1994). This is a strong indication that corner transformation is a promising category of spatial access methods and that spatial operations can be performed better in the transform space than in the original space. This reverses the common belief that transformation will adversely affect clustering. We also briefly mention that the join algorithm based on corner transformation has the nice property of being amenable to parallel processing. We believe that our result will provide a new insight towards transformation-based processing of spatial operations.
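The essence of corner transformation can be shown in one dimension: an extent [lo, hi] becomes the point (lo, hi) in transform space, and "overlaps" becomes an axis-aligned region test on points, the kind of query a point access method answers directly. The paper works with 2-D rectangles (4-D points); this 1-D sketch with a nested loop stands in for the access-method search and is only illustrative.

```python
def corner_point(lo, hi):
    """Corner transformation: the interval [lo, hi] maps to the
    point (lo, hi) in transform space."""
    return (lo, hi)

def overlap_join(intervals_a, intervals_b):
    """Intersection join in transform space: point (alo, ahi)
    matches point (blo, bhi) iff alo <= bhi and blo <= ahi,
    i.e. each point lies in an axis-aligned region determined
    by the other -- a region query on points."""
    pts_a = [corner_point(lo, hi) for lo, hi in intervals_a]
    pts_b = [corner_point(lo, hi) for lo, hi in intervals_b]
    return [(a, b) for a in pts_a for b in pts_b
            if a[0] <= b[1] and b[0] <= a[1]]

pairs = overlap_join([(0, 2), (5, 6)], [(1, 3), (7, 9)])
```

The disk-access savings the paper reports come from clustering in transform space, which a flat nested loop cannot show; the sketch captures only the predicate rewrite.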
17.
18.
Parallel Algorithms for Discovery of Association Rules   Cited by: 2 (self: 0, others: 2)
Mohammed J. Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, Wei Li. Data Mining and Knowledge Discovery, 1997, 1(4): 343-373
Discovery of association rules is an important data mining task. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the set of frequent itemsets (a subset of database items), thus incurring high I/O overhead. In the parallel case, most algorithms perform a sum-reduction at the end of each pass to construct the global counts, also incurring high synchronization cost. In this paper we describe new parallel association mining algorithms. The algorithms use novel itemset clustering techniques to approximate the set of potentially maximal frequent itemsets. Once this set has been identified, the algorithms make use of efficient traversal techniques to generate the frequent itemsets contained in each cluster. We propose two clustering schemes based on equivalence classes and maximal hypergraph cliques, and study two lattice traversal techniques based on bottom-up and hybrid search. We use a vertical database layout to cluster related transactions together. The database is also selectively replicated so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithms do not need any further communication or synchronization. The algorithms minimize I/O overheads by scanning the local database portion only twice: once in the set-up phase, and once when processing the itemset clusters. Unlike previous parallel approaches, the algorithms use simple intersection operations to compute frequent itemsets and do not have to maintain or search complex hash structures.
Our experimental testbed is a 32-processor DEC Alpha cluster inter-connected by the Memory Channel network. We present results on the performance of our algorithms on various databases, and compare them against a well-known parallel algorithm. The best new algorithm outperforms it by an order of magnitude.
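The "simple intersection operations" on a vertical layout that the abstract credits can be sketched directly: store, for each item, the set of transaction ids containing it (its tidset); the support of any itemset is then the size of the intersection of its members' tidsets, with no hash structures involved. The function names are illustrative.

```python
def vertical_layout(transactions):
    """Vertical database layout: item -> tidset, the set of ids of
    transactions containing that item."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

def support(itemset, tidsets):
    """Support of an itemset = size of the intersection of its
    members' tidsets -- a single pass of set intersections."""
    return len(set.intersection(*(tidsets[i] for i in itemset)))

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
tidsets = vertical_layout(transactions)
```

Per-processor locality in the paper comes from replicating just the tidsets a processor's clusters need, after which every support count is a local intersection.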
19.
Information and Software Technology, 2001, 43(11): 661-677
Three information retrieval storage structures are considered to determine their suitability for a World Wide Web search engine, The Wolverhampton Web Library — The Next Generation. The structures are an inverted file, a signature file and a Pat tree. A number of implementations are considered for each structure. For the index of an inverted file, a sorted array, B-tree, B+-tree, trie and hash table are considered. For the signature file, vertical and horizontal partitioning schemes are considered, and for the Pat tree, tree and array implementations are considered. A theoretical comparison of the structures is made on seven criteria: response time, support for results ranking, search techniques, file maintenance, efficient use of disk space (including the use of compression), scalability and extensibility. The comparison reveals that an inverted file is the most suitable structure, unlike the signature file and Pat tree, which encounter problems with very large corpora.
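For concreteness, here is one minimal instance of the structure the comparison favours: an inverted file mapping each term to a postings list of (document id, term frequency), with frequency-ranked lookup illustrating the results-ranking criterion. This is a generic sketch, not the paper's implementation.

```python
def build_inverted_file(docs):
    """Inverted file: term -> postings dict of doc_id -> term
    frequency, built in one pass over the collection."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings = index.setdefault(term, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return index

def lookup(index, term):
    """Postings for a term as (doc_id, tf) pairs, ranked by term
    frequency -- the ranking support the comparison scores."""
    return sorted(index.get(term.lower(), {}).items(),
                  key=lambda p: p[1], reverse=True)

docs = ["web search engine", "search the web web"]
index = build_inverted_file(docs)
hits = lookup(index, "web")
```

The per-term postings grow linearly with occurrences, which is why the structure scales to large corpora where signature files and Pat trees run into trouble.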