Similar Documents
 18 similar documents found.
1.
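Structural Join Algorithm for XML Queries   Total citations: 1, self-citations: 0, citations by others: 1
Most current XML structural join methods become too costly when the input element sets are unindexed or unsorted, because the input must first be sorted or indexed on the fly. To address this, the paper analyzes the classic Stack-Tree-Desc algorithm and a B+-tree index-based optimization, proposes an XML query optimization strategy that does not depend on an external index structure, and presents its implementation. Experimental results show that the algorithm achieves higher query efficiency than Stack-Tree-Desc.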
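The Stack-Tree-Desc algorithm that this paper uses as its baseline can be sketched as follows. This is a minimal in-memory sketch, assuming region codes (start, end, level) in which an ancestor's interval strictly encloses its descendants' intervals, and assuming both input lists are already sorted by start; the names Node and stack_tree_desc are illustrative, not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    start: int  # position where the element opens
    end: int    # position where the element closes
    level: int  # depth of the element in the document tree


def stack_tree_desc(alist: List[Node], dlist: List[Node]) -> List[Tuple[Node, Node]]:
    """Ancestor-descendant join; alist and dlist must be sorted by start."""
    stack: List[Node] = []  # chain of nested ancestors covering the current position
    out: List[Tuple[Node, Node]] = []
    i = j = 0
    while j < len(dlist):
        if i < len(alist) and alist[i].start < dlist[j].start:
            a = alist[i]
            # Discard ancestors whose region closed before a opens.
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        else:
            d = dlist[j]
            while stack and stack[-1].end < d.start:
                stack.pop()
            # Every region still on the stack encloses d.
            out.extend((anc, d) for anc in stack)
            j += 1
    return out


a1, a2 = Node(1, 10, 1), Node(2, 7, 2)
d1, d2, d3, d4 = Node(3, 4, 3), Node(5, 6, 3), Node(8, 9, 2), Node(11, 12, 1)
pairs = stack_tree_desc([a1, a2], [d1, d2, d3, d4])
```

Each input list is scanned once, which is why the algorithm needs sorted input; for the sample tree, pairs is [(a1, d1), (a2, d1), (a1, d2), (a2, d2), (a1, d3)].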

2.
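Yang Guogui, Wu Quanyuan. Computer Engineering, 2000, 26(8): 98-100, 103
Addressing join operations in object-relational databases, this paper discusses a new index structure suited to object-relational databases, the join predicate index, and then presents a join algorithm based on it. It analyzes the algorithm's performance and proposes using a cost calculation to decide which of relations R and S should serve as the outer relation, thereby reducing the algorithm's cost. The index structure, algorithm design, and performance analysis method also apply to multi-table joins.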
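The abstract does not give the paper's cost formulas. As a stand-in, the sketch below uses the textbook block nested-loop I/O cost (the outer relation is read once, and the inner relation is read once per memory-sized chunk of the outer) to pick the outer relation. The page counts b_r, b_s and mem_blocks are hypothetical inputs, and the join predicate index itself is not modeled.

```python
import math
from typing import Tuple


def bnl_cost(b_outer: int, b_inner: int, mem_blocks: int) -> int:
    """Block nested-loop I/Os: read the outer once, the inner once per outer chunk."""
    # Two buffers are reserved: one for the inner input, one for output.
    chunks = math.ceil(b_outer / (mem_blocks - 2))
    return b_outer + chunks * b_inner


def choose_outer(b_r: int, b_s: int, mem_blocks: int) -> Tuple[str, int]:
    """Return which relation to use as the outer one and the resulting cost."""
    cost_r = bnl_cost(b_r, b_s, mem_blocks)
    cost_s = bnl_cost(b_s, b_r, mem_blocks)
    return ("R", cost_r) if cost_r <= cost_s else ("S", cost_s)
```

For example, choose_outer(1000, 50, 12) returns ("S", 5050): with the 50-page S as the outer relation the cost is 50 + 5 × 1000 I/Os, versus 1000 + 100 × 50 = 6000 with R outer.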

3.
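XML Structural Joins Based on Region Partitioning   Total citations: 22, self-citations: 7, citations by others: 22
Wang Jing, Meng Xiaofeng, Wang Shan. Journal of Software, 2004, 15(5): 720-729
Structural joins are a core operation in XML query processing and have attracted considerable research attention; efficient join algorithms are the key to efficient query processing. Many structural join algorithms have been proposed, and most rely on one of two preconditions: the input element sets are indexed, or they are sorted. When neither holds, these algorithms slow down sharply because of the cost of sorting or indexing the input on the fly. Based on this observation, the paper proposes a structural join algorithm based on region partitioning. Following the idea of task decomposition, the algorithm uses the properties of region encoding to partition the input sets. The paper gives a detailed algorithm design and analyzes its I/O complexity. Extensive experiments show that the algorithm performs well: when the input is unsorted or unindexed it outperforms existing sort-merge algorithms, giving query plans more options.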
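A minimal sketch of partitioning by region codes, under assumptions the abstract does not state: positions lie in [0, max_pos), the range is cut into k equal-width regions, each descendant goes to the region holding its start, and each ancestor is copied to every region its interval overlaps. Each partition is then joined with a plain nested loop, so no global sort or index is needed. This illustrates the idea of region partitioning; it is not the paper's exact algorithm or its I/O-aware version.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # (start, end) region code of an element


def partition_join(alist: List[Region], dlist: List[Region],
                   max_pos: int, k: int) -> List[Tuple[Region, Region]]:
    width = -(-max_pos // k)  # ceil(max_pos / k): width of each region
    a_parts: Dict[int, List[Region]] = defaultdict(list)
    d_parts: Dict[int, List[Region]] = defaultdict(list)
    for d in dlist:
        # A descendant belongs to exactly one region: the one holding its start.
        d_parts[d[0] // width].append(d)
    for a in alist:
        # An ancestor is copied to every region its interval overlaps.
        for p in range(a[0] // width, a[1] // width + 1):
            a_parts[p].append(a)
    out: List[Tuple[Region, Region]] = []
    for p, ds in d_parts.items():
        for a in a_parts.get(p, []):
            for d in ds:
                if a[0] < d[0] and d[1] < a[1]:  # containment test
                    out.append((a, d))
    return out
```

Every matching pair is produced exactly once: a descendant lives in one region, and any ancestor enclosing it must overlap that region.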

4.
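Li Yingjun, Zong Jinliang, Sun Zhisheng. Journal of Computer Applications, 2006, 26(10): 2405-2407
This paper introduces the concept of the EXN-Tree, maps the nodes of an XML document tree onto an EXN-Tree, and derives the node data structure of the XML document tree from the EXN-Tree node codes. Based on this new node encoding, it studies XML structural joins in two cases, unsorted unindexed node sets and sorted indexed node sets, and proposes a family of structural join algorithms covering both. Analysis shows that the algorithms have better I/O complexity than existing algorithms and perform well.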

5.
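Fast queries over native XML databases can be implemented with structural join algorithms based on XML document encoding. After surveying existing structural join algorithms, this paper proposes a new structural join algorithm for native XML databases: a structural join algorithm based on uniform depth partitioning (DRIAM). The algorithm does not require the inputs AList and DList to be sorted or indexed on their node codes, which avoids the extra cost of sorting and indexing; it does not need to load AList and DList entirely into memory, so it adapts to different memory limits; and its time complexity is very low.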

6.
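XML Structural Joins Based on Bisection   Total citations: 2, self-citations: 0, citations by others: 2
Zhang Jing, Ding Yixin, Liu Shan. Computer Engineering, 2007, 33(18): 62-63, 6
In XML query processing, the region-partitioning join algorithm is relatively efficient when the XML data are unsorted and unindexed. This paper uses the properties of region encoding to partition the input sets exhaustively and recursively, locating ancestor-descendant relationships step by step at the cost of the partitioning. Performing the structural join after bisection-based partitioning improves its efficiency, and experiments show that the algorithm is effective for XML query processing.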
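A sketch of recursive bisection, assuming integer (start, end) region codes and descendant starts inside [lo, hi): each call halves the position range, routes descendants by their start, and sends an ancestor to whichever halves can still contain its descendants. The leaf threshold and all names are illustrative, not the paper's.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) region code of an element


def bisect_join(alist: List[Region], dlist: List[Region],
                lo: int, hi: int, leaf: int = 4) -> List[Tuple[Region, Region]]:
    """Join by recursive halving; every descendant start must lie in [lo, hi)."""
    if not alist or not dlist:
        return []  # nothing can match in this region
    if hi - lo <= 1 or len(alist) * len(dlist) <= leaf:
        # Small enough: finish with a nested-loop containment test.
        return [(a, d) for a in alist for d in dlist
                if a[0] < d[0] and d[1] < a[1]]
    mid = (lo + hi) // 2
    # An ancestor of a left-half descendant must start before mid;
    # an ancestor of a right-half descendant must end at or after mid.
    left = bisect_join([a for a in alist if a[0] < mid],
                       [d for d in dlist if d[0] < mid], lo, mid, leaf)
    right = bisect_join([a for a in alist if a[1] >= mid],
                        [d for d in dlist if d[0] >= mid], mid, hi, leaf)
    return left + right
```

An ancestor that spans mid is sent to both halves, but each descendant goes to exactly one, so no pair is reported twice.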

7.
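Wang Shiqing, Bai Lin. Computer Engineering and Design, 2011, 32(3): 1108-1111, 1137
To reduce the dependence of structural joins on input size and to improve efficiency in most practical cases, the paper examines how current structural join algorithms generate large numbers of intermediate results and hence too many join operations, and proposes a new indexing technique based on structural summaries. The path tree and the XML tree are encoded separately, and a small amount of precomputed path information is used. During the structural join an intersection operation is performed; it is implemented quickly with bitmaps and returns only the positions of the paths on which the nodes lie, which reduces the number of I/Os. Experimental results show high query efficiency, with query time independent of the input data size.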
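The abstract gives no details of the encoding, so the following is only a sketch of the bitmap-intersection idea. It assumes a path summary given as a list of root-to-leaf tag paths (a hypothetical input format) and uses Python integers as bitmaps, one bit per path.

```python
from typing import Dict, List


def build_bitmaps(paths: List[List[str]]) -> Dict[str, int]:
    """Bit i of bitmaps[tag] is set when path i of the summary contains tag."""
    bitmaps: Dict[str, int] = {}
    for i, path in enumerate(paths):
        for tag in set(path):
            bitmaps[tag] = bitmaps.get(tag, 0) | (1 << i)
    return bitmaps


def candidate_paths(bitmaps: Dict[str, int], anc: str, desc: str) -> List[int]:
    """AND the two bitmaps and list the ids of paths containing both tags."""
    both = bitmaps.get(anc, 0) & bitmaps.get(desc, 0)
    return [i for i in range(both.bit_length()) if (both >> i) & 1]


summary = [["book", "chapter", "title"],
           ["book", "title"],
           ["book", "chapter", "section", "title"],
           ["article", "title"]]
bitmaps = build_bitmaps(summary)
```

Here candidate_paths(bitmaps, "chapter", "title") gives [0, 2]. Containing both tags is only a filter: whether anc really lies above desc on each candidate path still has to be checked.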

8.
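Structural joins are an important part of XML querying and play a major role in query performance. Several structural join algorithms have been proposed, such as Stack-Tree and XR-tree; they focus on determining the relationships between individual nodes. In contrast, the authors approach structural joins from the perspective of fragments: they first extend node relationships to relationships between fragments, derive several properties of these fragment relationships, and then use those properties to speed up the join. The paper proposes a fragment-based structural join algorithm and two optimizations, and experiments show that it outperforms Stack-Tree and XR-tree. It also designs a simple, efficient index structure for storing the fragmentation results; experiments show that its maintenance cost is lower than that of XR-tree.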

9.
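Efficient Structural Joins over an Interval-Encoding-Based XML Index Structure   Total citations: 22, self-citations: 1, citations by others: 22
This paper gives a formal definition of an XML tree data model. Combining ideas from encoding schemes, inverted lists, and path indexes, it proposes an improved index structure for XML data and presents two structural join algorithms for parent/child and ownership relationships. Each scans the two participating lists at most once, and uses a B+-tree index, guided by information such as parent structure, to skip as many non-participating element nodes as possible. Experimental results show that these index-based structural join algorithms for parent/child and ownership relationships are efficient and robust.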
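The abstract does not describe the index internals. The sketch below shows only why a parent/child join can be done with a single pass over each list, assuming (start, end, level) region codes and inputs sorted by start; it does not model the inverted lists or the B+-tree skipping.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    start: int  # position where the element opens
    end: int    # position where the element closes
    level: int  # depth of the element in the document tree


def parent_child_join(alist: List[Node], dlist: List[Node]) -> List[Tuple[Node, Node]]:
    """Parent/child join in one pass over each list (both sorted by start)."""
    stack: List[Node] = []  # nested candidate parents covering the current position
    out: List[Tuple[Node, Node]] = []
    i = 0
    for d in dlist:
        # Push every candidate that opens before d, dropping closed regions.
        while i < len(alist) and alist[i].start < d.start:
            while stack and stack[-1].end < alist[i].start:
                stack.pop()
            stack.append(alist[i])
            i += 1
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Only the innermost enclosing candidate can be d's parent.
        if stack and stack[-1].level == d.level - 1:
            out.append((stack[-1], d))
    return out
```

If d's parent is in alist at all, it is the deepest element enclosing d, so checking the top of the stack is enough.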

10.
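A New Join Algorithm for XML Elements Based on a B+-Tree Structural Index   Total citations: 1, self-citations: 0, citations by others: 1
By improving the traditional numbering scheme and combining it with a B+-tree, this paper proposes a new index, the B+-tree structural index. On top of this index it proposes an efficient join algorithm that achieves fast joins by pruning elements that do not participate in the join.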
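To illustrate how an index on start positions lets a join skip non-participating elements, the sketch below uses a sorted Python list searched with the standard bisect module as a stand-in for a B+-tree. It assumes well-nested (start, end) region codes, so the descendants of an element are exactly the elements whose start falls strictly inside its interval. The paper's improved numbering scheme is not reproduced.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) region code of an element


def index_join(alist: List[Region], dlist: List[Region]) -> List[Tuple[Region, Region]]:
    dsorted = sorted(dlist)              # stand-in for a B+-tree keyed on start
    starts = [d[0] for d in dsorted]
    out: List[Tuple[Region, Region]] = []
    for a in alist:
        lo = bisect_right(starts, a[0])  # first element opening after a opens
        hi = bisect_left(starts, a[1])   # first element opening at or after a closes
        # Elements outside [lo, hi) are never touched for this ancestor.
        out.extend((a, d) for d in dsorted[lo:hi])
    return out
```

Each ancestor costs two logarithmic searches plus the size of its own output, regardless of how many descendants lie elsewhere in the document.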

11.
Fast joins using join indices   Total citations: 1, self-citations: 0, citations by others: 1
Two new algorithms, “Jive join” and “Slam join,” are proposed for computing the join of two relations using a join index. The algorithms are duals: Jive join range-partitions input relation tuple ids and then processes each partition, while Slam join forms ordered runs of input relation tuple ids and then merges the results. Both algorithms make a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a temporary file, whose size is half that of the join index. Both algorithms require only that the number of blocks in main memory is of the order of the square root of the number of blocks in the smaller relation. By storing intermediate and final join results in a vertically partitioned fashion, our algorithms need to manipulate less data in memory at a given time than other algorithms. The algorithms are resistant to data skew and adaptive to memory fluctuations. Selection conditions can be incorporated into the algorithms. Using a detailed cost model, the algorithms are analyzed and compared with competing algorithms. For large input relations, our algorithms perform significantly better than Valduriez's algorithm, the TID join algorithm, and hash join algorithms. An experimental study is also conducted to validate the analytical results and to demonstrate the performance characteristics of each algorithm in practice. Received July 21, 1997 / Accepted June 8, 1998
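A much-simplified, in-memory model of the two Jive join phases described above, assuming the join index is a list of (r_id, s_id) pairs, relations are Python sequences indexed by tuple id, and k S-id ranges fit in memory. It ignores block I/O, memory bounds, and the real file layout, so it illustrates the access pattern only.

```python
from typing import List, Sequence, Tuple


def jive_join(r: Sequence[str], s: Sequence[str],
              join_index: List[Tuple[int, int]], k: int) -> List[Tuple[str, str]]:
    """In-memory model of Jive join; lists stand in for the on-disk files."""
    width = -(-len(s) // k)                           # S tuple ids per partition
    r_cols: List[List[str]] = [[] for _ in range(k)]  # R half of the result per partition
    s_ids: List[List[int]] = [[] for _ in range(k)]   # temporary file of S tuple ids
    # Phase 1: walk the join index in R-id order, so R is read sequentially,
    # and range-partition the matching S ids.
    for rid, sid in sorted(join_index):
        p = sid // width
        r_cols[p].append(r[rid])
        s_ids[p].append(sid)
    result: List[Tuple[str, str]] = []
    # Phase 2: per partition, fetch the S tuples in S-id order.
    for p in range(k):
        ids = s_ids[p]
        s_col = [""] * len(ids)
        for pos in sorted(range(len(ids)), key=ids.__getitem__):
            s_col[pos] = s[ids[pos]]  # sequential reads within this S-id range
        # The R and S halves share positions, so zip reassembles the rows.
        result.extend(zip(r_cols[p], s_col))
    return result
```

Keeping the R half and the S half of each partition as separate, position-aligned columns mirrors the vertically partitioned result layout the abstract mentions.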

12.
We propose a new algorithm, called Stripe-join, for performing a join given a join index. Stripe-join is inspired by an algorithm called ‘Jive-join’ developed by Li and Ross. Stripe-join makes a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a set of temporary files that contain tuple identifiers but no input tuples. Stripe-join performs this efficiently even when the input relations are much larger than main memory, as long as the number of blocks in main memory is of the order of the square root of the number of blocks in the participating relations. Stripe-join is particularly efficient for self-joins. To our knowledge, Stripe-join is the first algorithm that, given a join index and a relation significantly larger than main memory, can perform a self-join with just a single pass over the input relation and without storing input tuples in intermediate files. Almost all the I/O is sequential, thus minimizing the impact of seek and rotational latency. The algorithm is resistant to data skew. It can also join multiple relations while still making only a single pass over each input relation. Using a detailed cost model, Stripe-join is analyzed and compared with competing algorithms. For large input relations, Stripe-join performs significantly better than Valduriez's algorithm and hash join algorithms. We demonstrate circumstances under which Stripe-join performs significantly better than Jive-join. Unlike Jive-join, Stripe-join makes no assumptions about the order of the join index.

13.
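Chen Gang, Gu Jinguang, Li Sichuan. Computer Science, 2010, 37(12): 143-144
Relational query processing over data streams is a major research topic in databases. The key to optimizing non-blocking join algorithms is making the in-memory join phase more efficient. When memory is full, in-memory data must be flushed to the corresponding partitions on disk, and a good flushing policy is critical to the algorithm's performance. Exploiting the characteristics of the data distribution, the paper applies a statistics-based method to the output stream of the relational join to identify the least frequently used tuples and flushes those tuples to disk, so that the data kept in memory is used more effectively. This statistics-based strategy improves the accuracy and efficiency of the flushing policy and broadens the algorithm's applicability.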
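The abstract does not specify which statistics are kept. As a minimal sketch, assume each buffered tuple is a (join_key, payload) pair and a collections.Counter records how often each key has produced join output; when memory fills, the n tuples with the least-used keys are chosen for flushing. The names and data layout are hypothetical.

```python
import heapq
from collections import Counter
from typing import List, Tuple

StreamTuple = Tuple[int, str]  # (join_key, payload)


def flush_lowest_frequency(buffer: List[StreamTuple], usage: Counter,
                           n: int) -> Tuple[List[StreamTuple], List[StreamTuple]]:
    """Split the buffer into (kept, flushed), flushing the n least-used tuples."""
    # nsmallest behaves like sorted(...)[:n], so ties keep their buffer order;
    # keys never seen in the output count as 0 and are flushed first.
    victims = set(heapq.nsmallest(n, range(len(buffer)),
                                  key=lambda i: usage[buffer[i][0]]))
    kept = [t for i, t in enumerate(buffer) if i not in victims]
    flushed = [t for i, t in enumerate(buffer) if i in victims]
    return kept, flushed
```

In a full non-blocking join, the flushed tuples would be written to their on-disk partitions and joined in a later cleanup phase.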

14.
This paper presents a parallel distributive join algorithm for cube-connected multiprocessors. The performance analysis shows that the proposed algorithm has an almost linear speedup over the sequential distributive join algorithm as the number of processors increases, and its performance is comparable to that of the parallel hybrid-hash join algorithm. A big advantage of the proposed algorithm over hash-based join algorithms is that it does not have the bucket overflow problem caused by nonuniform hashing of the smaller operand relation. Moreover, the proposed algorithm can easily support the non-equijoin operation, which is very hard to implement by using hash-based join algorithms.

15.
Rapid advances in semiconductor technology have made it possible to build massively parallel processors. In addition, optical 3D storage and optical interconnections open a new opportunity due to inherent massive parallelism and non-interference of light beams. The approaches used in current parallel database research cannot take advantage of massive parallelism which can be provided by the emerging technologies, due to speedup and scaleup limitations.

In this paper, we present a computational paradigm for database machines that takes advantage of the emerging opportunity for massive parallelism, and discuss the validity and feasibility of the paradigm. The approach we take is based on associative computing and fine-grained data parallelism, which allow unlimited speedup and scaleup. Additionally, an asymptotically fast data-parallel join algorithm, which can efficiently deal with joins in which multiple relations share a common join field, is presented. The algorithm is based on parallel sorting and parallel binary search, and performs a multiway join in $O(s + \sum \log r)$, where $s$ is the cost of sorting an intermediate relation and $r$ is the size of an input relation. The algorithm keeps the sorting cost $s$ to a minimum.
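A sequential sketch of the sort-then-binary-search idea for relations that share one join field, assuming each relation is a list of (key, payload) rows. The paper's algorithm is data-parallel and its bound refers to a parallel machine; this single-threaded version shows only the join logic.

```python
from bisect import bisect_left, bisect_right
from itertools import product
from typing import List, Tuple

Row = Tuple[int, str]  # (join_key, payload)


def multiway_join(relations: List[List[Row]]) -> List[Tuple[Row, ...]]:
    """Join relations that all share one join field, via sorting and binary search."""
    sorted_rels = [sorted(rel) for rel in relations]          # sorting phase
    keys = [[row[0] for row in rel] for rel in sorted_rels]
    out: List[Tuple[Row, ...]] = []
    for k in sorted(set(keys[0])):  # candidate join values from the first relation
        groups = []
        for rel, ks in zip(sorted_rels, keys):
            lo, hi = bisect_left(ks, k), bisect_right(ks, k)  # binary-search phase
            if lo == hi:
                break  # k is missing from this relation, so it yields no output
            groups.append(rel[lo:hi])
        else:
            out.extend(product(*groups))  # every relation matched k
    return out
```

Because all relations share the join field, each candidate value needs only one binary search per relation rather than a chain of pairwise joins.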

16.
We address the problem of load shedding for continuous multi-way join queries over multiple data streams. When the arrival rates of tuples from data streams exceed the system capacity, a load shedding algorithm drops some subset of input tuples to avoid system overloads. To decide which tuples to drop among the input tuples, most existing load shedding algorithms determine the priority of each input tuple based on the frequency or some historical statistics of its join attribute value, and then drop tuples with the lowest priority. However, those value-based algorithms cannot determine the priorities of tuples properly in environments where join attribute values are unique and each join attribute value occurs at most once in each data stream. In this paper, we propose a load shedding algorithm specifically designed for such environments. The proposed load shedding algorithm determines the priority of each tuple based on the order of streams in which its join attribute value appears, rather than its join attribute value itself. Consequently, the priorities of tuples can be determined effectively in environments where join attribute values are unique and do not repeat. The experimental results show that the proposed algorithm outperforms the existing algorithms in such environments in terms of effectiveness and efficiency.

17.
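A New Algorithm for Constructing Optimal Serial Histograms   Total citations: 1, self-citations: 0, citations by others: 1
A serial histogram is built by optimally partitioning a relation according to the ordering of value frequencies. Its join-result size estimates are optimal, and it can also estimate the result sizes of equality and range queries. However, the algorithms for constructing serial histograms are complex, which limits their practical use. Taking a practical standpoint, this paper designs an algorithm, BOS, for constructing optimized serial histograms; its time complexity is much lower, and its estimation accuracy is close to that of the optimal histogram, which gives it high practical value.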
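BOS itself is not described in enough detail to reproduce. For reference, the sketch below builds a serial histogram under the variance (V-optimal) criterion with the standard dynamic program over frequencies sorted in ascending order: buckets are contiguous runs of the sorted frequencies, chosen to minimize the total squared deviation from each bucket's mean. It runs in O(n²β) time, the kind of cost the paper aims to reduce, and it assumes 1 ≤ beta ≤ n. It is not the BOS algorithm.

```python
from typing import List, Tuple


def v_optimal_serial(freqs: List[int], beta: int) -> Tuple[float, List[List[int]]]:
    """Split sorted frequencies into beta contiguous buckets with minimum total SSE."""
    f = sorted(freqs)
    n = len(f)
    # Prefix sums of f and f^2 give each bucket's squared error in O(1).
    p = [0.0] * (n + 1)
    q = [0.0] * (n + 1)
    for i, x in enumerate(f):
        p[i + 1] = p[i] + x
        q[i + 1] = q[i] + x * x

    def sse(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for bucket f[i:j]."""
        s = p[j] - p[i]
        return (q[j] - q[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(beta + 1)]  # cost[b][j]: first j values, b buckets
    cut = [[0] * (n + 1) for _ in range(beta + 1)]     # start index of the last bucket
    cost[0][0] = 0.0
    for b in range(1, beta + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                c = cost[b - 1][i] + sse(i, j)
                if c < cost[b][j]:
                    cost[b][j], cut[b][j] = c, i
    buckets: List[List[int]] = []
    j = n
    for b in range(beta, 0, -1):  # walk the cut points back from the end
        i = cut[b][j]
        buckets.append(f[i:j])
        j = i
    return cost[beta][n], buckets[::-1]
```

For example, v_optimal_serial([10, 1, 50, 10, 1, 10], 3) groups the frequencies as [[1, 1], [10, 10, 10], [50]] with zero error.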

18.
An important aspect of database processing in parallel computer systems is the use of data parallel algorithms. Several parallel algorithms for the relational database join operation in a hypercube multicomputer system are given. The join algorithms are classified as cycling or global partitioning based on the tuple distribution method employed. The various algorithms are compared under a common framework, using time complexity analysis as well as an implementation on a 64-node NCUBE hypercube system. In general, the global partitioning algorithms demonstrate better speedup. However, the cycling algorithm can perform better than the global algorithms in specific situations, namely when the difference in input relation cardinalities is large and the hypercube dimension is small. The usefulness of the data redistribution operation in improving the performance of the join algorithms, in the presence of uneven data partitions, is examined. The results indicate that redistribution significantly decreases the join algorithm execution times for unbalanced partitions.
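A sketch of the global-partitioning strategy as a single-process simulation, assuming p nodes that each hold one fragment of R and one of S, integer join keys, and key % p standing in for the partitioning function. Communication along hypercube links is not modeled, and the cycling variant is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join_key, payload)


def global_partition_join(r_frags: List[List[Row]],
                          s_frags: List[List[Row]]) -> List[Tuple[Row, Row]]:
    p = len(r_frags)  # number of nodes, e.g. 2**d for a d-dimensional hypercube
    r_parts: List[List[Row]] = [[] for _ in range(p)]
    s_parts: List[List[Row]] = [[] for _ in range(p)]
    # Redistribution: every tuple is sent to node (key mod p).
    for frag in r_frags:
        for row in frag:
            r_parts[row[0] % p].append(row)
    for frag in s_frags:
        for row in frag:
            s_parts[row[0] % p].append(row)
    out: List[Tuple[Row, Row]] = []
    # Local phase: each node runs an independent hash join on its partition.
    for node in range(p):
        table: Dict[int, List[Row]] = defaultdict(list)
        for row in r_parts[node]:
            table[row[0]].append(row)
        for srow in s_parts[node]:
            out.extend((rrow, srow) for rrow in table.get(srow[0], []))
    return out
```

After redistribution, matching tuples always sit on the same node, so the local joins need no further communication; skewed keys, however, can leave some nodes with much larger partitions.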
