1.

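王国仁, 乔百友, 韩东红, 王斌. Chinese Journal of Computers, 2008, 31(1): 77-90

Structural joins are an important part of XML queries and play a major role in query performance. Several structural join algorithms have been proposed, such as Stack-Tree and XR-tree; they focus mainly on determining the relationships between individual nodes. In contrast, the authors approach the structural join problem from the perspective of partitions: they first extend the relationships between nodes to relationships between partitions, derive several properties of these partition relationships, and then use those properties to improve the performance of structural joins. The paper proposes a partition-based structural join algorithm and two optimizations, and experiments show that it outperforms the Stack-Tree and XR-tree algorithms. The authors also design a simple but efficient index structure for storing the partitioning results; experiments show that its maintenance cost is lower than that of the XR-tree.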
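For reference, the Stack-Tree baseline the paper compares against can be sketched as follows. This is a minimal Python sketch of the Stack-Tree-Desc variant, assuming the common (start, end, level) region encoding in which a is an ancestor of d iff a.start < d.start and d.end < a.end; it is not the partition-based algorithm proposed in the paper, and the function name is illustrative.

```python
from typing import List, Tuple

# Region code (start, end, level); regions are either nested or disjoint.
Node = Tuple[int, int, int]

def stack_tree_desc(alist: List[Node], dlist: List[Node]) -> List[Tuple[Node, Node]]:
    """Return all (ancestor, descendant) pairs; both inputs sorted by start."""
    result: List[Tuple[Node, Node]] = []
    stack: List[Node] = []  # chain of nested ancestors, outermost at the bottom
    i = j = 0
    while j < len(dlist):
        take_ancestor = i < len(alist) and alist[i][0] < dlist[j][0]
        nxt = alist[i] if take_ancestor else dlist[j]
        # Pop ancestors whose region closed before the next element starts.
        while stack and stack[-1][1] < nxt[0]:
            stack.pop()
        if take_ancestor:
            stack.append(nxt)
            i += 1
        else:
            # Every ancestor still on the stack contains this descendant.
            for a in stack:
                result.append((a, nxt))
            j += 1
    return result
```

Both lists are scanned once, so apart from the output size the cost is linear; the sorted-input requirement is exactly what partition-based approaches try to avoid.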

2.

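A Histogram-Based Parallel Structural Join Algorithm

李建新, 王国仁, 汤南, 王斌, 于亚新, 张海宁. Journal of Computer Research and Development, 2004, 41(10): 1768-1773

The join is one of the most expensive and most frequently used database operations. In traditional database systems the dominant join is the equijoin, so traditional parallel join algorithms have concentrated on parallel equijoins. Meanwhile, as XML becomes increasingly important in Web applications, it has emerged as a new standard for data exchange on the Internet. Joins over XML data differ from equijoins in traditional databases: they are structural joins, and parallel algorithms designed for equijoins cannot solve the structural join problem efficiently. The paper therefore poses the parallel structural join problem for the first time and, by applying histograms to parallel joins, proposes two basic parallel XML structural join algorithms: an equi-depth histogram join algorithm and an equi-width histogram join algorithm. Experiments show that both algorithms perform well.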
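The difference between the two partitioning schemes can be illustrated with a small sketch. It assumes that nodes carry (start, end) region codes and that the code space is split into p ranges, one per processor: equi-width ranges have equal length, while equi-depth ranges hold roughly equal numbers of start positions. The function names, and the rule that an ancestor is replicated to every range its region overlaps, are assumptions for illustration rather than details taken from the paper.

```python
from bisect import bisect_right
from typing import List

def equi_width_bounds(lo: int, hi: int, p: int) -> List[int]:
    """p - 1 cut points splitting the code range [lo, hi) into equal-length ranges."""
    width = (hi - lo) / p
    return [lo + round(width * k) for k in range(1, p)]

def equi_depth_bounds(starts: List[int], p: int) -> List[int]:
    """p - 1 cut points so each range holds about len(starts) / p start positions."""
    s = sorted(starts)
    return [s[len(s) * k // p] for k in range(1, p)]

def partition_of(pos: int, cuts: List[int]) -> int:
    """Index (0 .. p-1) of the range containing a code position."""
    return bisect_right(cuts, pos)

def partitions_overlapping(start: int, end: int, cuts: List[int]) -> List[int]:
    """Every partition an ancestor region [start, end] must be sent to."""
    return list(range(partition_of(start, cuts), partition_of(end, cuts) + 1))
```

With skewed data, equi-width ranges can leave some processors with far more descendants than others, which is the usual motivation for equi-depth boundaries.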

3.

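Research on Axis Join Query Techniques for XPath (citations: 3 total, 0 self, 3 by others)

王钊, 耿蓉, 王国仁. Journal of Chinese Computer Systems, 2005, 26(11): 1942-1947

A popular approach to XML query processing is the "structural join", which resolves ancestor-descendant and parent-child relationships. It essentially handles only the / and // axes of XPath and cannot support queries on the other XPath axes. This paper therefore extends the notion of a structural join and defines axis join queries. Based on RaP encoding, which supports the XPath location axes, it designs two axis join algorithms: RaPMerge and RaPOneJoin. The two algorithms are compared on the Shakespeare and XMark data sets, and the results show that RaPOneJoin has a large performance advantage over RaPMerge for queries on certain XPath axes.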
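The RaP encoding itself is not described in the abstract. As a stand-in, the sketch below uses the well-known pre/post/level numbering, under which each XPath axis shown reduces to a comparison of two nodes' numbers, so an axis join becomes a join on a range predicate. The names (Elem, number, AXES, axis_join) are hypothetical, and the nested-loop join only demonstrates the predicates; it is not RaPMerge or RaPOneJoin.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Elem:
    tag: str
    children: List["Elem"] = field(default_factory=list)
    pre: int = 0
    post: int = 0
    level: int = 0

def number(root: Elem) -> List[Elem]:
    """Assign preorder, postorder and depth; return nodes in document order."""
    order: List[Elem] = []
    counter = {"pre": 0, "post": 0}
    def visit(n: Elem, depth: int) -> None:
        n.pre, n.level = counter["pre"], depth
        counter["pre"] += 1
        order.append(n)
        for c in n.children:
            visit(c, depth + 1)
        n.post = counter["post"]
        counter["post"] += 1
    visit(root, 0)
    return order

# Predicate "c is on axis X of context node v", expressed purely on numbers.
AXES = {
    "descendant": lambda v, c: v.pre < c.pre and c.post < v.post,
    "ancestor":   lambda v, c: c.pre < v.pre and v.post < c.post,
    "following":  lambda v, c: v.pre < c.pre and v.post < c.post,
    "preceding":  lambda v, c: c.pre < v.pre and c.post < v.post,
    "child":      lambda v, c: v.pre < c.pre and c.post < v.post and c.level == v.level + 1,
    "parent":     lambda v, c: c.pre < v.pre and v.post < c.post and v.level == c.level + 1,
}

def axis_join(context: List[Elem], candidates: List[Elem], axis: str) -> List[Tuple[str, str]]:
    pred = AXES[axis]
    return [(v.tag, c.tag) for v in context for c in candidates if pred(v, c)]
```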

4.

Coding-based Join Algorithms for Structural Queries on Graph-Structured XML Document

Hongzhi Wang, Jianzhong Li, Wei Wang, Xuemin Lin. World Wide Web, 2008, 11(4): 485-510

In many applications, XML documents need to be modelled as graphs, and query processing over graph-structured XML documents brings new challenges. In this paper, we design a labelling-based method for processing structural queries on graph-structured XML documents: each node is assigned a set of labels under a reachability labelling scheme. By extending the interval-based reachability labelling scheme for DAGs by Rakesh et al., we design labelling schemes that support reachability judgements on general graphs. Based on these labelling schemes, we design graph structural join algorithms that efficiently answer structural queries involving only ancestor-descendant relationships. For subgraph queries, we design a subgraph join algorithm which, with an efficient data structure, processes subgraph queries of various structures efficiently. Experimental results show that our algorithms have good performance and scalability. Supported by the Key Program of the National Natural Science Foundation of China under Grant No. 60533110; the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000; and the National Natural Science Foundation of China under Grant Nos. 60773068 and 60773063.
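The interval-based reachability labelling that the paper extends can be sketched for the DAG case. Assumptions: the graph is acyclic (the paper's extension to general graphs is not reproduced), nodes are numbered in DFS post-order, and each node keeps a merged list of post-order intervals covering everything it can reach, so u reaches v iff post(v) falls inside one of u's intervals. All names are illustrative.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]
Interval = Tuple[int, int]

def merge(ivs: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or adjacent ones."""
    ivs = sorted(ivs)
    out = [list(ivs[0])]
    for s, e in ivs[1:]:
        if s <= out[-1][1] + 1:
            out[-1][1] = max(out[-1][1], e)
        else:
            out.append([s, e])
    return [(s, e) for s, e in out]

def interval_labels(g: Graph):
    """Return (post-order number per node, merged interval list per node) for a DAG."""
    post: Dict[str, int] = {}
    low: Dict[str, int] = {}
    order: List[str] = []  # nodes in post-order
    def dfs(u: str) -> None:
        post[u] = -1  # mark visited; the real number is assigned on finish
        lows = []
        for v in g.get(u, []):
            if v not in post:  # tree edge of the DFS spanning forest
                dfs(v)
                lows.append(low[v])
        post[u] = len(order)
        order.append(u)
        low[u] = min(lows, default=post[u])
    for u in g:
        if u not in post:
            dfs(u)
    labels: Dict[str, List[Interval]] = {}
    for u in order:  # in a DAG every successor finishes before u
        ivs = [(low[u], post[u])]  # u's spanning-tree subtree is contiguous
        for v in g.get(u, []):
            ivs.extend(labels[v])  # inherit intervals across non-tree edges too
        labels[u] = merge(ivs)
    return post, labels

def reaches(post: Dict[str, int], labels: Dict[str, List[Interval]], u: str, v: str) -> bool:
    return any(s <= post[v] <= e for s, e in labels[u])
```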

5.

Data Partitioning for Parallel Spatial Join Processing (citations: 1 total, 0 self, 1 by others)

Xiaofang Zhou, David J. Abel, David Truffet. GeoInformatica, 1998, 2(2): 175-204

The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases with the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. In this paper we show that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
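A small sketch of the multi-assignment problem described above: MBRs are assigned to every uniform grid cell they overlap, each cell is joined independently (as a parallel task would be), and the same candidate pair can be produced in several cells. The uniform grid, the `size` parameter, and duplicate removal via a set are assumptions; only the MBR filter step is shown, not refinement on exact geometry or the paper's locality-preserving decomposition.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # MBR as (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]

def cells_for(r: Rect, size: float) -> List[Cell]:
    """Every grid cell the MBR overlaps (multiple assignment)."""
    xs = range(int(r[0] // size), int(r[2] // size) + 1)
    ys = range(int(r[1] // size), int(r[3] // size) + 1)
    return list(product(xs, ys))

def mbr_overlap(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def grid_spatial_join(R: Dict[str, Rect], S: Dict[str, Rect], size: float):
    """Filter step only: returns (distinct candidate pairs, raw hits incl. duplicates)."""
    buckets: Dict[Cell, Tuple[List[str], List[str]]] = {}
    for rid, r in R.items():
        for c in cells_for(r, size):
            buckets.setdefault(c, ([], []))[0].append(rid)
    for sid, s in S.items():
        for c in cells_for(s, size):
            buckets.setdefault(c, ([], []))[1].append(sid)
    pairs: Set[Tuple[str, str]] = set()
    hits = 0
    for rids, sids in buckets.values():  # each cell is an independent task
        for rid in rids:
            for sid in sids:
                if mbr_overlap(R[rid], S[sid]):
                    hits += 1
                    pairs.add((rid, sid))  # the set drops duplicate pairs
    return pairs, hits
```

Shrinking the cell size spreads work over more tasks but raises the number of duplicate hits, which is the trade-off the paper's locality argument addresses.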

6.

Join operations in temporal databases

Dengfeng Gao, Christian S. Jensen, Richard T. Snodgrass, Michael D. Soo. The VLDB Journal: The International Journal on Very Large Data Bases, 2005, 14(1): 2-29

Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the evaluation of joins with equality predicates rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally varying data dramatically increases the size of a database. These factors indicate that specialized techniques are needed to efficiently evaluate temporal joins. We address this need for efficient join evaluation in temporal databases. Our purpose is twofold. We first survey all previously proposed temporal join operators. While many temporal join operators have been defined in previous work, this work has been done largely in isolation from competing proposals, with little, if any, comparison of the various operators. We then address evaluation algorithms, comparing the applicability of various algorithms to the temporal join operators and describing a performance study involving algorithms for one important operator, the temporal equijoin. Our focus, with respect to implementation, is on non-index-based join algorithms. Such algorithms do not rely on auxiliary access paths but may exploit sort orderings to achieve efficiency.
Received: 17 October 2002; accepted: 26 July 2003; published online: 28 October 2003. Edited by: T. Sellis
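As an illustration of the operator itself, here is a naive nested-loop temporal equijoin: two tuples join when their keys are equal and their valid-time intervals overlap, and the result is stamped with the intersection of the two intervals. The half-open [from, to) convention and the tuple layout are assumptions; the paper focuses on more efficient non-index (sort- and partition-based) algorithms, which this sketch does not implement.

```python
from typing import List, Tuple

# (key, payload, valid_from, valid_to); validity is the half-open interval [from, to)
Row = Tuple[str, str, int, int]

def temporal_equijoin(r: List[Row], s: List[Row]) -> List[Tuple[str, str, str, int, int]]:
    out = []
    for k1, p1, f1, t1 in r:
        for k2, p2, f2, t2 in s:
            # Intersection of the two validity intervals.
            start, end = max(f1, f2), min(t1, t2)
            if k1 == k2 and start < end:  # equal keys and a non-empty overlap
                out.append((k1, p1, p2, start, end))
    return out
```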

7.

A New Partition-Based Structural Join Algorithm (citations: 2 total, 0 self, 2 by others)

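任家东, 尹晓鹏, 郭晓丹. Computer Engineering, 2007, 33(6): 95-97

Efficient structural joins are the key to XML query processing. Most current structural join algorithms lose much of their efficiency because they require temporary sorting or index construction, or because of data duplication and I/O problems. Based on an analysis and comparison of existing structural join algorithms, this paper proposes a new partition-based structural join algorithm. It needs neither sorting nor indexing, resolves the data duplication problem with a stack mechanism, and improves I/O performance by making full use of the memory buffer. Experimental analysis shows that the algorithm achieves good query performance.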

8.

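Research on and Improvement of the Parent/Child Structural Join Algorithm

王治和, 谢斌. Computer Science, 2008, 35(1): 126-127

Combining interval encoding with a node-model mapping method, the paper proposes an extended storage schema for relational databases. The parent/child structural join algorithm is improved by traversing the XML tree in breadth-first order. The improved algorithm reduces memory overhead, narrows the scan range over the lists, and noticeably speeds up matching, thereby achieving query optimization.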
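One property that makes breadth-first numbering useful here is that the children of any node receive consecutive BFS numbers, so a parent/child join can be answered with one range lookup per parent instead of a scan. The sketch below shows only that property; the (bfs_id, first_child_id, child_count) layout and the function names are hypothetical stand-ins, not the paper's extended relational storage schema.

```python
from collections import deque
from typing import Dict, List, Tuple

def bfs_number(tree: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each node to (bfs_id, first_child_id, child_count)."""
    order: List[str] = []
    q = deque([root])
    while q:
        n = q.popleft()
        order.append(n)  # BFS id = position in this list
        q.extend(tree.get(n, []))
    info: Dict[str, Tuple[int, int, int]] = {}
    next_id = 1  # children are numbered right after all earlier nodes' children
    for bfs_id, n in enumerate(order):
        k = len(tree.get(n, []))
        info[n] = (bfs_id, next_id, k)
        next_id += k
    return info

def parent_child_join(info, parents: List[str], children: List[str]) -> List[Tuple[str, str]]:
    by_id = {info[c][0]: c for c in children}
    out = []
    for p in parents:
        _, first, k = info[p]
        for cid in range(first, first + k):  # contiguous child range
            if cid in by_id:
                out.append((p, by_id[cid]))
    return out
```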

9.

An Adaptive Parallel Distributive Join Algorithm on a Cluster of Workstations

Soon M. Chung, Arindam Chatterjee. The Journal of Supercomputing, 2002, 21(1): 5-35

In this paper, we present an adaptive version of the parallel Distributive Join (DJ) algorithm that we proposed in [5]. The adaptive parallel DJ algorithm can handle the data skew in operand relations efficiently. We implemented the original and adaptive parallel DJ algorithms on a network of Alpha workstations using the Parallel Virtual Machine (PVM). We analyzed the performance of the algorithms, and compared it with that of the parallel Hybrid-Hash (HH) join algorithms. Our results show that the parallel DJ algorithms perform comparably with the parallel HH join algorithms over the entire range of the number of processors used and for different join selectivities. A significant advantage of the parallel DJ algorithms is that they can easily support non-equijoin operations.

10.

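Query Processing for Multi-Version XML Documents Based on Structural Joins (citations: 1 total, 0 self, 1 by others)

贾玉昌, 庞引明, 朱艳琴. Computer Engineering and Applications, 2005, 41(36): 172-174

Structural joins are the core operation of XML query processing and have attracted the attention of the research community; efficient algorithms are the key to efficient query processing. Many structural join algorithms have been proposed, but none of them supports multi-version XML documents. This paper extends the classic structural join algorithms so that they support multi-version XML documents.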

11.

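Research on and Improvement of the Left-Sibling/Right-Sibling Structural Join Algorithm

王治和. Computer Science, 2007, 34(12): 97-99

Combining interval encoding with a node-model mapping method, the paper proposes an extended storage schema for relational databases. The left-sibling/right-sibling structural join algorithm is improved by building a clustered index on the breadth-first traversal number contained in each node's code. The improved algorithm reduces memory overhead, narrows the scan range over the lists, and noticeably speeds up matching, thereby achieving query optimization.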

12.

XML Structural Join Based on Region Partitioning (citations: 22 total, 7 self, 22 by others)

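王静, 孟小峰, 王珊. Journal of Software, 2004, 15(5): 720-729

Structural joins are the core operation of XML query processing and have attracted the attention of the research community; efficient algorithms are the key to efficient query processing. Many structural join algorithms have been proposed, and most of them rely on one of two preconditions: the input element sets are indexed, or they are sorted. When neither holds, the cost of temporarily sorting or indexing the input makes these algorithms perform much worse. Based on this observation, the paper proposes a structural join algorithm based on region partitioning. Following the idea of task decomposition, it partitions the input sets by exploiting properties of region encoding. The paper gives a detailed algorithm design and analyzes its I/O complexity. Extensive experiments show that the algorithm performs well and, when the input is unsorted or unindexed, outperforms existing sort-merge algorithms, giving query plans more options.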
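A simplified sketch of partitioning by region code, assuming (start, end) codes and a given list of cut points on the start axis: each descendant goes to the single partition containing its start, each ancestor is copied into every partition its region spans, and each partition is then joined independently with no global sort or index. The cut points, names, and in-memory nested loop are assumptions; the paper's task decomposition and I/O analysis are not reproduced.

```python
from bisect import bisect_right
from typing import List, Tuple

Node = Tuple[int, int]  # (start, end) region code

def partitioned_structural_join(A: List[Node], D: List[Node], cuts: List[int]) -> List[Tuple[Node, Node]]:
    """Ancestor/descendant join over unsorted, unindexed inputs."""
    p = len(cuts) + 1
    a_parts: List[List[Node]] = [[] for _ in range(p)]
    d_parts: List[List[Node]] = [[] for _ in range(p)]
    for d in D:
        d_parts[bisect_right(cuts, d[0])].append(d)  # exactly one partition
    for a in A:
        lo, hi = bisect_right(cuts, a[0]), bisect_right(cuts, a[1])
        for k in range(lo, hi + 1):  # replicate to every partition the region spans
            a_parts[k].append(a)
    out = []
    for ap, dp in zip(a_parts, d_parts):  # partitions are independent tasks
        for a in ap:
            for d in dp:
                if a[0] < d[0] and d[1] < a[1]:
                    out.append((a, d))
    return out
```

Because each descendant lands in exactly one partition, no result pair is produced twice even though ancestors are replicated.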

13.

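A Dataflow-Based k-Nearest-Neighbor Join Algorithm

王飞, 秦小麟, 刘亮, 沈尧. Computer Science, 2015, 42(5): 204-210

The k-nearest-neighbor (kNN) join is a common operation in spatial databases; processing it involves two complex operations, a join and nearest-neighbor search. Traditional centralized kNN join algorithms can no longer cope with today's explosively growing data volumes, so designing distributed kNN join algorithms has become a pressing problem. Existing distributed kNN join algorithms consist of several rounds of serial MapReduce jobs, each of which reads from and writes to the distributed file system; because MapReduce cannot effectively express dependencies among multiple jobs, these algorithms are inefficient. The paper first proposes a dataflow-based computing framework, built on top of MapReduce, that models data processing as a dataflow graph. On this framework it proposes an efficient kNN join algorithm that uses a space-filling curve to map multi-dimensional data to one dimension, turning the kNN join into one-dimensional range queries. Experimental results show that the algorithm scales well and is more efficient than existing algorithms.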
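The reduction from a multi-dimensional kNN join to one-dimensional range queries can be sketched with a Z-order (Morton) curve on non-negative integer coordinates. This single-machine sketch is approximate by design: for each query point it only examines the 2k data points nearest in Z-order, so true neighbours that lie far away on the curve can be missed. It does not model the paper's dataflow framework or MapReduce, the function names are illustrative, and `math.dist` requires Python 3.8 or later.

```python
from bisect import bisect_left
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[int, int]

def z_value(p: Point, bits: int = 16) -> int:
    """Interleave the bits of x and y (Morton / Z-order code)."""
    z = 0
    for i in range(bits):
        z |= ((p[0] >> i) & 1) << (2 * i)
        z |= ((p[1] >> i) & 1) << (2 * i + 1)
    return z

def approx_knn_join(R: List[Point], S: List[Point], k: int) -> Dict[Point, List[Point]]:
    """For each r, the k nearest among the 2k S-points closest to r in Z-order."""
    zs = sorted((z_value(s), s) for s in S)
    keys = [z for z, _ in zs]
    out: Dict[Point, List[Point]] = {}
    for r in R:
        i = bisect_left(keys, z_value(r))
        window = [s for _, s in zs[max(0, i - k): i + k]]  # 1-D range around r
        out[r] = sorted(window, key=lambda s: dist(r, s))[:k]
    return out
```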

14.

PEJA: Progressive Energy-Efficient Join Processing for Sensor Networks

Yong-Xuan Lai, Yi-Long Chen, Hong Chen. Journal of Computer Science and Technology, 2008, 23(6): 957-972

Sensor networks are widely used in many applications to collaboratively collect information from the physical environment. In these applications, the exploration of the relationships and linkages among sensing data from multiple regions can be naturally expressed by joining tuples in these regions. However, the highly distributed and resource-constrained nature of the network makes join a challenging query. In this paper, we address the problem of processing join queries among different regions progressively and energy-efficiently in sensor networks. The proposed algorithm PEJA (Progressive Energy-efficient Join Algorithm) adopts an event-driven strategy to output join results as soon as possible, and alleviates the storage shortage problem in the in-network nodes. It also installs filters in the joining regions to prune unmatchable tuples early in processing, saving many unnecessary transmissions. Extensive experiments on both synthetic and real-world data sets indicate that the PEJA scheme outperforms other join algorithms and is effective in reducing the number of transmissions and the delay of query results during join processing.
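The abstract does not specify what filters PEJA installs. Assuming a band join |v1 - v2| <= tol on a single numeric attribute, one simple filter is the partner region's min/max range widened by tol: a tuple outside that range cannot match and need not be transmitted. The sketch below shows this pruning on hypothetical readings; the function names and the filter type are assumptions, not the PEJA protocol.

```python
from typing import List, Tuple

Reading = Tuple[str, float]  # (sensor_id, join attribute value)

def range_filter(readings: List[Reading]) -> Tuple[float, float]:
    """Summary a region could ship to its partner: min/max of its join values."""
    vals = [v for _, v in readings]
    return min(vals), max(vals)

def prune(readings: List[Reading], partner: Tuple[float, float], tol: float) -> List[Reading]:
    """Keep only tuples that could match some partner value within tol."""
    lo, hi = partner
    return [(sid, v) for sid, v in readings if lo - tol <= v <= hi + tol]

def band_join(r1: List[Reading], r2: List[Reading], tol: float) -> List[Tuple[str, str]]:
    return [(a, b) for a, va in r1 for b, vb in r2 if abs(va - vb) <= tol]
```

Pruning is safe under this assumption: if v1 matches some v2 in [lo, hi], then v1 necessarily lies in [lo - tol, hi + tol].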

15.

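A Parallel String Similarity Join Method on a CPU-GPU Heterogeneous Architecture

徐坤浩, 聂铁铮, 申德荣, 寇月, 于戈. Journal of Computer Research and Development, 2021, 58(3): 598-608

Similarity join techniques are important in data cleaning, data integration, and related fields, and have attracted wide attention in academia in recent years. As data volumes keep growing, requirements for real-time processing rise, and processor performance improvements hit a bottleneck, traditional serial similarity join methods can no longer meet the needs of big data processing. In recent years, GPUs used as coprocessors have delivered good speedups in fields such as machine learning, so GPU-based parallel algorithms have become an effective solution to many performance problems. The paper therefore proposes a parallel similarity join method on a CPU-GPU heterogeneous architecture. First, the method builds an inverted index on the GPU using an SoA (struct of arrays) layout, which solves the low read/write efficiency of traditional index structures in parallel mode. Second, to address the performance problems of serial algorithms, it proposes a parallel dual length-filtering algorithm based on the filter-verification framework, which uses prefix filtering and the prebuilt inverted index to improve filtering effectiveness. The exact similarity computation in the verification step runs on the CPU, making full use of the heterogeneous CPU-GPU computing resources. Finally, performance is evaluated experimentally on multiple data sets. Compared with serial similarity join algorithms, the results show that the proposed method achieves better filtering, lower index construction cost, better similarity join performance, and good speedup.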
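The filter-verification pipeline can be sketched serially on the CPU for Jaccard similarity over token sets: records are processed in order of size, a prefix filter over a rarity-ordered token list finds candidates through an inverted index, a length filter discards candidates that are too small, and verification computes the exact similarity. This is a sketch of the general technique, assuming Jaccard similarity and a threshold t; the index is a plain dict of lists rather than the paper's GPU SoA layout, the paper's dual length filter is not reproduced, and all names are illustrative.

```python
from collections import Counter
from math import ceil
from typing import Dict, List, Set, Tuple

EPS = 1e-9  # guards against floating-point error in t * n

def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b)

def prefix_len(n: int, t: float) -> int:
    """Prefix length |x| - ceil(t * |x|) + 1 used by the prefix filter."""
    return n - ceil(t * n - EPS) + 1

def similarity_self_join(records: List[Set[str]], t: float) -> List[Tuple[int, int]]:
    freq = Counter(tok for r in records for tok in r)
    # Canonical token order: rare tokens first, so prefixes are selective.
    canon = [sorted(r, key=lambda tok: (freq[tok], tok)) for r in records]
    index: Dict[str, List[int]] = {}  # token -> ids of records whose prefix contains it
    out = []
    for i in sorted(range(len(records)), key=lambda k: len(records[k])):
        x = canon[i]
        cands = set()
        for tok in x[:prefix_len(len(x), t)]:
            for j in index.get(tok, []):
                if len(records[j]) >= t * len(x) - EPS:  # length filter
                    cands.add(j)
            index.setdefault(tok, []).append(i)
        for j in cands:  # verification with the exact similarity
            if jaccard(records[i], records[j]) >= t - EPS:
                out.append((min(i, j), max(i, j)))
    return sorted(out)
```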

16.

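A Study of the I/O Cost of MapReduce Join Queries

宋杰, 李甜甜, 朱志良, 鲍玉斌, 于戈. Journal of Software, 2015, 26(6): 1438-1456

The exponential growth of data poses serious challenges for data management and analysis. The join is a common operation in data analysis, and MapReduce is a programming model for parallel processing of large data sets; research on cost estimation and query optimization for MapReduce-based joins therefore has both academic and practical value. The performance of MapReduce join algorithms depends mainly on I/O cost (both local and network I/O), which in turn depends on characteristic parameters of the data sets and of the join operation; estimating the I/O cost of binary joins makes it possible to optimize execution plans for multi-way joins. On this basis, the paper first proposes an I/O cost model for binary joins. It then formally defines and slightly extends existing binary join algorithms, arriving at six MapReduce-based join algorithms, and defines their I/O cost functions through white-box analysis. Finally, it proposes an algorithm for selecting the optimal execution plan for multi-way joins. Experiments show that the I/O cost model is correct and accurately reflects the relative performance of the algorithms.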
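For intuition about where join I/O arises, here is an in-memory simulation of the standard reduce-side (repartition) join, one common MapReduce join algorithm: mappers tag each record with its source table, the shuffle groups records by join key, and reducers form the cross product per key. The `shuffled` counter stands in for the intermediate data that a real job writes locally and sends over the network; the paper's cost model and its six algorithms are not reproduced, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

def map_phase(table: str, rows: List[Tuple[str, str]]) -> Iterator[Tuple[str, Tuple[str, str]]]:
    for key, value in rows:
        yield key, (table, value)  # tag each record with its source table

def repartition_join(R: List[Tuple[str, str]], S: List[Tuple[str, str]]):
    shuffle: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    shuffled = 0
    for k, tagged in list(map_phase("R", R)) + list(map_phase("S", S)):
        shuffle[k].append(tagged)  # local spill + network transfer in a real job
        shuffled += 1
    out = []
    for k, vals in shuffle.items():  # reduce phase: one group per join key
        rs = [v for t, v in vals if t == "R"]
        ss = [v for t, v in vals if t == "S"]
        out += [(k, r, s) for r in rs for s in ss]
    return out, shuffled
```

Every input record crosses the shuffle here, even keys with no partner (key "2" and "3" in the test); avoiding that traffic is what variants such as map-side or semi-join strategies aim at.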

17.

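Analysis and Optimization of Structural Join Order Selection Algorithms for XML Queries

张艺濒, 谢金晶. Microcomputer Development, 2007, 17(1): 82-84

Optimizing XML queries is currently a hot research topic, and structural joins are the main operation in XML database queries. As with joins in relational databases, choosing the order of structural joins is central to XML query optimization. By studying the various join order selection algorithms used in XML query optimization, the paper proposes an optimized algorithm that effectively reduces the search space and improves efficiency for large XML queries.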

18.

XML Structural Join Based on Bisection (citations: 2 total, 0 self, 2 by others)

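张晶, 丁怡心, 刘山. Computer Engineering, 2007, 33(18): 62-63,6

In XML query processing, region-partition-based join algorithms are relatively efficient when the XML data is unsorted and unindexed. Exploiting properties of region encoding, this paper recursively and exhaustively partitions the input sets, locating ancestor-descendant structural relationships step by step at the cost of partitioning. Performing the structural join after partitioning by bisection improves its efficiency, and experiments show that the algorithm is an effective approach to XML query processing.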
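A recursive variant of region partitioning can be sketched as follows, assuming (start, end) region codes and a half-open code range [lo, hi): the range is halved until a partition holds few descendants, ancestors that cannot contain any descendant of a half are pruned, and small partitions are joined directly. The `leaf` threshold and the function name are illustrative assumptions, not parameters from the paper.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (start, end) region code

def bisect_join(A: List[Node], D: List[Node], lo: int, hi: int, leaf: int = 4) -> List[Tuple[Node, Node]]:
    """Join ancestors A with descendants D whose start lies in [lo, hi)."""
    if not A or not D:
        return []
    if len(D) <= leaf or hi - lo <= 1:
        # Small partition: compare directly.
        return [(a, d) for a in A for d in D if a[0] < d[0] and d[1] < a[1]]
    mid = (lo + hi) // 2
    out: List[Tuple[Node, Node]] = []
    for sub_lo, sub_hi in ((lo, mid), (mid, hi)):
        sub_d = [d for d in D if sub_lo <= d[0] < sub_hi]
        # An ancestor can only contain descendants of this half if its region overlaps it.
        sub_a = [a for a in A if a[0] < sub_hi and a[1] >= sub_lo]
        out += bisect_join(sub_a, sub_d, sub_lo, sub_hi, leaf)
    return out
```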

19.

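A Cluster-Based Approximate Join Method for XML Documents

韩哲, 王宏志, 高宏, 李建中, 骆吉洲. Journal of Computer Research and Development, 2009, 46(Z2)

The approximate join of XML documents finds similar XML documents across two document collections and is widely used in systems for XML-based information integration, XML data cleaning, and similar tasks. A significant problem with current approximate XML join operations is that when documents differ substantially, much of the computation is repeated, which reduces efficiency. To address this, the paper proposes a cluster-based approximate join method for XML documents. The basic idea is to build an index for each XML document; if the indexes of several documents from the two data sets are similar, those documents are grouped into a cluster, and the approximate join is then performed within each cluster. Documents that fall into no cluster require no computation at all. Experimental results show that the method is efficient while preserving accuracy.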

20.

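Fragmentation of Distributed Spatial Data and Optimization of Cross-Boundary Topological Joins (citations: 2 total, 0 self, 2 by others)

朱欣焰, 周春辉, 呙维, 夏宇. Journal of Software, 2011, 22(2): 269-284

This paper studies topological join queries across fragment boundaries in a distributed spatial database (DSDB) whose data is fragmented by region, and proposes corresponding optimizations. It first studies the fragmentation and distribution of spatial data in a distributed environment and proposes extended principles for spatial data fragmentation: spatial clustering, indivisibility of spatial objects, and preservation of logical seamlessness. It then divides fragment joins in a region-partitioned environment into cross-boundary and non-cross-boundary joins, likewise divides topological relationships into two classes, and focuses on the two classes of cross-boundary fragment topological joins. Two theorems for optimizing cross-boundary spatial fragment topological joins are proposed and proved. On this basis, the paper gives optimization rules for cross-boundary spatial topological joins, including join elimination rules and join transformation rules. Finally, detailed experiments compare the efficiency of the natural join strategy, the semijoin strategy, and the proposed strategy; the results show that the proposed method has a clear advantage in optimizing cross-boundary joins. The proposed theory and methods can therefore be used to optimize distributed cross-boundary topological relationship queries.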