
1.

Caching Strategies for Spatial Joins (total citations: 1; self-citations: 0; citations by others: 1)

David J. Abel Volker Gaede Robert A. Power Xiaofang Zhou 《GeoInformatica》1999,3(1):33-59

Similar articles

2.

Dmitri V. Kalashnikov Sunil Prabhakar 《Information Systems》2007

The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focused on the execution of high-dimensional joins over large amounts of disk-based data. The increasing size of main memory available on current computers and the need for efficient processing of spatial joins suggest that spatial joins for a large class of problems can be processed in main memory. In this paper, we develop two new in-memory spatial join algorithms, the Grid-join and EGO*-join, and study their performance. Through evaluation, we explore the domain of applicability of each approach and provide recommendations for the choice of a join algorithm depending upon the dimensionality of the data as well as the expected selectivity of the join. We show that the two new proposed join techniques substantially outperform the state-of-the-art join algorithm, the EGO-join.
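
The abstract above does not describe how the Grid-join works internally, so the following is only a generic sketch of a grid-based in-memory similarity join, not the authors' algorithm. It assumes points are tuples of floats, a Euclidean distance threshold `eps`, and a uniform grid whose cell side equals `eps`; the function name `grid_epsilon_join` is hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def grid_epsilon_join(points_a, points_b, eps):
    """Return index pairs (i, j) with dist(points_a[i], points_b[j]) <= eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    dim = len(points_a[0]) if points_a else 0

    def cell_of(p):
        # Assumption: uniform grid with cell side equal to eps.
        return tuple(floor(c / eps) for c in p)

    # Hash every point of B into its grid cell.
    grid = defaultdict(list)
    for j, q in enumerate(points_b):
        grid[cell_of(q)].append(j)

    result = []
    offsets = list(product((-1, 0, 1), repeat=dim))
    for i, p in enumerate(points_a):
        base = cell_of(p)
        # Only the 3^dim cells around p can hold a point within eps of p.
        for off in offsets:
            cell = tuple(b + o for b, o in zip(base, off))
            for j in grid.get(cell, ()):
                if dist(p, points_b[j]) <= eps:
                    result.append((i, j))
    return result
```

The number of neighbouring cells grows as 3^dim, which is one reason a plain grid like this is usually considered only for low-dimensional data.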

3.

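Processing and Optimization of Join Operations in Spatial Databases (total citations: 7; self-citations: 0; citations by others: 7)

Li Liyan Qin Xiaolin 《中国图象图形学报》 (Journal of Image and Graphics) 2003,8(7):732-737

Performance problems severely limit the application and development of spatial databases. Because the spatial join is the most complex and time-consuming basic operation in a spatial database, its processing efficiency largely determines the overall performance of the system. Although many spatial join algorithms exist, cost estimation and query optimization for spatial joins still need further study. Most spatial join algorithms are built on R-tree indexes; if the relations participating in a spatial join have no index, or only some of them are indexed, special algorithms are needed. In addition, the cost models of the various algorithms need a relatively uniform method of calculation, and practice has shown that, given the actual conditions of spatial databases, estimating algorithm complexity by I/O cost is reasonable. On this basis, because complex spatial queries may involve several relations in a spatial join, a dynamic programming algorithm is also needed to find the join order with the lowest cost, so that a general algorithmic framework can finally be formed. Complexity analysis of this framework shows that a spatial database query optimization system built on it will have good time and space efficiency and can handle very complex spatial queries.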
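
As a rough illustration of choosing a join order by dynamic programming, here is a minimal sketch over subsets of relations. It is not the paper's framework: the cost model (the sum of intermediate result sizes as a stand-in for I/O cost), the restriction to left-deep orders, and the names `best_join_order`, `cards`, and `sel` are all assumptions made for this example.

```python
from itertools import combinations


def best_join_order(cards, sel):
    """cards: {name: row count}; sel: {frozenset({a, b}): selectivity}.

    Returns (cost, order) for the cheapest left-deep join order, where the
    cost is the sum of intermediate result sizes (an assumed I/O proxy).
    """
    names = sorted(cards)
    # best[subset] = (cost so far, result size, join order)
    best = {frozenset([n]): (0.0, float(cards[n]), (n,)) for n in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            for last in combo:
                rest = subset - {last}
                cost, size, order = best[rest]
                # Apply every predicate between `last` and the joined relations;
                # a missing predicate means a cross product (selectivity 1).
                out = size * cards[last]
                for r in rest:
                    out *= sel.get(frozenset((r, last)), 1.0)
                cand = (cost + out, out, order + (last,))
                if subset not in best or cand[0] < best[subset][0]:
                    best[subset] = cand
    cost, _, order = best[frozenset(names)]
    return cost, order
```

The table holds one entry per subset of relations, so the sketch is exponential in the number of relations and only practical for a handful of them.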

4.

Applying segmented right-deep trees to pipelining multiple hash joins

Ming-Syan Chen Mingling Lo Yu P.S. Young H.C. 《Knowledge and Data Engineering, IEEE Transactions on》1995,7(4):656-668

The pipelined execution of multijoin queries in a multiprocessor-based database system is explored in this paper. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join for processing. The execution of a query is usually denoted by a query execution tree. To improve the execution of pipelined hash joins, an innovative approach to query execution tree selection is proposed to exploit segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment, and then, in the light of the model, we develop heuristic schemes to determine the query execution plan based on a segmented right-deep tree so that the query can be efficiently executed. As shown by our simulation, the proposed approach, without incurring additional overhead on plan execution, possesses more flexibility in query plan generation, and can lead to query plans of better performance than those achievable by the previous schemes using right-deep trees.
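
The pipelining idea in this abstract, where early results of one hash join flow into the next before the first join finishes, can be sketched for a single right-deep segment with Python generators. This is a single-process illustration under assumed conditions: rows are dictionaries, all joins are equi-joins, and the names `build_hash_table`, `probe`, and `pipelined_right_deep` are hypothetical. It does not model the multiprocessor setting or the plan-selection heuristics of the paper.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    """Build phase: group the build-side rows by their join key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(stream, table, key):
    """Probe phase: lazily join each incoming row with matching build rows."""
    for row in stream:
        for match in table.get(row[key], ()):
            yield {**match, **row}


def pipelined_right_deep(probe_rows, build_sides):
    """build_sides: list of (rows, key) pairs, innermost join first."""
    # All hash tables of the segment are built before any probing starts.
    tables = [(build_hash_table(rows, key), key) for rows, key in build_sides]
    stream = iter(probe_rows)
    # Chain the probes: each joined row moves on to the next join at once.
    for table, key in tables:
        stream = probe(stream, table, key)
    return stream
```

Because every hash table in the chain must be resident while probing, the memory needed grows with the length of the segment, which is one motivation for splitting a long right-deep tree into segments.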

5.

High-dimensional similarity joins (total citations: 3; self-citations: 0; citations by others: 3)

Shim K. Srikant R. Agrawal R. 《Knowledge and Data Engineering, IEEE Transactions on》2002,14(1):156-171

Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions. Hence, the proposed index structure scales to high-dimensional data. We analyze the cost of the join for the ε tree and the R-tree family, and show that the ε tree will perform better for high-dimensional joins. Empirical evaluation, using synthetic and real-life data sets, shows that similarity join using the ε tree is twice to an order of magnitude faster than the R⁺ tree, with the performance gap increasing with the number of dimensions. We also discuss how some of the ideas of the ε tree can be applied to the R-tree family. These biased R-trees perform better than the corresponding traditional R-trees for high-dimensional similarity joins, but do not match the performance of the ε tree.

6.

Efficient join-index-based spatial-join processing: a clustering approach (total citations: 2; self-citations: 0; citations by others: 2)

Shekhar S. Chang-Tien Lu Chawla S. Ravada S. 《Knowledge and Data Engineering, IEEE Transactions on》2002,14(6):1400-1421

A join-index is a data structure used for processing join queries in databases. Join-indices use precomputation techniques to speed up online query processing and are useful for data sets which are updated infrequently. The I/O cost of join computation using a join-index with limited buffer space depends primarily on the page-access sequence used to fetch the pages of the base relations. Given a join-index, we introduce a suite of methods based on clustering to compute the joins. We derive upper bounds on the length of the page-access sequences. Experimental results with Sequoia 2000 data sets show that the clustering method outperforms existing methods based on sorting and online-clustering heuristics.

7.

Slot index spatial join (total citations: 3; self-citations: 0; citations by others: 3)

Mamoulis N. Papadias D. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(1):211-231

Efficient processing of spatial joins is very important due to their high cost and frequent application in spatial databases and other areas involving multidimensional data. This paper proposes slot index spatial join (SISJ), an algorithm that joins a nonindexed data set with one indexed by an R-tree. We explore two optimization techniques that reduce the space requirements and the computational cost of SISJ and we compare it, analytically and experimentally, with other spatial join methods for two cases: 1) when the nonindexed input is read from disk and 2) when it is an intermediate result of a preceding database operator in a complex query plan. The importance of buffer splitting between consecutive join operators is also demonstrated through a two-join case study and a method that estimates the optimal splitting is proposed. Our evaluation shows that SISJ outperforms alternative methods in most cases and is suitable for limited memory conditions.

8.

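Eliminating Random I/O in the Hybrid Hash Join Algorithm

Liu Mingchao Yang Lianghuai Zhou Weigang 《计算机系统应用》 (Computer Systems & Applications) 2013,22(7):133-136

The hybrid hash join (HHJ) algorithm is an important join algorithm in query processing for database management systems. This paper proposes a cache-optimized hybrid hash join algorithm (OHHJ) that reduces random I/O through buffer optimization: it tunes the size of the bucket buffers used in the partitioning phase so as to minimize the random I/O generated during partitioning. The paper quantitatively analyzes the relationships among partition (bucket) size, bucket buffer size, available buffer size, relation size, and the random-I/O access characteristics of hard disks, and derives a heuristic for the optimal allocation of bucket size and bucket buffer size. Experimental results show that OHHJ effectively reduces the random I/O that the traditional HHJ algorithm generates in the partitioning phase and improves the algorithm's performance.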

9.

An interactive framework for spatial joins: a statistical approach to data analysis in GIS

Shayma Alkobaisi Wan D. Bae Petr Vojtěchovsky Sada Narayanappa 《GeoInformatica》2012,16(2):329-355

Many Geographic Information Systems (GIS) handle a large volume of geospatial data. Spatial joins over two or more geospatial datasets are very common operations in GIS for data analysis and decision support. However, evaluating spatial joins can be very time intensive due to the size of datasets. In this paper, we propose an interactive framework that provides faster approximate answers of spatial joins. The proposed framework utilizes two statistical methods: probabilistic join and sampling based join. The probabilistic join method provides speedup of two orders of magnitude with no correctness guarantee, while the sampling based method provides an order of magnitude improvement over the full indexing tree joins of datasets and also provides running confidence intervals. The framework allows users to trade off speed against bounded accuracy, hence it provides truly interactive data exploration. The two methods are evaluated empirically with real and synthetic datasets.

10.

Multi-way spatial join selectivity for the ring join graph

《Information and Software Technology》2005,47(12):785-795

Efficient spatial query processing is very important since the applications of the spatial DBMS (e.g. GIS, CAD/CAM, LBS) handle massive amounts of data and consume much time. Many spatial queries contain the multi-way spatial join because they compute the relationships (e.g. intersect) among the spatial data. Thus, accurate estimation of the spatial join selectivity is essential to generate an efficient spatial query execution plan that takes advantage of spatial access methods. For multi-way spatial joins, selectivity estimation formulae have been developed for only two kinds of query types, tree and clique. However, the selectivity estimation for the general query graph which contains cycles has not been developed yet. To fill this gap, we devise a formula for the multi-way spatial ring join selectivity. This is an indispensable step to compute the selectivity of the general multi-way spatial join whose join graph contains cycles. Our experiment shows that the estimated sizes of query results using our formula are close to the sizes of actual query results.

11.

Performance analysis of three text-join algorithms

Meng W. Yu C. Wang W. Rishe N. 《Knowledge and Data Engineering, IEEE Transactions on》1998,10(3):477-492

When a multidatabase system contains textual database systems (i.e., information retrieval systems), queries against the global schema of the multidatabase system may contain a new type of join: joins between attributes of textual type. Three algorithms for processing this type of join are presented and their I/O costs are analyzed in this paper. Since this type of join often involves document collections of very large size, it is very important to find efficient algorithms to process them. The three algorithms differ on whether the documents themselves or the inverted files on the documents are used to process the join. Our analysis and the simulation results indicate that the relative performance of these algorithms depends on the input document collections, system characteristics, and the input query. For each algorithm, the type of input document collections with which the algorithm is likely to perform well is identified. An integrated algorithm that automatically selects the best algorithm to use is also proposed.

12.

Windowed pq-grams for approximate joins of data-centric XML (total citations: 1; self-citations: 0; citations by others: 1)

Nikolaus Augsten Michael Böhlen Curtis Dyreson Johann Gamper 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(4):463-488

In data integration applications, a join matches elements that are common to two data sources. Since elements are represented slightly differently in each source, an approximate join must be used to do the matching. For XML data, most existing approximate join strategies are based on some ordered tree matching technique, such as the tree edit distance. In data-centric XML, however, the sibling order is irrelevant, and two elements should match even if their subelement order varies. Thus, approximate joins for data-centric XML must leverage unordered tree matching techniques. This is computationally hard since the algorithms cannot rely on a predefined sibling order. In this paper, we give a solution for approximate joins based on unordered tree matching. The core of our solution is windowed pq-grams, which are small subtrees of a specific shape. We develop an efficient technique to generate windowed pq-grams in a three-step process: sort the tree, extend the sorted tree with dummy nodes, and decompose the extended tree into windowed pq-grams. The windowed pq-grams distance between two trees is the number of pq-grams that are in one tree decomposition only. We show that our distance is a pseudo-metric and empirically demonstrate that it effectively approximates the unordered tree edit distance. The approximate join using windowed pq-grams can be efficiently implemented as an equality join on strings, which avoids the costly computation of the distance between every pair of input trees. Experiments with synthetic and real world data confirm the analytic results and show the effectiveness and efficiency of our technique.
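
The distance definition in this abstract (the number of pq-grams that occur in only one of the two decompositions) can be illustrated with a small sketch. It assumes the windowed pq-grams have already been generated and serialized as strings, and that each decomposition is treated as a bag (multiset); both are assumptions of this example, and the three-step generation procedure is not reproduced. The function name `pq_gram_bag_distance` is hypothetical.

```python
from collections import Counter


def pq_gram_bag_distance(grams_a, grams_b):
    """Count the pq-grams that appear in only one of the two bags.

    grams_a, grams_b: iterables of pq-grams serialized as strings
    (assumed to be precomputed by some decomposition step).
    """
    bag_a, bag_b = Counter(grams_a), Counter(grams_b)
    # Counter subtraction keeps only positive counts, so the two differences
    # together form the bag symmetric difference.
    only_a = bag_a - bag_b
    only_b = bag_b - bag_a
    return sum(only_a.values()) + sum(only_b.values())
```

Because only equality of the serialized strings is tested, the comparison maps naturally onto an equality join between two tables of (tree id, pq-gram string) rows.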

13.

Using intrinsic data skew to improve hash join performance

Bryce Cutt Ramon Lawrence 《Information Systems》2009

Hash join is used to join large, unordered relations and operates independently of the data distributions of the join relations. Real-world data sets are not uniformly distributed and often contain significant skew. Although partition skew has been studied for hash joins, no prior work has examined how exploiting data skew can improve the performance of hash join. In this paper, we present histojoin, a join algorithm that uses histograms to identify data skew and improve join performance. Experimental results show that for skewed data sets histojoin performs significantly fewer I/O operations and is faster by 10–60% than hybrid hash join.

14.

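Optimized XML Query Matching: A Structural Join Algorithm with Containment Segments Based on B+-Tree Indexing (total citations: 2; self-citations: 0; citations by others: 2)

Fan Xiaohua Pang Yinming Zhang Mi Wang Wei Chen Jinhai Shi Baile 《计算机科学》 (Computer Science) 2004,31(6):72-75

An efficient structural join method is the key to XML query processing. This paper proposes a novel structural join method that uses containment segments to structure the XML document tree and supports the method with B+-Tree indexing, so that some time and space costs can be skipped during the stack-based structural join, improving processing efficiency.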

15.

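Distributed Spatial Data Fragmentation and Cross-Border Topological Join Optimization (total citations: 2; self-citations: 0; citations by others: 2)

Zhu Xinyan Zhou Chunhui Guo Wei Xia Yu 《软件学报》 (Journal of Software) 2011,22(2):269-284

This paper studies topological join queries over cross-border fragments in a distributed spatial database (DSDB) whose data are fragmented by region, and proposes corresponding optimization methods. It first examines the fragmentation and distribution of spatial data in a distributed environment and proposes extended principles for spatial data fragmentation: spatial clustering, non-splitting of spatial objects, and preservation of logical seamlessness. It then divides fragment joins under region-partitioned fragmentation into cross-border and non-cross-border joins, divides topological relations into two classes, and focuses on the two classes of cross-border fragment topological joins. Two theorems for optimizing cross-border spatial fragment topological joins are stated and proved. On this basis, optimization rules for cross-border spatial topological joins are given, including join elimination rules and join optimization transformation rules. Finally, detailed experiments compare the efficiency of the natural join strategy, the semijoin strategy, and the proposed join strategy; the results show that the proposed method has a clear advantage for cross-border join optimization. The proposed theory and methods can therefore be used to optimize distributed cross-border topological relation queries.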

16.

Practical methods for constructing suffix trees (total citations: 7; self-citations: 0; citations by others: 7)

Yuanyuan Tian Sandeep Tata Richard A. Hankins Jignesh M. Patel 《The VLDB Journal The International Journal on Very Large Data Bases》2005,14(3):281-299

Sequence datasets are ubiquitous in modern life-science applications, and querying sequences is a common and critical operation in many of these applications. The suffix tree is a versatile data structure that can be used to evaluate a wide variety of queries on sequence datasets, including evaluating exact and approximate string matches, and finding repeat patterns. However, methods for constructing suffix trees are often very time-consuming, especially for suffix trees that are large and do not fit in the available main memory. Even when the suffix tree fits in memory, it turns out that the processor cache behavior of theoretically optimal suffix tree construction methods is poor, resulting in poor performance. Currently, there are a large number of algorithms for constructing suffix trees, but the practical tradeoffs in using these algorithms for different scenarios are not well characterized. In this paper, we explore suffix tree construction algorithms over a wide spectrum of data sources and sizes. First, we show that on modern processors, a cache-efficient algorithm with O(n²) worst-case complexity outperforms popular linear time algorithms like Ukkonen and McCreight, even for in-memory construction. For larger datasets, the disk I/O requirement quickly becomes the bottleneck in each algorithm's performance. To address this problem, we describe two approaches. First, we present a buffer management strategy for the O(n²) algorithm. The resulting new algorithm, which we call “Top Down Disk-based” (TDD), scales to sizes much larger than have been previously described in literature. This approach far outperforms the best known disk-based construction methods. Second, we present a new disk-based suffix tree construction algorithm that is based on a sort-merge paradigm, and show that for constructing very large suffix trees with very little resources, this algorithm is more efficient than TDD.

17.

Indexing graph-structured XML data for efficient structural join operation

《Data & Knowledge Engineering》2006,58(2):159-179

Structural join has been established as a primitive technique for matching the binary containment pattern, specifically the parent–child and ancestor–descendant relationship, on the tree XML data. While current indexing approaches and evaluation algorithms proposed for the structural join operation assume the tree-structured data model, the presence of reference links in XML documents may render the underlying model a graph instead. In the more general category of semi-structured data, of which XML is an example, the data model is also usually supposed to be of graph structure. In this paper, we present an indexing approach and corresponding evaluation algorithms for efficiently performing the structural join operation on graph-structured data. Our approach encodes the structural containment relationship of a graph on multiple nested tree-structured layers, probably with the exception of the last one. With each tree-structured layer indexed with the inverted technique, the structural join operation on a graph can therefore be accomplished through recursively performing structural joins on nested layer trees. Our extensive experiments on both benchmark and synthetic XML data indicate that our proposed approach has good potential to perform significantly better than existing ones in terms of both the I/O and CPU cost.

18.

Seeking the truth about ad hoc join costs

Laura M. Haas Michael J. Carey Miron Livny Amit Shukla 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(3):241-256

In this paper, we re-examine the results of prior work on methods for computing ad hoc joins. We develop a detailed cost model for predicting join algorithm performance, and we use the model to develop cost formulas for the major ad hoc join methods found in the relational database literature. We show that various pieces of “common wisdom” about join algorithm performance fail to hold up when analyzed carefully, and we use our detailed cost model to derive optimal buffer allocation schemes for each of the join methods examined here. We show that optimizing their buffer allocations can lead to large performance improvements, e.g., as much as a 400% improvement in some cases. We also validate our cost model's predictions by measuring an actual implementation of each join algorithm considered. The results of this work should be directly useful to implementors of relational query optimizers and query processing systems. Edited by M. Adiba. Received May 1993 / Accepted April 1996

19.

Interleaving a join sequence with semijoins in distributed query processing

Chen M.-S. Yu P.S. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(5):611-621

The problem of combining join and semijoin reducers for distributed query processing is studied. An approach based on interleaving a join sequence with beneficial semijoins is proposed. A join sequence is mapped into a join sequence tree first. The join sequence tree provides an efficient way to identify for each semijoin its correlated semijoins as well as its reducible relations under the join sequence. In light of these properties, an algorithm for determining an effective sequence of join and semijoin reducers is developed. Examples are given to illustrate the results. They show the advantage of using a combination of joins and semijoins as reducers for distributed query processing.

20.

Exploiting spatial indexes for semijoin-based join processing in distributed spatial databases

Kian-Lee Tan Beng Chin Ooi Abel D.J. 《Knowledge and Data Engineering, IEEE Transactions on》2000,12(6):920-937

In a distributed spatial database system, a user may issue a query that relates two spatial relations not stored at the same site. Because of the sheer volume and complexity of spatial data, spatial joins between two spatial relations at different sites are expensive in terms of computational and transmission costs. In this paper, we address the problems of processing spatial joins in a distributed environment. We propose a semijoin-like operator, called the spatial semijoin, to prune away objects that do not contribute to the join result. This operator also reduces both the transmission and local processing costs for a later join operation. However, the cost of the elimination process must be taken into account, and we consider approaches to minimize these overheads. We also study and compare two families of distributed join algorithms that are based on the spatial semijoin operator. The first is based on multi-dimensional approximations obtained from an index such as the R-tree, and the second is based on single-dimensional approximations obtained from object mapping. We have conducted experiments on real data sets and report the results in this paper.
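
The pruning role of a spatial semijoin can be sketched as follows. This is a simplified illustration, not the operator defined in the paper: it assumes that one site ships the minimum bounding rectangles (MBRs) of its objects, that an MBR is a tuple `(xmin, ymin, xmax, ymax)`, and that the other site keeps only the objects whose MBR intersects at least one received rectangle. The names `mbr_intersects` and `spatial_semijoin` and the `"mbr"` dictionary key are hypothetical.

```python
def mbr_intersects(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_semijoin(r_mbrs, s_objects):
    """Keep the objects of S whose MBR intersects some MBR received from R's site.

    r_mbrs: rectangles approximating the objects of R (assumed already shipped).
    s_objects: dictionaries with an "mbr" entry (hypothetical layout).
    """
    return [
        s for s in s_objects
        if any(mbr_intersects(s["mbr"], m) for m in r_mbrs)
    ]
```

Only the surviving objects would then be transmitted for the exact join, so the saving depends on how many objects the rectangles filter out compared with the cost of shipping the rectangles. The nested scan here is quadratic; an index such as an R-tree would normally replace it.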