Found 20 similar documents (search time: 0 ms)
1.
The author examines join processing when the access paths available are nonclustered indexes on the joining attribute(s) for both relations involved in the join. He uses a bipartite graph model to represent the pages from the two relations that contain tuples to be joined. The minimization of the number of page accesses needed to compute a join in the author's database environment is explored from two perspectives. The first is to reduce the maximum buffer size so that no page is accessed more than once, and the second is to reduce the number of page accesses for a fixed buffer size. The author has developed heuristics for these problems. He gives performance comparisons of these heuristics and another method that recently appeared in the literature. Results show that one particular heuristic performs very well for addressing the problem from either perspective.
2.
George Roumelis Antonio Corral Michael Vassilakopoulos Yannis Manolopoulos 《GeoInformatica》2016,20(4):571-628
Efficient and effective processing of the distance-based join query (DJQ) is of great importance in spatial databases because of the wide range of applications that may issue such queries (mapping, urban planning, transportation planning, resource management, etc.). The most representative and studied DJQs are the K Closest Pairs Query (KCPQ) and the ε Distance Join Query (εDJQ). These spatial queries involve two spatial data sets and a distance function to measure the degree of closeness, along with a given number of pairs in the final result (K) or a distance threshold (ε). In this paper, we propose four new plane-sweep-based algorithms for KCPQs and their extensions for εDJQs in the context of spatial databases, without the use of an index on either of the two disk-resident data sets (since building and using indexes does not always benefit processing performance). They employ a combination of plane-sweep algorithms and space partitioning techniques to join the data sets. Finally, we present the results of an extensive experimental study that compares the efficiency and effectiveness of the proposed algorithms for KCPQs and εDJQs. This performance study, conducted on medium and big spatial data sets (real and synthetic), validates that the proposed plane-sweep-based algorithms are very promising in terms of both efficiency and effectiveness when neither input is indexed. Moreover, the best of the new algorithms is experimentally compared to the best algorithm that is based on the R-tree (a widely accepted access method), for KCPQs and εDJQs, using the same data sets. This comparison shows that the new algorithms outperform R-tree-based algorithms in most cases.
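As a rough illustration of the query type in this abstract, the following is a minimal in-memory sketch of a K Closest Pairs Query that prunes with a sweep along the x-axis. It is not one of the four algorithms proposed in the paper (which work on disk-resident data and add space partitioning); the function name `k_closest_pairs` and the sample points are assumptions made for this example.

```python
import bisect
import heapq
import math

def k_closest_pairs(P, Q, k):
    """Return the k closest (distance, p, q) pairs with p in P and q in Q.

    Q is sorted by x so that, for each p, candidates are scanned outward
    from p's x position and the scan stops as soon as the x-gap alone is
    at least the current k-th best distance.
    """
    if k <= 0:
        return []
    Q = sorted(Q)
    qx = [q[0] for q in Q]
    heap = []  # max-heap emulated with negated distances: (-d, p, q)

    def threshold():
        # Current k-th best distance, or infinity until k pairs are known.
        return -heap[0][0] if len(heap) == k else math.inf

    def consider(p, q):
        d = math.dist(p, q)
        if len(heap) < k:
            heapq.heappush(heap, (-d, p, q))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, p, q))

    for p in P:
        start = bisect.bisect_left(qx, p[0])
        for j in range(start, len(Q)):          # sweep to the right of p
            if Q[j][0] - p[0] >= threshold():
                break
            consider(p, Q[j])
        for j in range(start - 1, -1, -1):      # sweep to the left of p
            if p[0] - Q[j][0] >= threshold():
                break
            consider(p, Q[j])

    return sorted((-nd, p, q) for nd, p, q in heap)

P = [(0.0, 0.0), (10.0, 10.0)]
Q = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
pairs = k_closest_pairs(P, Q, 2)
for d, p, q in pairs:
    print(round(d, 3), p, q)
```

The early `break` is safe because the distance between two points is never smaller than their x-gap, and the threshold only shrinks as better pairs are found.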
3.
Halici U. Dogac A. 《IEEE Transactions on Software Engineering》1991,17(7):712-724
A method called optimistic method with dummy locks (ODL) is suggested for concurrency control in distributed databases. It is shown that by using long-term dummy locks, the need for information about the write sets of validated transactions is eliminated and, during the validation test, only the related sites are checked. The transactions to be aborted are recognized immediately, before the validation test, reducing the cost of restarts. The usual read and write locks are used as short-term locks during the validation test. The use of short-term locks in the optimistic approach eliminates the need for a system-wide critical section and results in a distributed and parallel validation test. The performance of ODL is compared with strict two-phase locking (2PL) through simulation; in low-conflict cases the two perform almost the same, but in high-conflict cases ODL performs better than strict 2PL.
4.
Range aggregate processing in spatial databases (cited by: 3; self-citations: 0; citations by others: 3)
A range aggregate query returns summarized information about the points falling in a hyper-rectangle (e.g., the total number of these points instead of their concrete ids). This paper studies spatial indexes that solve such queries efficiently and proposes the aggregate Point-tree (aP-tree), which achieves logarithmic cost to the data set cardinality (independently of the query size) for two-dimensional data. The aP-tree requires only small modifications to the popular multiversion structural framework and, thus, can be implemented and applied easily in practice. We also present models that accurately predict the space consumption and query cost of the aP-tree and are therefore suitable for query optimization. Extensive experiments confirm that the proposed methods are efficient and practical.
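To make the query type concrete, here is a minimal sketch of a two-dimensional range COUNT query evaluated by brute force. It only illustrates what such a query returns; it is not the aP-tree, and its cost is linear in the number of points rather than logarithmic. The name `range_count` and the sample data are assumptions made for this example.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]

def range_count(points: Iterable[Point],
                x_lo: float, x_hi: float,
                y_lo: float, y_hi: float) -> int:
    """Count the points inside the closed rectangle [x_lo, x_hi] x [y_lo, y_hi].

    Only the aggregate (the count) is returned, not the ids of the points.
    """
    return sum(1 for x, y in points
               if x_lo <= x <= x_hi and y_lo <= y <= y_hi)

points = [(1, 1), (2, 5), (3, 3), (6, 2), (7, 7)]
print(range_count(points, 0, 4, 0, 4))  # (1, 1) and (3, 3) qualify -> 2
```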
5.
Object-based directional query processing in spatial databases (cited by: 4; self-citations: 0; citations by others: 4)
Xuan Liu Shekhar S. Chawla S. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(2):295-304
Direction-based spatial relationships are critical in many domains, including geographic information systems (GIS) and image interpretation. They are also frequently used as selection conditions in spatial queries. In this paper, we explore the processing of object-based direction queries and propose a new open shape-based strategy (OSS). OSS models the direction region as an open shape and converts the processing of the direction predicates into the processing of topological operations between open shapes and closed geometry objects. The proposed strategy makes it unnecessary to know the boundary of the embedding world and also eliminates the computation related to the world boundary. OSS reduces both I/O and CPU costs by greatly improving the filtering effectiveness. Our experimental evaluation shows that OSS consistently outperforms classical range query strategies (RQS), while the degree of performance improvement varies with several parameters. Experimental results also demonstrate that OSS is more scalable than RQS for large data sets.
6.
Spatial range query is one of the most common queries in spatial databases, where a user invokes a query to find all the surrounding objects of interest. Most studies of range search use Euclidean distances, which retrieve the result at low cost but with poor accuracy (the Euclidean distance is always less than or equal to the network distance). Range search by network distance, in contrast, retrieves results with high accuracy but requires a vast number of network distance computations. Both of these techniques retrieve all objects within a given radius, with a high number of false hits. Yet, in many situations, retrieving all objects is not necessary, especially when there are already enough objects closer to the query point. Also, performance degrades as the search radius increases. Hence, approximate results can be as valuable as the exact result, while being obtained much faster and at lower cost. In this paper, we propose two approximate range search methods for spatial road networks, namely approximate range Euclidean restriction and approximate range network expansion, to considerably reduce the number of false hits and the number of network distance computations. Verification shows that these two methods are robust and accurate.
7.
The development and investigation of efficient methods for parallel processing of very large databases using a columnar data representation designed for computer clusters is discussed. An approach that combines the advantages of relational and column-oriented DBMSs is proposed. A new type of distributed column index, fragmented according to the domain-interval principle, is introduced. The column indexes are auxiliary structures that are permanently stored in the distributed main memory of a computer cluster. To match the elements of a column index to the tuples of the original relation, surrogate keys are used. Resource-hungry relational operations are performed on the corresponding column indexes rather than on the original relations of the database. As a result, a precomputation table is obtained. Using this table, the DBMS reconstructs the resulting relation. For basic relational operations on column indexes, methods for their parallel decomposition that do not require massive data exchanges between the processor nodes are proposed. This approach improves the performance of OLAP-class queries by hundreds of times.
8.
This study is concerned with a parallel join operation where the subject relations are partitioned according to an interpolation-based grid file (IBGF) scheme. The partitioned relations and directories are distributed over a set of independently accessible external storage units, together with the partitioning control data. The join algorithms, executed by a mesh-type parallel computing system, allow handling of uniformly as well as nonuniformly partitioned relations. Each processor locates and retrieves the data partitions it is to join at each step of the join process, in synchronisation with other processors. The approach is found to be feasible, as the speedup and efficiency results found by simulation are consistent with theoretical bounds. The algorithms are tuned to join-key distributions, so that effective load balancing is achieved during the actual join. © 1997 John Wiley & Sons, Ltd.
9.
Yunjun Gao Baihua Zheng Gencai Chen Qing Li Xiaofa Guo 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(3):371-396
In this paper, we identify and solve a new type of spatial query, called continuous visible nearest neighbor (CVNN) search. Given a data set P, an obstacle set O, and a query line segment q in a two-dimensional space, a CVNN query returns a set of \({\langle p, R\rangle}\) tuples such that \({p \in P}\) is the nearest neighbor to every point r along the interval \({R \subseteq q}\) and p is visible to r. Note that p may be NULL, meaning that all points in P are invisible to all points in R due to the obstruction of some obstacles in O. In contrast to the existing continuous nearest neighbor query, CVNN retrieval considers the impact of obstacles on visibility between objects, which is ignored by most spatial queries. We formulate the problem, analyze its unique characteristics, and develop efficient algorithms for exact CVNN query processing. Our methods (1) utilize conventional data-partitioning indices (e.g., R-trees) on both P and O, (2) tackle the CVNN search by performing a single query for the entire query line segment, and (3) only access the data points and obstacles relevant to the final query result by employing a suite of effective pruning heuristics. In addition, several interesting variations of CVNN queries are introduced; they can be supported by our techniques, which further demonstrates the flexibility of the proposed algorithms. A comprehensive experimental evaluation using both real and synthetic data sets has been conducted to verify the effectiveness of our proposed pruning heuristics and the performance of our proposed algorithms.
10.
11.
Pramanik S. Vineyard D. 《IEEE Transactions on Software Engineering》1988,14(9):1319-1326
A reduced cover set of the set of full reducer semijoin programs for an acyclic query graph for a distributed database system is given. An algorithm is presented that determines the minimum-cost full reducer program. The computational complexity of finding the optimal full reducer for a single relation is of the same order as that of finding the optimal full reducer for all relations. The optimization algorithm is able to handle query graphs where more than one attribute is common between the relations. A method for determining the optimum profitable semijoin program is presented. A low-cost algorithm that determines a near-optimal profitable semijoin program is outlined. This is done by converting a semijoin program into a partial order graph. This graph also allows one to maximize the concurrent processing of the semijoins. It is shown that the minimum response time is given by the largest-cost path of the partial order graph. This reducibility is used as a post-optimizer for the SSD-1 query optimization algorithm. It is shown that the least upper bound on the length of any profitable semijoin program is N(N-1) for a query graph of N nodes.
12.
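As a key technology of spatial databases, the spatial index largely determines the efficiency of the whole spatial database. After a comparative analysis of several existing spatial index structures, the breadth-first R-tree join algorithm (BFRJ) was implemented on the open-source database Ingres, and both local and global optimizations were applied to it. Analysis of experimental results on real data confirms that BFRJ with an appropriate global optimization method outperforms other known spatial join algorithms.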
13.
Slot index spatial join (cited by: 3; self-citations: 0; citations by others: 3)
Efficient processing of spatial joins is very important due to their high cost and frequent application in spatial databases and other areas involving multidimensional data. This paper proposes slot index spatial join (SISJ), an algorithm that joins a nonindexed data set with one indexed by an R-tree. We explore two optimization techniques that reduce the space requirements and the computational cost of SISJ and we compare it, analytically and experimentally, with other spatial join methods for two cases: 1) when the nonindexed input is read from disk and 2) when it is an intermediate result of a preceding database operator in a complex query plan. The importance of buffer splitting between consecutive join operators is also demonstrated through a two-join case study and a method that estimates the optimal splitting is proposed. Our evaluation shows that SISJ outperforms alternative methods in most cases and is suitable for limited memory conditions.
14.
Mohammad Sadoghi Kenneth A. Ross Mustafa Canim Bishwaranjan Bhattacharjee 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(5):651-672
Multiversion databases store both current and historical data. Rows are typically annotated with timestamps representing the period when the row is/was valid. We develop novel techniques to reduce index maintenance in multiversion databases, so that indexes can be used effectively for analytical queries over current data without being a heavy burden on transaction throughput. To achieve this end, we re-design persistent index data structures in the storage hierarchy to employ an extra level of indirection. The indirection level is stored on solid-state disks that can support very fast random I/Os, so that traversing the extra level of indirection incurs a relatively small overhead. The extra level of indirection dramatically reduces the number of magnetic disk I/Os that are needed for index updates and localizes maintenance to indexes on updated attributes. Additionally, we batch insertions within the indirection layer in order to reduce physical disk I/Os for indexing new records. In this work, we further exploit SSDs by introducing novel DeltaBlock techniques for storing the recent changes to data on SSDs. Using our DeltaBlock, we propose an efficient method to periodically flush the recently changed data from SSDs to HDDs such that, on the one hand, we keep track of every change (or delta) for every record, and, on the other hand, we avoid redundantly storing the unchanged portion of updated records. By reducing the index maintenance overhead on transactions, we enable operational data stores to create more indexes to support queries. We have developed a prototype of our indirection proposal by extending the widely used generalized search tree open-source project, which is also employed in PostgreSQL. Our working implementation demonstrates that we can significantly reduce index maintenance and/or query processing cost by a factor of 3. For the insertion of new records, our novel batching technique can save up to 90% of the insertion time. For updates, our prototype demonstrates that we can significantly reduce the database size by up to 80% even with a modest space allocated for DeltaBlocks on SSDs.
15.
16.
Authenticated indexing for outsourced spatial databases (cited by: 1; self-citations: 0; citations by others: 1)
Yin Yang Stavros Papadopoulos Dimitris Papadias George Kollios 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(3):631-648
In spatial database outsourcing, a data owner delegates its data management tasks to a location-based service (LBS), which indexes the data with an authenticated data structure (ADS). The LBS receives queries (ranges, nearest neighbors) originating from several clients/subscribers. Each query initiates the computation of a verification object (VO) based on the ADS. The VO is returned to the client, which can verify the result correctness using the public key of the owner. Our first contribution is the MR-tree, a space-efficient ADS that supports fast query processing and verification. Our second contribution is the MR*-tree, a modified version of the MR-tree, which significantly reduces the VO size through a novel embedding technique. Finally, whereas most ADSs must be constructed and maintained by the owner, we outsource the MR- and MR*-tree construction and maintenance to the LBS, thus relieving the owner from this computationally intensive task.
17.
Indexes are a commonly used structure that provides fast access to data. Their use implies storage and maintenance costs. This paper presents a technique to reduce index size, based on the elimination of tuple offsets in the classical B+ tree structure. It is shown that this technique gives advantages in both tuple access and index maintenance.
18.
This paper presents a query processing strategy for the content-based video query language named CVQL. With CVQL, users can flexibly specify query predicates in terms of the spatial and temporal relationships of the content objects. The query processing strategy evaluates the predicates and returns qualified videos or frames as results. Before the predicates are evaluated, a preprocessing step is performed to avoid unnecessary access to videos that cannot be answers. The preprocessing checks the existence of the content objects specified in the predicates to eliminate unqualified videos. For the evaluation of the predicates, an M-index is designed based on an analysis of the behaviors of the content objects. The M-index is employed to avoid frame-by-frame evaluation of the predicates. Experimental results are presented to illustrate the performance of this approach.
19.
Adaptive and incremental processing for distance join queries (cited by: 1; self-citations: 0; citations by others: 1)
Hyoseop Shin Bongki Moon Sukho Lee 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(6):1561-1578
A spatial distance join is a relatively new type of operation introduced for spatial and multimedia database applications. Additional requirements for ranking and stopping cardinality are often combined with the spatial distance join in online query processing or Internet search environments. These requirements pose new challenges as well as opportunities for more efficient processing of spatial distance join queries. In this paper, we first present an efficient k-distance join algorithm that uses spatial indexes such as R-trees. Bidirectional node expansion and plane-sweeping techniques are used for fast pruning of distant pairs, and the plane-sweeping is further optimized by novel strategies for selecting a sweeping axis and direction. Furthermore, we propose adaptive multistage algorithms for k-distance join and incremental distance join operations. Our performance study shows that the proposed adaptive multistage algorithms outperform previous work by up to an order of magnitude for both k-distance join and incremental distance join queries, under various operational conditions.
20.
《Information and Software Technology》2007,49(4):332-344
In many advanced database applications (e.g., multimedia databases), data objects are transformed into high-dimensional points and manipulated in high-dimensional space. One of the most important but costly operations is the similarity join that combines similar points from multiple datasets. In this paper, we examine the problem of processing K-nearest neighbor similarity join (KNN join). KNN join between two datasets, R and S, returns for each point in R its K most similar points in S. We propose a new index-based KNN join approach using the iDistance as the underlying index structure. We first present its basic algorithm and then propose two different enhancements. In the first enhancement, we optimize the original KNN join algorithm by using approximation bounding cubes. In the second enhancement, we exploit the reduced dimensions of data space. We conducted an extensive experimental study using both synthetic and real datasets, and the results verify the performance advantage of our schemes over existing KNN join algorithms.
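The definition of the KNN join given in this abstract can be sketched as a brute-force nested loop. This is only a baseline showing what the operation returns; it does not use iDistance or either of the paper's enhancements. The function name `knn_join`, the use of Euclidean distance as the similarity measure, and the sample points are assumptions made for this example.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def knn_join(R: Sequence[Point], S: Sequence[Point],
             k: int) -> List[Tuple[Point, List[Point]]]:
    """For each point r in R, return its k nearest points in S.

    Distances are Euclidean; every (r, s) pair is examined, so the cost
    is proportional to |R| * |S|.
    """
    result = []
    for r in R:
        nearest = heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
        result.append((r, nearest))
    return result

R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 4.0), (9.0, 9.0)]
joined = knn_join(R, S, 2)
for r, neighbours in joined:
    print(r, "->", neighbours)
```

An index-based method such as the one described in the abstract aims to produce the same output while examining far fewer candidate pairs.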