A spatial join is a query that searches a database for the set of object pairs satisfying a given spatial relationship. It is one of the most costly queries, and thus requires an efficient processing algorithm that fully exploits the features of the underlying spatial indexes. In our earlier work, we devised a fairly effective algorithm for processing spatial joins with double transformation (DOT) indexing, which is one of several spatial indexing schemes. However, that algorithm is restricted to one-dimensional cases. In this paper, we extend the algorithm to two-dimensional cases, which are common in Geographic Information Systems (GIS) applications. We first extend DOT to a two-dimensional original space. Next, we propose an efficient algorithm for processing range queries using extended DOT. This algorithm employs the quarter division technique and the tri-quarter division technique, both devised by analyzing the regularity of the space-filling curve used in DOT, which greatly reduces the number of space transformation operations. We then propose a novel spatial join algorithm based on this range query processing algorithm. In processing a spatial join, we determine the access order of disk pages so as to minimize the number of disk accesses. We show the superiority of the proposed method through extensive experiments using data sets of various distributions and sizes. The experimental results reveal that the proposed method improves the performance of spatial join processing by up to three times in comparison with the widely used R-tree-based spatial join method.

We investigate the problem of processing historical queries on a sensor network. Since data is considered to have been already collected at the sensor nodes, the main issue is exploring the spatial component of the query in order to minimize its cost, represented by the energy consumption. We assume queries can be issued at any network node, i.e., there is no central base station and all nodes have only local knowledge of the network. On the one hand, a globally optimum query processing plan is desirable, but its construction is not possible due to the lack of global knowledge of the network. On the other hand, while simple network flooding is feasible, it is not a practical choice from a cost perspective. To address this problem we propose a two-phase query processing strategy, where in the first phase a path from the query originator to the query region is found and in the second phase the query is processed within the query region itself. This strategy is supported by analytical models that are used to dynamically select the best processing strategy depending on the query specifics. Our extensive analytical and experimental results show that our analytical models are accurate and that the two-phase strategy is better suited for small- to medium-sized queries, being up to 10 times more cost-effective than typical network flooding. In addition, the dynamic selection of a query processing technique proved capable of always delivering performance at least as good as that of the most energy-efficient strategy for all query sizes. Research supported in part by NSERC Canada.

Many recent image retrieval methods are based on the “bag-of-words” (BoW) model with some additional spatial consistency checking. This paper proposes a more accurate similarity measurement that takes into account the spatial layout of visual words in an offline manner. The similarity measurement is embedded in the standard pipeline of the BoW model, and improves two features of the model: i) latent visual words are added to a query based on spatial co-occurrence, to improve query recall; and ii) weights of reliable visual words are increased to improve precision. The combination of these methods leads to a more accurate measurement of image similarity. This is similar in concept to the combination of query expansion and spatial verification, but does not require query-time processing, which is too expensive to apply to the full list of ranked results. Experimental results demonstrate the effectiveness of our proposed method on three public datasets.

Given the source and destination locations of n group members and a set of required point of interest (POI) types such as restaurants and shopping centers, a Group Trip Scheduling (GTS) query schedules n individual trips such that each POI type is included in exactly one trip and an aggregate trip overhead distance for visiting the required POI types is minimized. Each trip starts at a member’s source location, goes through some POIs, and ends at the member’s destination location. The trip distance of a group member is the distance from her source to destination via the POIs that the group member visits, and the trip overhead distance of the group member is measured by subtracting the distance between her source and destination locations (without visiting any POI type) from her trip distance. The aggregate trip overhead distance is either the summation or the maximum of the trip overhead distances of the group members for visiting the POIs. A GTS query enables a group to schedule independent trips for its members in order to perform a set of tasks with the minimum travel cost. For example, family members normally have many outdoor tasks to perform within a short time for the proper management of home. The members may need to go to a bank to withdraw or deposit money, a pharmacy to buy medicine, or a supermarket to buy groceries. Similarly, organizers of an event may need to visit different POI types to perform many tasks. These scenarios motivate us to introduce a GTS query, a novel query type in spatial databases. We develop an efficient approach to process GTS queries and variants for the Euclidean space and road networks. By exploiting geometric properties, we refine the POI search space and prune POIs, which in turn reduce the query processing overhead significantly. In addition, we propose a dynamic programming technique to eliminate the trip combinations that cannot be part of the optimal query answer. 
We show that processing a GTS query is NP-hard and propose an approximation algorithm to further reduce the query processing overhead. We perform extensive experiments using real and synthetic datasets and show that our approach outperforms a straightforward approach by a large margin.
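
The distance definitions in the GTS abstract above can be made concrete with a small sketch. It only evaluates the trip overhead distance and its two aggregates (sum or maximum) for trips that are already fixed; it does not schedule anything. Euclidean space is assumed, and the function names (`trip_distance`, `trip_overhead`, `aggregate_overhead`) are illustrative, not taken from the paper.

```python
from math import dist

def trip_distance(source, pois, destination):
    """Length of the path source -> POIs (in the given order) -> destination."""
    stops = [source, *pois, destination]
    return sum(dist(a, b) for a, b in zip(stops, stops[1:]))

def trip_overhead(source, pois, destination):
    """Trip distance minus the direct source-to-destination distance."""
    return trip_distance(source, pois, destination) - dist(source, destination)

def aggregate_overhead(trips, aggregate=sum):
    """Aggregate the members' overheads; pass aggregate=max for the maximum variant."""
    return aggregate(trip_overhead(s, p, d) for s, p, d in trips)

# Two hypothetical group members: one detours via a POI, one travels directly.
trips = [
    ((0, 0), [(0, 3)], (4, 3)),  # path length 3 + 4, direct distance 5
    ((0, 0), [], (1, 0)),        # no POI visited, so no overhead
]
total = aggregate_overhead(trips, sum)
worst = aggregate_overhead(trips, max)
```

A GTS query would search over assignments of POI types to members so that this aggregate is minimized; the sketch covers only the objective being minimized.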

Spatial indexing on flash-based Solid State Drives (SSDs) has become a core aspect of spatial database applications, and has been carried out by flash-aware spatial indices. Although some flash-aware spatial indices have been proposed in the literature, they do not exploit all the benefits of SSDs, leading to loss of efficiency and durability. In this article, we propose eFIND, a new generic and efficient framework for flash-aware spatial indexing. eFIND takes into account the intrinsic characteristics of SSDs by employing (i) a write buffer to avoid expensive random writes, (ii) a flushing algorithm that smartly picks modifications to be flushed in batch to the SSD, (iii) a read buffer to decrease the overhead of random reads, (iv) a temporal control to avoid interleaved reads and writes, and (v) a log-structured approach to provide data durability. Performance tests showed the efficiency of eFIND. Compared to the state of the art, eFIND improved the construction of spatial indices by 43% to 77%, and spatial query processing by 4% to 23%.

This work studies the quantum query complexity of Boolean functions in an unbounded-error scenario where it is only required that the query algorithm succeeds with a probability strictly greater than 1/2. We show that, just as in the communication complexity model, the unbounded-error quantum query complexity is exactly half of its classical counterpart for any (partial or total) Boolean function. Moreover, connecting the query and communication complexity results, we show that the “black-box” approach to converting quantum query algorithms into communication protocols by Buhrman, Cleve, and Wigderson [STOC’98] is optimal even in the unbounded-error setting. We also study a related setting, called the weakly unbounded-error setting, where the cost of a query algorithm is given by q + log(1/(2(p − 1/2))), where q is the number of queries made and p > 1/2 is the success probability of the algorithm. In contrast to the case of communication complexity, we show a tight multiplicative Θ(log n) separation between quantum and classical query complexity in this setting for a partial Boolean function. The asymptotic equivalence between them is also shown for some well-studied total Boolean functions.

Many geographical applications have to deal with spatial objects that reveal an intrinsically vague or fuzzy nature. A spatial object is fuzzy if locations exist that cannot be assigned completely to the object or to its complement. Spatial database systems and Geographical Information Systems (GIS) are currently unable to cope with this kind of data. Based on an available abstract data model of fuzzy spatial data types for fuzzy points, fuzzy lines, and fuzzy regions that leverages fuzzy set theory and fuzzy point set topology, this article proposes a Spatial Plateau Algebra that provides spatial plateau data types as an implementation of fuzzy spatial data types. Each spatial plateau object consists of a finite number of crisp counterparts that are all adjacent or disjoint to each other, are associated with different membership values, and hence form different plateaus. The formal framework and the implementation are based on well-known, exact models and implementations of crisp spatial data types. Spatial plateau operations, as geometric operations on spatial plateau objects, are expressed as a combination of geometric operations on the underlying crisp spatial objects. This article offers a conceptually clean foundation for implementing a database extension for fuzzy spatial objects and their operations, and demonstrates the embedding of these new data types as attribute data types in a database schema as well as the incorporation of fuzzy spatial operations into a database query language.

A reverse k-nearest neighbor (RkNN) query retrieves the data points which regard the query point as one of their respective k nearest neighbors. A bi-chromatic reverse k-nearest neighbor (BRkNN) query is a variant of the RkNN query that considers two types of data. Given two types of data G and C, a BRkNN query regarding a data point q in G retrieves the data points from C that regard q as one of their respective k nearest neighbors among the data points in G. Many existing approaches answer either the RkNN query or the BRkNN query. Different from these approaches, in this paper, we make the first attempt to propose a top-n query based on the concept of BRkNN queries, which ranks the data points in G and retrieves the top-n points according to the cardinalities of the corresponding BRkNN answer sets. To answer this top-n query efficiently, we construct the Voronoi Diagram of G to index the data points in G and C. From the information associated with the Voronoi Diagram of G, the upper bound of the cardinality of the BRkNN answer set for each data point in G can be quickly computed. Moreover, based on an existing approach to answering the RkNN query and the characteristics of the Voronoi Diagram of G, we propose a method to find the candidate region regarding a BRkNN query, which tightens the corresponding search space. Finally, based on the triangle inequality, we propose an efficient refinement algorithm for finding the exact BRkNN answers from the candidate regions. To evaluate our approach to answering the top-n query, it is compared with an approach which applies a state-of-the-art algorithm for answering the BRkNN query to each data point in G. The experimental results reveal that our approach performs much better.
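
To make the BRkNN definition and the derived top-n ranking concrete, here is a brute-force sketch that follows the definitions literally. It is a naive baseline under the assumption of Euclidean distance, not the Voronoi-based method the abstract proposes; the function names and sample points are hypothetical, and ties in distance or in cardinality are broken arbitrarily by sort order.

```python
from math import dist

def brknn(q, G, C, k):
    """Points c in C that count q among their k nearest neighbours within G."""
    answers = []
    for c in C:
        k_nearest = sorted(G, key=lambda g: dist(c, g))[:k]
        if q in k_nearest:
            answers.append(c)
    return answers

def top_n(G, C, k, n):
    """Rank the points of G by the size of their BRkNN answer set; keep the first n."""
    ranked = sorted(G, key=lambda g: len(brknn(g, G, C, k)), reverse=True)
    return ranked[:n]

# Hypothetical data: two points of type G, three points of type C.
G = [(0, 0), (10, 0)]
C = [(1, 0), (2, 0), (9, 0)]
best = top_n(G, C, k=1, n=1)
```

This baseline recomputes a full k-nearest-neighbour search for every pair of points, which is the kind of cost that upper bounds and candidate regions are meant to avoid.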

Cloud computing is increasingly being seen as a way to reduce infrastructure costs and add elasticity, and is being used by a wide range of organizations. Cloud data management systems today need to serve a range of different workloads, from analytical read-heavy workloads to transactional (OLTP) workloads. For both the service providers and the users, it is critical to minimize the consumption of resources like CPU, memory, communication bandwidth, and energy, without compromising on service-level agreements if any. In this article, we develop a workload-aware data placement and replication approach, called SWORD, for minimizing resource consumption in such an environment. Specifically, we monitor and model the expected workload as a hypergraph and develop partitioning techniques that minimize the average query span, i.e., the average number of machines involved in the execution of a query or a transaction. We empirically justify the use of query span as the metric to optimize, for both analytical and transactional workloads, and develop a series of replication and data placement algorithms by drawing connections to several well-studied graph theoretic concepts. We introduce a suite of novel techniques to achieve high scalability by reducing the overhead of partitioning and query routing. To deal with workload changes, we propose an incremental repartitioning technique that modifies data placement in small steps without resorting to complete repartitioning. We propose the use of fine-grained quorums defined at the level of groups of data items to control the cost of distributed updates, improve throughput, and adapt to different workloads. We empirically illustrate the benefits of our approach through a comprehensive experimental evaluation for two classes of workloads. For analytical read-only workloads, we show that our techniques result in significant reduction in total resource consumption. 
For OLTP workloads, we show that our approach improves transaction latencies and overall throughput by minimizing the number of distributed transactions.
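
The query span metric named in the SWORD abstract above (the average number of machines involved in executing a query or transaction) can be illustrated with a short sketch. It assumes the simplest case, where every data item lives on exactly one machine; with replication, the span of a query would instead be the smallest set of machines covering its items. The names `query_span`, `average_query_span`, and the sample placements are illustrative only.

```python
def query_span(query_items, placement):
    """Number of distinct machines holding the items a query touches (no replication)."""
    return len({placement[item] for item in query_items})

def average_query_span(workload, placement):
    """Mean span over all queries in the workload."""
    return sum(query_span(q, placement) for q in workload) / len(workload)

# Hypothetical workload: each query is the set of data items it accesses.
workload = [{"a", "b"}, {"a", "c"}, {"c", "d"}]

# Two candidate placements of items onto machines 0 and 1.
co_located = {"a": 0, "b": 0, "c": 1, "d": 1}
scattered = {"a": 0, "b": 1, "c": 0, "d": 1}

span_co_located = average_query_span(workload, co_located)
span_scattered = average_query_span(workload, scattered)
```

A workload-aware partitioner would prefer the first placement here, because items that are queried together sit on the same machine and fewer queries become distributed.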

Wireless sensor networks are powerful, distributed, self-organizing systems used for event and environmental monitoring. In-network query processors like TinyDB offer user-friendly, SQL-like application development. Due to the sensor nodes' resource limitations, monolithic approaches often support only a restricted number of operators. For this reason, complex processing is typically outsourced to the base station. Nevertheless, previous work has shown that complete or partial in-network processing can be more efficient than the base station approach. In this paper, we introduce AnduIN, a system for developing, deploying, and running complex in-network processing tasks. In particular, we present the query planning and execution strategies used in AnduIN, a system combining sensor-local in-network processing and a data stream engine. Query planning employs a multi-dimensional cost model taking energy consumption into account and decides autonomously which query parts will be processed within the sensor network and which parts will be processed at the central instance.

The performance optimization of query processing in spatial networks focuses on minimizing network data accesses and the cost of network distance calculations. This paper proposes algorithms for network k-NN queries, range queries, closest-pair queries and multi-source skyline queries based on a novel processing framework, namely, incremental lower bound constraint. By giving high processing priority to the data points associated with the query and utilizing the incremental nature of the lower bound, our algorithms perform better than the corresponding algorithms based on the known frameworks of incremental Euclidean restriction and incremental network expansion. More importantly, the proposed algorithms are proven to be instance optimal among classes of algorithms. Experiments on real road network datasets demonstrate the superiority of the proposed algorithms.

Aiming at the problem of top-k spatial join query processing in cloud computing systems, a Spark-based top-k spatial join (STKSJ) query processing algorithm is proposed. In this algorithm, the whole data space is divided into grid cells of the same size by a grid partitioning method, and each spatial object in one data set is projected into a grid cell. The Minimum Bounding Rectangle (MBR) of all spatial objects in each grid cell is computed. The spatial objects overlapping with these MBRs in another spatial data set are replicated to the corresponding grid cells, thereby filtering out spatial objects for which there are no join results, thus reducing the cost of subsequent spatial join processing. An improved plane sweeping algorithm is also proposed that speeds up the scanning mode and applies threshold filtering, thus greatly reducing the communication and computation costs of intermediate join results in subsequent top-k aggregation operations. Experimental results on synthetic and real data sets show that the proposed algorithm has clear advantages, and better performance than existing top-k spatial join query processing algorithms.

Data warehouse workloads are crucial for the support of on-line analytical processing (OLAP). The strategy to cope with OLAP queries on such huge amounts of data calls for the use of large parallel computers. The trend today is to use cluster architectures that show a reasonable balance between cost and performance. In such cases, it is necessary to tune the applications in order to minimize the amount of I/O and communication, such that the global execution time is reduced as much as possible. In this paper, we model and analyze the most up-to-date strategies for ad hoc star join query processing in a cluster of computers. We show that, for ad hoc query processing and assuming a limited amount of resources available, these strategies still have room for improvement both in terms of I/O and inter-node data traffic. Our analysis concludes with the proposal of a hybrid solution that improves these two aspects compared to the previous techniques, and shows near-optimal results in a broad spectrum of cases.

In this paper, a new approach is introduced that integrates an evolutionary-based mechanism with a distributed query sensor cover algorithm for optimal query execution in self-organized wireless sensor networks (WSN). An algorithm based on an evolutionary technique is proposed, with problem-specific genetic operators to improve computing efficiency. Redundancy within a sensor network can be exploited to reduce the communication cost incurred in the execution of spatial queries. Any reduction in communication cost results in more efficient use of battery energy, which is very limited in sensors. Our objective is to self-organize the network, in response to a query, into a topology that involves an optimal subset of sensors that is sufficient to process the query subject to connectivity, coverage, energy consumption, cover size and communication overhead constraints. Query processing must incorporate energy awareness into the system by reducing the total energy consumption and hence increasing the lifetime of the sensor cover, which is beneficial for large, long-running queries. Experiments have been carried out on networks with different sensor transmission radii, different query sizes, and different network configurations. Through extensive simulations, we have shown that our technique results in substantial energy savings in a sensor network. Compared with other techniques, the proposed technique yields a significantly more energy-efficient query cover, with lower communication cost and smaller cover size.

A number of proposals for integrating geographical (Geographical Information Systems, GIS) and multidimensional (data warehouse, DW, and online analytical processing, OLAP) processing are found in the database literature. However, most of the current approaches do not take into account the use of a GDW (geographical data warehouse) metamodel or query language to make available the simultaneous specification of multidimensional and spatial operators. To address this, this paper discusses the UML class diagram of a GDW metamodel and proposes its formal specifications. We then present a formal metamodel for a geographical data cube and propose the Geographical Multidimensional Query Language (GeoMDQL) as well. GeoMDQL is based on well-known standards such as the MultiDimensional eXpressions (MDX) language and the OGC simple features specification for SQL, and has been specifically defined for spatial OLAP environments based on a GDW. We also present the GeoMDQL syntax and a discussion regarding the taxonomy of GeoMDQL query types. Additionally, aspects related to the GeoMDQL architecture implementation are described, along with a case study involving the Brazilian public healthcare system in order to illustrate the proposed query language.

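With the rapid pace of development, the demand for intelligent living keeps growing, and spatial queries are receiving increasing attention. The moving spatial keyword query, a major type of continuous spatial query, has been studied extensively. A new query type, the moving collective spatial keyword query (MCSKQ), was recently proposed in the top-tier conference literature. This type of query continuously reports a set of objects that collectively cover the query keywords as the query moves. The returned objects must also be close to the query object and close to each other. Computing the exact result set is an NP-hard problem. To reduce the cost of query processing, this paper proposes algorithms based on the safe region technique that maintain the exact result set while the query object moves. Building on this, the paper proposes new optimization strategies based on the idea of MCSKQ to further reduce the query processing cost.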

Modern applications requiring spatial network processing pose several interesting query optimization challenges. Spatial networks are usually represented as graphs, and therefore, queries involving a spatial network can be executed by using the corresponding graph representation. This means that the cost for executing a query is determined by graph properties such as the graph order and size (i.e., number of nodes and edges) and other graph parameters. In this paper, we present novel methods to estimate the number of nodes and edges in regions of interest in spatial networks, towards predicting the space and time requirements for range queries. The methods are evaluated by using real-life and synthetic data sets. Experimental results show that the number of nodes and edges can be estimated efficiently and accurately, with relatively small space requirements, thus providing useful information to the query optimizer.

Spatial database operations are typically performed in two steps. In the filtering step, indexes and the minimum bounding rectangles (MBRs) of the objects are used to quickly determine a set of candidate objects. In the refinement step, the actual geometries of the objects are retrieved and compared to the query geometry or each other. Because of the complexity of the computational geometry algorithms involved, the CPU cost of the refinement step is usually the dominant cost of the operation for complex geometries such as polygons. Although many run-time and pre-processing-based heuristics have been proposed to alleviate this problem, the CPU cost still remains the bottleneck. In this paper, we propose a novel approach to address this problem using the efficient rendering and searching capabilities of modern graphics hardware. This approach does not require expensive pre-processing of the data or changes to existing storage and index structures, and is applicable to both intersection and distance predicates. We evaluate this approach by comparing its performance with leading software solutions. The results show that by combining hardware and software methods, the overall computational cost can be reduced substantially for both spatial selections and joins. We integrated this hardware/software co-processing technique into a popular database to evaluate its performance in the presence of indexes, pre-processing and other proprietary optimizations. Extensive experimentation with real-world data sets shows that the hardware-accelerated technique not only outperforms the run-time software solutions but also performs as well as, if not better than, pre-processing-assisted techniques.
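
The two-step evaluation described in the abstract above can be sketched as follows. This is a minimal illustration of filter-and-refine for a spatial selection, not the hardware-accelerated technique the paper proposes. As a loudly stated simplification, objects are circles given as `((x, y), radius)` so that the exact geometry test stays short; real systems would refine with polygon algorithms, and a real filter step would use an index rather than a linear scan. All names are hypothetical.

```python
from math import dist

def mbr(circle):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a circle."""
    (x, y), r = circle
    return (x - r, y - r, x + r, y + r)

def mbrs_intersect(a, b):
    """Cheap test: do two axis-aligned rectangles overlap?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def circles_intersect(a, b):
    """Exact geometry test (stand-in for an expensive polygon test)."""
    (centre_a, radius_a), (centre_b, radius_b) = a, b
    return dist(centre_a, centre_b) <= radius_a + radius_b

def spatial_selection(objects, query):
    """Return (candidates after the filtering step, answers after refinement)."""
    query_box = mbr(query)
    candidates = [o for o in objects if mbrs_intersect(mbr(o), query_box)]
    answers = [o for o in candidates if circles_intersect(o, query)]
    return candidates, answers

query = ((0, 0), 1)
objects = [
    ((1.5, 0), 1),    # true hit: passes both steps
    ((1.6, 1.6), 1),  # false positive: MBRs overlap, geometries do not
    ((5, 5), 1),      # discarded by the filtering step
]
candidates, answers = spatial_selection(objects, query)
```

The second object shows why refinement is needed: its bounding rectangle overlaps the query's, yet the exact geometries are disjoint, so only the exact test removes it.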

A top-k spatial keyword query returns k objects having the highest (or lowest) scores with regard to spatial proximity as well as text relevancy. Approaches for answering top-k spatial keyword queries can be classified into two categories: the separate index approach and the hybrid index approach. The separate index approach maintains the spatial index and the text index independently and can accommodate new data types. However, it is difficult to support top-k pruning and merging efficiently at the same time since it requires two different orders for clustering the objects: the first based on scores for top-k pruning and the second based on object IDs for efficient merging. In this paper, we propose a new separate index method called Rank-Aware Separate Index Method (RASIM) for top-k spatial keyword queries. RASIM supports both top-k pruning and efficient merging at the same time by clustering each separate index in two different orders through the partitioning technique. Specifically, RASIM partitions the set of objects in each index into rank-aware (RA) groups that contain the objects with similar scores and applies the first order to these groups according to their scores and the second order to the objects within each group according to their object IDs. Based on the RA groups, we propose two query processing algorithms: (i) External Threshold Algorithm (External TA) that supports top-k pruning in the unit of RA groups and (ii) Generalized External TA that enhances the performance of External TA by exploiting special properties of the RA groups. RASIM is the first research work that supports top-k pruning based on the separate index approach. Naturally, it keeps the advantages of the separate index approach. In addition, in terms of storage and query processing time, RASIM is more efficient than the IR-tree method, which is the prevailing method to support top-k pruning to date and is based on the hybrid index approach. 
Experimental results show that, compared with the IR-tree method, the index size of RASIM is reduced by up to 1.85 times, and the query performance is improved by up to 3.22 times.

Similarity search in P2P systems has attracted a lot of attention recently, and several important applications, like distributed image search, can profit from the proposed distributed algorithms. In this paper, we address the challenging problem of efficient processing of range queries in metric spaces, where data is horizontally distributed across a super-peer network. Our approach relies on SIMPEER (Doulkeridis et al. in Proceedings of VLDB, pp. 986–997, 2007), a framework that dynamically clusters peer data in order to build distributed routing information at the super-peer level. SIMPEER allows the evaluation of exact range and nearest neighbor queries in a distributed manner that reduces communication cost, network latency, bandwidth consumption and computational overhead at each individual peer. In this paper, we extend SIMPEER by focusing on efficient range query processing and providing recall-based guarantees for the quality of the result retrieved so far. This is especially useful for range queries that lead to result sets of high cardinality and incur high processing costs, while the complete result set becomes overwhelming for the user. Our framework employs statistics for estimating an upper limit on the number of possible results for a range query, and each super-peer may decide not to propagate the query further, thereby reducing the scope of the search. We provide an experimental evaluation of our framework and show that our approach performs efficiently, even in the case of a high degree of distribution.
