首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In this paper, we propose an efficient solution for processing continuous range spatial keyword queries over moving spatio-textual objects (namely, CRSK-mo queries). Major challenges in efficient processing of CRSK-mo queries are as follows: (i) the query range is determined based on both spatial proximity and textual similarity; thus a straightforward spatial proximity based pruning of the search space is not applicable as any object far from a query location with a high textual similarity score can still be the answer (and vice versa), (ii) frequent location updates may invalidate a query result, and thus require frequent re-computing of the result set for any object updates. To address these challenges, the key idea of our approach is to exploit the spatial and textual upper bounds between queries and objects to form safe zones (at the client-side) and buffer regions (at the server-side), and then use these bounds to quickly prune objects and queries through smart in-memory data structures. We conduct extensive experiments with a synthetic dataset that verify the effectiveness and efficiency of our proposed algorithm.  相似文献   

2.
In this paper, we study a variant of reachability queries, called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we check the existence of a directed path from u1 to u2, where edge labels along the path are a subset of S. We propose the path-label transitive closure method to answer LCR queries. Specifically, we t4ransform an edge-labeled directed graph into an augmented DAG by replacing the maximal strongly connected components as bipartite graphs. We also propose a Dijkstra-like algorithm to compute path-label transitive closure by re-defining the “distance” of a path. Comparing with the existing solutions, we prove that our method is optimal in terms of the search space. Furthermore, we propose a simple yet effective partition-based framework (local path-label transitive closure+online traversal) to answer LCR queries in large graphs. We prove that finding the optimal graph partition to minimize query processing cost is a NP-hard problem. Therefore, we propose a sampling-based solution to find the sub-optimal partition. Moreover, we address the index maintenance issues to answer LCR queries over the dynamic graphs. Extensive experiments confirm the superiority of our method.  相似文献   

3.
4.
Providing top-k typical relevant keyword queries would benefit the users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships both between keywords and keyword queries, this paper proposes a new keyword query suggestion approach which can provide typical and semantically related queries to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries can be measured by using a semantic matrix, in which the coupling relationships between keywords in queries are reserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries from query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures can capture the coupling relationships between keywords and semantic similarities between keyword queries accurately. The efficiency of query typicality analysis and top-k query selection algorithm is also demonstrated.  相似文献   

5.
In this paper, we focus on efficient processing of XML keyword queries based on smallest lowest common ancestor (SLCA) semantics. For a given query Q with m Key words we propose to use stable matches as the basis for SLCA computation, where each stable match M consists of m nodes that belong to the m distinct keyword inverted lists of Q. M satisfies that no other lowest common ancestor (LCA) node of Q can be found to be located after the first node of M and be a descendant of the LCA of M, based on which the operation of locating a stable match can skip more useless nodes. We propose two stable match based algorithms for SLCA computation, i.e., BSLCA and HSLCA. BSLCA processes two keyword inverted lists each time from the shortest to the longest, while HSLCA processes all keyword inverted lists in a holistic way to avoid the problem of redundant computation invoked by BSLCA. Our extensive experimental results verify the performance advantages of our methods according to various evaluation metrics.  相似文献   

6.
Sensor networks are often used to perform monitoring tasks, such as animal and vehicle tracking, or the surveillance of enemy forces in military applications. In this paper we introduce the concept of proximity queries, which allow us to report interesting events, observed by nodes in the network that lie within a certain distance from each other. An event is triggered when a user-programmable predicate is satisfied on a sensor node. We study the problem of computing proximity queries in sensor networks and propose several alternative techniques that differ in the number of messages exchanged by the nodes and the quality of the returned answers. Our solutions utilize a distributed routing index, maintained by the nodes in the network, that is dynamically updated as new observations are obtained by the nodes. This distributed index allows us to efficiently process multiple proximity queries involving several different event types within a fraction of the cost that a straightforward evaluation requires. We present an extensive experimental study to show the benefits of our techniques under different scenarios using both synthetic and real data sets. Our results demonstrate that our algorithms scale better and require significantly fewer messages compared to a straightforward execution of the queries.  相似文献   

7.
Keyword search can provide users an easy method to query large and complex databases without any knowledge of structured query languages or underlying database schema. Most of the existing studies have focused on generating candidate structured queries relevant to keywords. Due to the large size of generated queries, the execution costs may be prohibitive. However, existing studies lack the idea of a generalized method to optimize the plan of the large set of generated queries. In this paper, we introduce a graph-theoretic optimization approach. We propose a general graph model, Weighted Operator Graph, to address the costs of keyword query evaluation plans. The proposed model is flexible to integrate all of the cost-based plans in a uniform way. We define a Keyword Query Optimization Problem based on a theoretical cost model as a graph-theoretic problem and show it to be a NP-hard problem. We propose a greedy heuristic Maximum Propagation that reduces the size of the intermediate result as early as possible. The proposed algorithm allows us to achieve efficiency in terms of query evaluation costs. The experimental studies on both synthetic and real data set results show that our work outperforms the existing work.  相似文献   

8.
A top-k spatial keyword query returns k objects having the highest (or lowest) scores with regard to spatial proximity as well as text relevancy. Approaches for answering top-k spatial keyword queries can be classified into two categories: the separate index approach and the hybrid index approach. The separate index approach maintains the spatial index and the text index independently and can accommodate new data types. However, it is difficult to support top-k pruning and merging efficiently at the same time since it requires two different orders for clustering the objects: the first based on scores for top-k pruning and the second based on object IDs for efficient merging. In this paper, we propose a new separate index method called Rank-Aware Separate Index Method (RASIM) for top-k spatial keyword queries. RASIM supports both top-k pruning and efficient merging at the same time by clustering each separate index in two different orders through the partitioning technique. Specifically, RASIM partitions the set of objects in each index into rank-aware (RA) groups that contain the objects with similar scores and applies the first order to these groups according to their scores and the second order to the objects within each group according to their object IDs. Based on the RA groups, we propose two query processing algorithms: (i) External Threshold Algorithm (External TA) that supports top-k pruning in the unit of RA groups and (ii) Generalized External TA that enhances the performance of External TA by exploiting special properties of the RA groups. RASIM is the first research work that supports top-k pruning based on the separate index approach. Naturally, it keeps the advantages of the separate index approach. In addition, in terms of storage and query processing time, RASIM is more efficient than the IR-tree method, which is the prevailing method to support top-k pruning to date and is based on the hybrid index approach. Experimental results show that, compared with the IR-tree method, the index size of RASIM is reduced by up to 1.85 times, and the query performance is improved by up to 3.22 times.  相似文献   

9.
Keyword proximity search in XML trees   总被引:3,自引:0,他引:3  
Recent works have shown the benefits of keyword proximity search in querying XML documents in addition to text documents. For example, given query keywords over Shakespeare's plays in XML, the user might be interested in knowing how the keywords cooccur. In this paper, we focus on XML trees and define XML keyword, proximity queries to return the (possibly heterogeneous) set of minimum connecting trees (MCTs) of the matches to the individual keywords in the query. We consider efficiently executing keyword proximity queries on labeled trees (XML) in various settings: 1) when the XML database has been preprocessed and 2) when no indices are available on the XML database. We perform a detailed experimental evaluation to study the benefits of our approach and show that our algorithms considerably outperform prior algorithms and other applicable approaches.  相似文献   

10.
Users are rarely familiar with the content of a data source they are querying, and therefore cannot avoid using keywords that do not exist in the data source. Traditional systems may respond with an empty result, causing dissatisfaction, while the data source in effect holds semantically related content. In this paper we study this no-but-semantic-match problem on XML keyword search and propose a solution which enables us to present the top-k semantically related results to the user. Our solution involves two steps: (a) extracting semantically related candidate queries from the original query and (b) processing candidate queries and retrieving the top-k semantically related results. Candidate queries are generated by replacement of non-mapped keywords with candidate keywords obtained from an ontological knowledge base. Candidate results are scored using their cohesiveness and their similarity to the original query. Since the number of queries to process can be large, with each result having to be analyzed, we propose pruning techniques to retrieve the top-k results efficiently. We develop two query processing algorithms based on our pruning techniques. Further, we exploit a property of the candidate queries to propose a technique for processing multiple queries in batch, which improves the performance substantially. Extensive experiments on two real datasets verify the effectiveness and efficiency of the proposed approaches.  相似文献   

11.
Keyword search in XML documents has recently gained a lot of research attention. Given a keyword query, existing approaches first compute the lowest common ancestors (LCAs) or their variants of XML elements that contain the input keywords, and then identify the subtrees rooted at the LCAs as the answer. In this the paper we study how to use the rich structural relationships embedded in XML documents to facilitate the processing of keyword queries. We develop a novel method, called SAIL, to index such structural relationships for efficient XML keyword search. We propose the concept of minimal-cost trees to answer keyword queries and devise structure-aware indices to maintain the structural relationships for efficiently identifying the minimal-cost trees. For effectively and progressively identifying the top-k answers, we develop techniques using link-based relevance ranking and keyword-pair-based ranking. To reduce the index size, we incorporate a numbering scheme, namely schema-aware dewey code, into our structure-aware indices. Experimental results on real data sets show that our method outperforms state-of-the-art approaches significantly, in both answer quality and search efficiency.  相似文献   

12.
Learning conditional preference networks   总被引:2,自引:0,他引:2  
  相似文献   

13.
Providing built-in keyword search capabilities in RDBMS   总被引:2,自引:0,他引:2  
A common approach to performing keyword search over relational databases is to find the minimum Steiner trees in database graphs transformed from relational data. These methods, however, are rather expensive as the minimum Steiner tree problem is known to be NP-hard. Further, these methods are independent of the underlying relational database management system (RDBMS), thus cannot benefit from the capabilities of the RDBMS. As an alternative, in this paper we propose a new concept called Compact Steiner Tree (CSTree), which can be used to approximate the Steiner tree problem for answering top-k keyword queries efficiently. We propose a novel structure-aware index, together with an effective ranking mechanism for fast, progressive and accurate retrieval of top-k highest ranked CSTrees. The proposed techniques can be implemented using a standard relational RDBMS to benefit from its indexing and query-processing capability. We have implemented our techniques in MYSQL, which can provide built-in keyword-search capabilities using SQL. The experimental results show a significant improvement in both search efficiency and result quality comparing to existing state-of-the-art approaches.  相似文献   

14.
The representation of large subsets of the World Wide Web in the form of a directed graph has been extensively used to analyze structure, behavior, and evolution of those so-called Web graphs. However, interesting Web graphs are very large and their classical representations do not fit into the main memory of typical computers, whereas the required graph algorithms perform inefficiently on secondary memory. Compressed graph representations drastically reduce their space requirements while allowing their efficient navigation in compressed form. While the most basic navigation operation is to retrieve the successors of a node, several important Web graph algorithms require support for extended queries, such as finding the predecessors of a node, checking the presence of a link, or retrieving links between ranges of nodes. Those are seldom supported by compressed graph representations.This paper presents the k2-tree, a novel Web graph representation based on a compact tree structure that takes advantage of large empty areas of the adjacency matrix of the graph. The representation not only retrieves successors and predecessors in symmetric fashion, but also it is particularly efficient to check for specific links between nodes, or between ranges of nodes, or to list the links between ranges. Compared to the best representations in the literature supporting successor and predecessor queries, our technique offers the least space usage (1–3 bits per link) while supporting fast navigation to predecessors and successors (28μs per neighbor retrieved) and sharply outperforming the others on the extended queries. The representation is also of general interest and can be used to compress other kinds of graphs and data structures.  相似文献   

15.
Keyword queries have long been popular to search engines and to the information retrieval community and have recently gained momentum for its usage in the expert systems community. The conventional semantics for processing a user query is to find a set of top-k web pages such that each page contains all user keywords. Recently, this semantics has been extended to find a set of cohesively interconnected pages, each of which contains one of the query keywords scattered across these pages. The keyword query having the extended semantics (i.e., more than a list of keywords hyperlinked with each other) is referred to the graph query. In case of the graph query, all the query keywords may not be present on a single Web page. Thus, a set of Web pages with the corresponding hyperlinks need to be presented as the search result. The existing search systems reveal serious performance problem due to their failure to integrate information from multiple connected resources so that an efficient algorithm for keyword query over graph-structured data is proposed. It integrates information from multiple connected nodes of the graph and generates result trees with the occurrence of all the query keywords. We also investigate a ranking measure called graph ranking score (GRS) to evaluate the relevant graph results so that the score can generate a scalar value for keywords as well as for the topology.  相似文献   

16.
Keyword search is the most popular technique for querying large tree-structured datasets, often of unknown structure, in the web. Recent keyword search approaches return lowest common ancestors (LCAs) of the keyword matches ranked with respect to their relevance to the keyword query. A major challenge of a ranking approach is the efficiency of its algorithms as the number of keywords and the size and complexity of the data increase. To face this challenge most of the known approaches restrict their ranking to a subset of the LCAs (e.g., SLCAs, ELCAs), missing relevant results.In this work, we design novel top-k-size stack-based algorithms on tree-structured data. Our algorithms implement ranking semantics for keyword queries which is based on the concept of LCA size. Similar to metric selection in information retrieval, LCA size reflects the proximity of keyword matches in the data tree. This semantics does not rank a predefined subset of LCAs and through a layered presentation of results, it demonstrates improved effectiveness compared to previous relevant approaches. To address performance challenges our algorithms exploit a lattice of the partitions of the keyword set, which empowers a linear time performance. This result is obtained without the support of auxiliary precomputed data structures. An extensive experimental study on various and large datasets confirms the theoretical analysis. The results show that, in contrast to other approaches, our algorithms scale smoothly when the size of the dataset and the number of keywords increase.  相似文献   

17.
In this paper we propose a fundamental approach to perform the class of Range and Nearest Neighbor (NN) queries, the core class of spatial queries used in location-based services, without revealing any location information about the query in order to preserve users’ private location information. The idea behind our approach is to utilize the power of one-way transformations to map the space of all objects and queries to another space and resolve spatial queries blindly in the transformed space. Traditional encryption based techniques, solutions based on the theory of private information retrieval, or the recently proposed anonymity and cloaking based approaches cannot provide stringent privacy guarantees without incurring costly computation and/or communication overhead. In contrast, we propose efficient algorithms to evaluate KNN and range queries privately in the Hilbert transformed space. We also propose a dual curve query resolution technique which further reduces the costs of performing range and KNN queries using a single Hilbert curve. We experimentally evaluate the performance of our proposed range and KNN query processing techniques and verify the strong level of privacy achieved with acceptable computation and communication overhead.  相似文献   

18.
Path queries have been extensively used to query semistructured data, such as the Web and XML documents. In this paper we introduce weighted path queries, an extension of path queries enabling several classes of optimization problems (such as the computation of shortest paths) to be easily expressed. Weighted path queries are based on the notion of weighted regular expression, i.e., a regular expression whose symbols are associated to a weight. We characterize the problem of answering weighted path queries and provide an algorithm for computing their answer. We also show how weighted path queries can be effectively embedded into query languages for XML data to express in a simple and compact form several meaningful research problems.  相似文献   

19.
A visible k nearest neighbor (Vk NN) query retrieves k objects that are visible and nearest to the query object, where “visible” means that there is no obstacle between an object and the query object. Existing studies on the Vk NN query have focused on static data objects. In this paper we investigate how to process the query on moving objects continuously. We propose an effective filtering-and-refinement framework for evaluating this type of queries. We exploit spatial proximity and visibility properties between the query object and data objects to prune search space under this framework. A detailed cost analysis and a comprehensive experimental study are conducted on the proposed framework. The results validate the effectiveness of the pruning techniques and verify the efficiency of the proposed framework. The proposed framework outperforms a straightforward solution by an order of magnitude in terms of both communication and computation costs.  相似文献   

20.
Traditional spatial queries return, for a given query object q, all database objects that satisfy a given predicate, such as epsilon range and k-nearest neighbors. This paper defines and studies inverse spatial queries, which, given a subset of database objects Q and a query predicate, return all objects which, if used as query objects with the predicate, contain Q in their result. We first show a straightforward solution for answering inverse spatial queries for any query predicate. Then, we propose a filter-and-refinement framework that can be used to improve efficiency. We show how to apply this framework on a variety of inverse queries, using appropriate space pruning strategies. In particular, we propose solutions for inverse epsilon range queries, inverse k-nearest neighbor queries, and inverse skyline queries. Furthermore, we show how to relax the definition of inverse queries in order to ensure non-empty result sets. Our experiments show that our framework is significantly more efficient than naive approaches.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号