首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper, we study a variant of reachability queries, called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we check the existence of a directed path from u1 to u2, where edge labels along the path are a subset of S. We propose the path-label transitive closure method to answer LCR queries. Specifically, we t4ransform an edge-labeled directed graph into an augmented DAG by replacing the maximal strongly connected components as bipartite graphs. We also propose a Dijkstra-like algorithm to compute path-label transitive closure by re-defining the “distance” of a path. Comparing with the existing solutions, we prove that our method is optimal in terms of the search space. Furthermore, we propose a simple yet effective partition-based framework (local path-label transitive closure+online traversal) to answer LCR queries in large graphs. We prove that finding the optimal graph partition to minimize query processing cost is a NP-hard problem. Therefore, we propose a sampling-based solution to find the sub-optimal partition. Moreover, we address the index maintenance issues to answer LCR queries over the dynamic graphs. Extensive experiments confirm the superiority of our method.  相似文献   

2.
Many applications often require finding sets of entities of interest that meet certain constraints. Such set-based queries (SQs) can be broadly classified into two types: optimization SQs that involve some optimization constraint and enumerative SQs that do not have any optimization constraint. While there has been much research on the evaluation of optimization SQs, there is very little work on the evaluation of enumerative SQs, which represent the most fundamental fragment of set-based queries. In this paper, we address the problem of evaluating enumerative SQs using RDBMS. While enumerative SQs can be expressed using SQL, existing relational engines, unfortunately, are not able to efficiently evaluate such queries due to their complexity. In this paper, we propose a novel evaluation approach for enumerative SQs. Our experimental results on PostgreSQL demonstrate that our proposed approach outperforms the conventional approach by up to three orders of magnitude.  相似文献   

3.
A number of indexing techniques have been proposed in recent times for optimizing the queries on XML and other semi-structured data models. Most of the semi-structured models use tree-like structures and query languages (XPath, XQuery, etc.) which make use of regular path expressions to optimize the query processing. In this paper, we propose two algorithms called Entry-point algorithm (EPA) and Two-point Entry algorithms that exploit different types of indices to efficiently process XPath queries. We discuss and compare two approaches namely, Root-first and Bottom-first in implementing the EPA. We present the experimental results of the algorithms using XML benchmark queries and data and compare the results with that of traditional methods of query processing with and without the use of indexes, and ToXin indexing approach. Our algorithms show improved performance results than the traditional methods and Toxin indexing approach.  相似文献   

4.
As huge volumes of data are organized or exported in tree-structured form, it is quite necessary to extract useful information from these data collections using effective and efficient query processing methods. A natural way of retrieving desired information from XML documents is using twig pattern (TP), which is, actually, the core component of existing XML query languages. Twig pattern possesses the inherent feature that query nodes on the same path have concrete precedence relationships. It is this featu...  相似文献   

5.
Continuous aggregation queries with a tolerable error threshold have many applications in sensor networks. Since the communication cost is important in the lifetime of sensor networks, there have been a few methods to reduce the communication cost for continuous aggregation queries having a tolerable error threshold. In previous methods, the error threshold in each node is periodically adjusted based on the global statistics collected in the central site that are obtained from all the nodes in the network. These methods require that users specify a few parameters, e.g., adjustment period. However, determination of these parameters by users, in practice, is very difficult and undesirable for sensor network applications demanding unattended operations in dynamically changing environments. In this paper, we propose a new in-network data aggregation protocol, called the Distributed Adaptive Filtering (DAF) protocol. It works in a distributed manner and proceeds adaptively in the sense that the filtering condition in each node is adaptively changed by using only local information. It does not require user parameters that are used in the previous method. We show through various experiments that the proposed method outperforms other existing methods. Recommended by: Ahmed Elmagarmid  相似文献   

6.
Due to the recent massive data generation, preference queries are becoming an increasingly important for users because such queries retrieve only a small number of preferable data objects from a huge multi-dimensional dataset. A top-k dominating query, which retrieves the k data objects dominating the highest number of data objects in a given dataset, is particularly important in supporting multi-criteria decision making because this query can find interesting data objects in an intuitive way exploiting the advantages of top-k and skyline queries. Although efficient algorithms for top-k dominating queries have been studied over centralized databases, there are no studies which deal with top-k dominating queries in distributed environments. The recent data management is becoming increasingly distributed, so it is necessary to support processing of top-k dominating queries in distributed environments. In this paper, we address, for the first time, the challenging problem of processing top-k dominating queries in distributed networks and propose a method for efficient top-k dominating data retrieval, which avoids redundant communication cost and latency. Furthermore, we also propose an approximate version of our proposed method, which further reduces communication cost. Extensive experiments on both synthetic and real data have demonstrated the efficiency and effectiveness of our proposed methods.  相似文献   

7.
A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so as to parallelize query retrieval and thus, improve performance. We focus on optimizing access to large spatial data, and the most common type of queries on such data, i.e., range queries. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single copy based declustering schemes are non-optimal for range queries. In this paper, we integrate replication in conjunction with parallel disk declustering for efficient processing of range queries. We note that replication is largely used in database applications for several purposes like load balancing, fault tolerance and availability of data. We propose theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering, using a limited amount of replication and provide extensions to apply it on real data, which include arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries, on synthetic and real data sets. Recommended by: Ahmed Elmagarmid Supported by U.S. Department of Energy (DOE) Award No. DE-FG02-03ER25573, and National Science Foundation (NSF) grant CNS-0403342.  相似文献   

8.
Efficient processing of top-k queries has drawn increasing attention from both industry and academia due to its varied applications. Lower access cost is a crucial concern for a top-k query processing. Typically, when answering a top-k query, there exist two types of accesses: sorted access and random access. In some scenarios, the latter is not supported by the data source. Fagin et al. proposed the No Random Access (NRA) algorithm (Fagin et?al, J Comput Syst Sci 66:614–656, 2003) for this situation. In this paper, we motivate our work by a key observation of the NRA algorithm: the number of accesses could be further reduced by selectively (instead of in parallel) performing sorted accesses to different lists of the dataset. Based on this insight, we propose a Selective NRA (SNRA) algorithm aiming to cut down the unnecessary access cost. Later, we optimize the SNRA algorithm in terms of runtime cost and present the SNRA-opt algorithm. Furthermore, we address the problem of instance optimality theoretically and turn SNRA (and SNRA-opt) into instance optimal algorithms, termed as Hybrid-SNRA (HSNRA) and HSNRA-opt. Extensive experimental results show that our algorithms perform significantly fewer sorted accesses than NRA (and its state-of-the-art variations). In terms of runtime cost, the proposed SNRA-opt and HSNRA-opt algorithms are two orders of magnitude faster than the NRA algorithm. In addition, we discuss the parameter selection problem of the SNRA algorithms, both theoretically and experimentally.  相似文献   

9.
Asynchronous systems consist of a set of transitions which are non-deterministically chosen and executed. We present a theory of guiding symbolic reachability in such systems by scheduling clusters of transitions. A theory of reachability expressions which specify the schedules is presented. This theory allows proving equivalence of different schedules which may have radically different performance in BDD-based search. We present experimental evidence to show that optimized reachability expressions give rise to significant performance advantages. The profiling is carried out in the NuSMV framework using examples from discrete timed automata and circuits with delays. A variant tool called NuSMVDP has been developed for interpreting reachability expressions to carry out the experiments.  相似文献   

10.
In a fuzzy relational database where a relation is a fuzzy set of tuples and ill-known data are represented by possibility distributions, nested fuzzy queries can be expressed in the Fuzzy SQL language. Although it provides a very convenient way for users to express complex queries, a nested fuzzy query may be very inefficient to process with the naive evaluation method based on its semantics. In conventional databases, nested queries are unnested to improve the efficiency of their evaluation. In this paper, we extend the unnesting techniques to process several types of nested fuzzy queries. An extended merge-join is used to evaluate the unnested fuzzy queries. As shown by both theoretical analysis and experimental results, the unnesting techniques with the extended merge-join significantly improve the performance of evaluating nested fuzzy queries  相似文献   

11.
In many applications, information is best represented as graphs. In a dynamic world, information changes and so the graphs representing the information evolve with time. We propose that historical graph-structured data be maintained for analytical processing. We call a historical evolving graph sequence an EGS. We observe that in many applications, graphs of an EGS are large and numerous, and they often exhibit much redundancy among them. We study the problem of efficient shortest path query processing on an EGS and put forward a solution framework called FVF. Two algorithms, namely, FVF-F and FVF-H, are proposed. While the FVF-F algorithm works on a sequence of flat graph clusters, the FVF-H algorithm works on a hierarchy of such clusters. Through extensive experiments on both real and synthetic datasets, we show that our FVF framework is highly efficient in shortest query processing on EGSs. Comparing FVF-F and FVF-H, the latter gives a larger speedup, is more flexible in terms of memory requirements, and is far less sensitive to parameter values.  相似文献   

12.
The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. Matching twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic value are useless to the users, and usually users only want to get the answers with the k largest probabilistic values. To this end, existing algorithms for ordinary XML documents cannot be directly applicable due to the need for handling probability distributional nodes and efficient calculation of top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. We propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms.  相似文献   

13.
Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources such as erroneous data entries and find similarity matches. In this paper, we study graph similarity queries with edit distance constraints. Inspired by the $q$ -gram idea for string similarity problems, our solution extracts paths from graphs as features for indexing. We establish a lower bound of common features to generate candidates. Efficient algorithms are proposed to handle three types of graph similarity queries by exploiting both matching and mismatching features as well as degree information to improve the filtering and verification on candidates. We demonstrate the proposed algorithms significantly outperform existing approaches with extensive experiments on real and synthetic datasets.  相似文献   

14.
15.
The top-k query is employed in a wide range of applications to generate a ranked list of data that have the highest aggregate scores over certain attributes. As the pool of attributes for selection by individual queries may be large, the data are indexed with per-attribute sorted lists, and a threshold algorithm (TA) is applied on the lists involved in each query. The TA executes in two phases—find a cut-off threshold for the top-k result scores, then evaluate all the records that could score above the threshold. In this paper, we focus on exact top-k queries that involve monotonic linear scoring functions over disk-resident sorted lists. We introduce a model for estimating the depths to which each sorted list needs to be processed in the two phases, so that (most of) the required records can be fetched efficiently through sequential or batched I/Os. We also devise a mechanism to quickly rank the data that qualify for the query answer and to eliminate those that do not, in order to reduce the computation demand of the query processor. Extensive experiments with four different datasets confirm that our schemes achieve substantial performance speed-up of between two times and two orders of magnitude over existing TAs, at the expense of a memory overhead of 4.8 bits per attribute value. Moreover, our scheme is robust to different data distributions and query characteristics.  相似文献   

16.
Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches.  相似文献   

17.
It is increasingly common to find graphs in which edges are of different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity of a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness and efficiency of these algorithms are experimentally verified using real-life data and synthetic data.  相似文献   

18.
Reverse nearest neighbor (RNN) search is very crucial in many real applications. In particular, given a database and a query object, an RNN query retrieves all the data objects in the database that have the query object as their nearest neighbors. Often, due to limitation of measurement devices, environmental disturbance, or characteristics of applications (for example, monitoring moving objects), data obtained from the real world are uncertain (imprecise). Therefore, previous approaches proposed for answering an RNN query over exact (precise) database cannot be directly applied to the uncertain scenario. In this paper, we re-define the RNN query in the context of uncertain databases, namely probabilistic reverse nearest neighbor (PRNN) query, which obtains data objects with probabilities of being RNNs greater than or equal to a user-specified threshold. Since the retrieval of a PRNN query requires accessing all the objects in the database, which is quite costly, we also propose an effective pruning method, called geometric pruning (GP), that significantly reduces the PRNN search space yet without introducing any false dismissals. Furthermore, we present an efficient PRNN query procedure that seamlessly integrates our pruning method. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed GP-based PRNN query processing approach, under various experimental settings.  相似文献   

19.
针对现实中许多超大规模图可达性查询的问题,提出了一种新的基于递归分解的算法,即将原图递归分解成一系列生成树和剩余图两类子图,并通过分别查询这两类子图来减少查询开销.相比于区间标记、链分解、2-hop标签和路径树等传统算法,该算法不仅空间开销更小,且时间复杂度更低.仿真实验表明,该算法对处理大规模有向图可达性问题上存储规模更小且查询效率更高.  相似文献   

20.
k步可达查询用于在给定的有向无环图(DAG)中回答两点之间是否存在长度不超过k的路径。针对现有方法的索引规模大、查询处理效率低的问题,提出一种基于部分点的双向最短路径索引来提升索引的可达信息覆盖率,并提出一组优化规则来减小索引规模;然后提出基于简化图的正反互逆拓扑索引来加速回答不可达查询;最后提出远距离优先的双向遍历策略来提高查询处理的效率。基于21个真实数据集(如引用网络、社交网络等)的实验结果表明,相比已有的高效方法PLL及BFSI-B,所提出的算法具有更小的索引规模和更快的查询响应速度。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号