共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper, we study a variant of reachability queries, called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we check the existence of a directed path from u1 to u2, where edge labels along the path are a subset of S. We propose the path-label transitive closure method to answer LCR queries. Specifically, we t4ransform an edge-labeled directed graph into an augmented DAG by replacing the maximal strongly connected components as bipartite graphs. We also propose a Dijkstra-like algorithm to compute path-label transitive closure by re-defining the “distance” of a path. Comparing with the existing solutions, we prove that our method is optimal in terms of the search space. Furthermore, we propose a simple yet effective partition-based framework (local path-label transitive closure+online traversal) to answer LCR queries in large graphs. We prove that finding the optimal graph partition to minimize query processing cost is a NP-hard problem. Therefore, we propose a sampling-based solution to find the sub-optimal partition. Moreover, we address the index maintenance issues to answer LCR queries over the dynamic graphs. Extensive experiments confirm the superiority of our method. 相似文献
2.
Many applications often require finding sets of entities of interest that meet certain constraints. Such set-based queries (SQs) can be broadly classified into two types: optimization SQs that involve some optimization constraint and enumerative SQs that do not have any optimization constraint. While there has been much research on the evaluation of optimization SQs, there is very little work on the evaluation of enumerative SQs, which represent the most fundamental fragment of set-based queries. In this paper, we address the problem of evaluating enumerative SQs using RDBMS. While enumerative SQs can be expressed using SQL, existing relational engines, unfortunately, are not able to efficiently evaluate such queries due to their complexity. In this paper, we propose a novel evaluation approach for enumerative SQs. Our experimental results on PostgreSQL demonstrate that our proposed approach outperforms the conventional approach by up to three orders of magnitude. 相似文献
3.
A number of indexing techniques have been proposed in recent times for optimizing the queries on XML and other semi-structured data models. Most of the semi-structured models use tree-like structures and query languages (XPath, XQuery, etc.) which make use of regular path expressions to optimize the query processing. In this paper, we propose two algorithms called Entry-point algorithm (EPA) and Two-point Entry algorithms that exploit different types of indices to efficiently process XPath queries. We discuss and compare two approaches namely, Root-first and Bottom-first in implementing the EPA. We present the experimental results of the algorithms using XML benchmark queries and data and compare the results with that of traditional methods of query processing with and without the use of indexes, and ToXin indexing approach. Our algorithms show improved performance results than the traditional methods and Toxin indexing approach. 相似文献
4.
LING TokWang 《中国科学F辑(英文版)》2009,52(10):1830-1847
As huge volumes of data are organized or exported in tree-structured form, it is quite necessary to extract useful information from these data collections using effective and efficient query processing methods. A natural way of retrieving desired information from XML documents is using twig pattern (TP), which is, actually, the core component of existing XML query languages. Twig pattern possesses the inherent feature that query nodes on the same path have concrete precedence relationships. It is this featu... 相似文献
5.
Dongjoon Hyun No Joon Park Jin Hyun Son Myoung Ho Kim 《Distributed and Parallel Databases》2006,20(3):171-197
Continuous aggregation queries with a tolerable error threshold have many applications in sensor networks. Since the communication
cost is important in the lifetime of sensor networks, there have been a few methods to reduce the communication cost for continuous
aggregation queries having a tolerable error threshold. In previous methods, the error threshold in each node is periodically
adjusted based on the global statistics collected in the central site that are obtained from all the nodes in the network.
These methods require that users specify a few parameters, e.g., adjustment period. However, determination of these parameters
by users, in practice, is very difficult and undesirable for sensor network applications demanding unattended operations in
dynamically changing environments. In this paper, we propose a new in-network data aggregation protocol, called the Distributed
Adaptive Filtering (DAF) protocol. It works in a distributed manner and proceeds adaptively in the sense that the filtering
condition in each node is adaptively changed by using only local information. It does not require user parameters that are
used in the previous method. We show through various experiments that the proposed method outperforms other existing methods.
Recommended by: Ahmed Elmagarmid 相似文献
6.
Due to the recent massive data generation, preference queries are becoming an increasingly important for users because such queries retrieve only a small number of preferable data objects from a huge multi-dimensional dataset. A top-k dominating query, which retrieves the k data objects dominating the highest number of data objects in a given dataset, is particularly important in supporting multi-criteria decision making because this query can find interesting data objects in an intuitive way exploiting the advantages of top-k and skyline queries. Although efficient algorithms for top-k dominating queries have been studied over centralized databases, there are no studies which deal with top-k dominating queries in distributed environments. The recent data management is becoming increasingly distributed, so it is necessary to support processing of top-k dominating queries in distributed environments. In this paper, we address, for the first time, the challenging problem of processing top-k dominating queries in distributed networks and propose a method for efficient top-k dominating data retrieval, which avoids redundant communication cost and latency. Furthermore, we also propose an approximate version of our proposed method, which further reduces communication cost. Extensive experiments on both synthetic and real data have demonstrated the efficiency and effectiveness of our proposed methods. 相似文献
7.
Hakan Ferhatosmanoglu Ali Şaman Tosun Guadalupe Canahuate Aravind Ramachandran 《Distributed and Parallel Databases》2006,20(2):117-147
A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique
involves distributing data among several disks so as to parallelize query retrieval and thus, improve performance. We focus
on optimizing access to large spatial data, and the most common type of queries on such data, i.e., range queries. An optimal
declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks.
It has been shown that single copy based declustering schemes are non-optimal for range queries. In this paper, we integrate
replication in conjunction with parallel disk declustering for efficient processing of range queries. We note that replication
is largely used in database applications for several purposes like load balancing, fault tolerance and availability of data.
We propose theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering, using a limited amount of replication and provide
extensions to apply it on real data, which include arbitrary grids and a large number of disks. Our framework also provides
an effective indexing scheme that enables fast identification of data of interest in parallel servers. In addition to optimal
processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present
experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries,
on synthetic and real data sets.
Recommended by: Ahmed Elmagarmid
Supported by U.S. Department of Energy (DOE) Award No. DE-FG02-03ER25573, and National Science Foundation (NSF) grant CNS-0403342. 相似文献
8.
Jing Yuan Guangzhong Sun Tao Luo Defu Lian Guoliang Chen 《Journal of Intelligent Information Systems》2012,39(3):687-710
Efficient processing of top-k queries has drawn increasing attention from both industry and academia due to its varied applications. Lower access cost is a crucial concern for a top-k query processing. Typically, when answering a top-k query, there exist two types of accesses: sorted access and random access. In some scenarios, the latter is not supported by the data source. Fagin et al. proposed the No Random Access (NRA) algorithm (Fagin et?al, J Comput Syst Sci 66:614–656, 2003) for this situation. In this paper, we motivate our work by a key observation of the NRA algorithm: the number of accesses could be further reduced by selectively (instead of in parallel) performing sorted accesses to different lists of the dataset. Based on this insight, we propose a Selective NRA (SNRA) algorithm aiming to cut down the unnecessary access cost. Later, we optimize the SNRA algorithm in terms of runtime cost and present the SNRA-opt algorithm. Furthermore, we address the problem of instance optimality theoretically and turn SNRA (and SNRA-opt) into instance optimal algorithms, termed as Hybrid-SNRA (HSNRA) and HSNRA-opt. Extensive experimental results show that our algorithms perform significantly fewer sorted accesses than NRA (and its state-of-the-art variations). In terms of runtime cost, the proposed SNRA-opt and HSNRA-opt algorithms are two orders of magnitude faster than the NRA algorithm. In addition, we discuss the parameter selection problem of the SNRA algorithms, both theoretically and experimentally. 相似文献
9.
Dina Thomas Supratik Chakraborty Paritosh Pandya 《International Journal on Software Tools for Technology Transfer (STTT)》2008,10(2):113-129
Asynchronous systems consist of a set of transitions which are non-deterministically chosen and executed. We present a theory of guiding symbolic reachability in such systems by scheduling clusters of transitions. A theory of reachability expressions which specify the schedules is presented. This theory allows proving equivalence of different schedules which may have radically different performance in BDD-based search. We present experimental evidence to show that optimized reachability expressions give rise to significant performance advantages. The profiling is carried out in the NuSMV framework using examples from discrete timed automata and circuits with delays. A variant tool called NuSMVDP has been developed for interpreting reachability expressions to carry out the experiments. 相似文献
10.
Qi Yang Weining Zhang Chengwen Liu Jing Wu Yu C. Nakajima H. Rishe N.D. 《Knowledge and Data Engineering, IEEE Transactions on》2001,13(6):884-901
In a fuzzy relational database where a relation is a fuzzy set of tuples and ill-known data are represented by possibility distributions, nested fuzzy queries can be expressed in the Fuzzy SQL language. Although it provides a very convenient way for users to express complex queries, a nested fuzzy query may be very inefficient to process with the naive evaluation method based on its semantics. In conventional databases, nested queries are unnested to improve the efficiency of their evaluation. In this paper, we extend the unnesting techniques to process several types of nested fuzzy queries. An extended merge-join is used to evaluate the unnested fuzzy queries. As shown by both theoretical analysis and experimental results, the unnesting techniques with the extended merge-join significantly improve the performance of evaluating nested fuzzy queries 相似文献
11.
In many applications, information is best represented as graphs. In a dynamic world, information changes and so the graphs representing the information evolve with time. We propose that historical graph-structured data be maintained for analytical processing. We call a historical evolving graph sequence an EGS. We observe that in many applications, graphs of an EGS are large and numerous, and they often exhibit much redundancy among them. We study the problem of efficient shortest path query processing on an EGS and put forward a solution framework called FVF. Two algorithms, namely, FVF-F and FVF-H, are proposed. While the FVF-F algorithm works on a sequence of flat graph clusters, the FVF-H algorithm works on a hierarchy of such clusters. Through extensive experiments on both real and synthetic datasets, we show that our FVF framework is highly efficient in shortest query processing on EGSs. Comparing FVF-F and FVF-H, the latter gives a larger speedup, is more flexible in terms of memory requirements, and is far less sensitive to parameter values. 相似文献
12.
The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. Matching twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic value are useless to the users, and usually users only want to get the answers with the k largest probabilistic values. To this end, existing algorithms for ordinary XML documents cannot be directly applicable due to the need for handling probability distributional nodes and efficient calculation of top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. We propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms. 相似文献
13.
Xiang Zhao Chuan Xiao Xuemin Lin Wei Wang Yoshiharu Ishikawa 《The VLDB Journal The International Journal on Very Large Data Bases》2013,22(6):727-752
Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources such as erroneous data entries and find similarity matches. In this paper, we study graph similarity queries with edit distance constraints. Inspired by the $q$ -gram idea for string similarity problems, our solution extracts paths from graphs as features for indexing. We establish a lower bound of common features to generate candidates. Efficient algorithms are proposed to handle three types of graph similarity queries by exploiting both matching and mismatching features as well as degree information to improve the filtering and verification on candidates. We demonstrate the proposed algorithms significantly outperform existing approaches with extensive experiments on real and synthetic datasets. 相似文献
14.
15.
HweeHwa Pang Xuhua Ding Baihua Zheng 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(3):437-456
The top-k query is employed in a wide range of applications to generate a ranked list of data that have the highest aggregate scores
over certain attributes. As the pool of attributes for selection by individual queries may be large, the data are indexed
with per-attribute sorted lists, and a threshold algorithm (TA) is applied on the lists involved in each query. The TA executes
in two phases—find a cut-off threshold for the top-k result scores, then evaluate all the records that could score above the threshold. In this paper, we focus on exact top-k queries that involve monotonic linear scoring functions over disk-resident sorted lists. We introduce a model for estimating
the depths to which each sorted list needs to be processed in the two phases, so that (most of) the required records can be
fetched efficiently through sequential or batched I/Os. We also devise a mechanism to quickly rank the data that qualify for
the query answer and to eliminate those that do not, in order to reduce the computation demand of the query processor. Extensive
experiments with four different datasets confirm that our schemes achieve substantial performance speed-up of between two
times and two orders of magnitude over existing TAs, at the expense of a memory overhead of 4.8 bits per attribute value.
Moreover, our scheme is robust to different data distributions and query characteristics. 相似文献
16.
Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches. 相似文献
17.
Wenfei Fan Jianzhong Li Shuai Ma Nan Tang Yinghui Wu 《Frontiers of Computer Science》2012,6(3):313-338
It is increasingly common to find graphs in which edges are of different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity of a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness and efficiency of these algorithms are experimentally verified using real-life data and synthetic data. 相似文献
18.
Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data 总被引:2,自引:0,他引:2
Xiang Lian Lei Chen 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(3):787-808
Reverse nearest neighbor (RNN) search is very crucial in many real applications. In particular, given a database and a query
object, an RNN query retrieves all the data objects in the database that have the query object as their nearest neighbors.
Often, due to limitation of measurement devices, environmental disturbance, or characteristics of applications (for example,
monitoring moving objects), data obtained from the real world are uncertain (imprecise). Therefore, previous approaches proposed
for answering an RNN query over exact (precise) database cannot be directly applied to the uncertain scenario. In this paper,
we re-define the RNN query in the context of uncertain databases, namely probabilistic reverse nearest neighbor (PRNN) query,
which obtains data objects with probabilities of being RNNs greater than or equal to a user-specified threshold. Since the
retrieval of a PRNN query requires accessing all the objects in the database, which is quite costly, we also propose an effective
pruning method, called geometric pruning (GP), that significantly reduces the PRNN search space yet without introducing any
false dismissals. Furthermore, we present an efficient PRNN query procedure that seamlessly integrates our pruning method.
Extensive experiments have demonstrated the efficiency and effectiveness of our proposed GP-based PRNN query processing approach,
under various experimental settings. 相似文献
19.
20.
k步可达查询用于在给定的有向无环图(DAG)中回答两点之间是否存在长度不超过k的路径。针对现有方法的索引规模大、查询处理效率低的问题,提出一种基于部分点的双向最短路径索引来提升索引的可达信息覆盖率,并提出一组优化规则来减小索引规模;然后提出基于简化图的正反互逆拓扑索引来加速回答不可达查询;最后提出远距离优先的双向遍历策略来提高查询处理的效率。基于21个真实数据集(如引用网络、社交网络等)的实验结果表明,相比已有的高效方法PLL及BFSI-B,所提出的算法具有更小的索引规模和更快的查询响应速度。 相似文献