首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The important challenge of evaluating XPath queries over XML streams has sparked much interest in the past few years. A number of algorithms have been proposed, supporting wider fragments of the query language, and exhibiting better performance and memory utilization. Nevertheless, all the algorithms known to date use a prohibitively large amount of memory for certain types of queries. A natural question then is whether this memory bottleneck is inherent or just an artifact of the proposed algorithms.In this paper we initiate the first systematic and theoretical study of lower bounds on the amount of memory required to evaluate XPath queries over XML streams. We present a general lower bound technique, which given a query, specifies the minimum amount of memory that any algorithm evaluating the query on a stream would need to incur. The lower bounds are stated in terms of new graph-theoretic properties of queries. The proofs are based on tools from communication complexity.We then exploit insights learned from the lower bounds to obtain a new algorithm for XPath evaluation on streams. The algorithm uses space close to the optimum. Our algorithm deviates from the standard paradigm of using automata or transducers, thereby avoiding the need to store large transition tables.  相似文献   

2.
We study the power of four query models in the context of property testing in general graphs, where our main case study is the problem of testing k-colorability. Two query types, which have been studied extensively in the past, are pair queries and neighbor queries. The former corresponds to asking whether there is an edge between any particular pair of vertices, and the latter to asking for the i th neighbor of a particular vertex. We show that while for pair queries testing k-colorability requires a number of queries that is a monotone decreasing function in the average degree d, the query complexity in the case of neighbor queries remains roughly the same for every density and for large values of k. We also consider a combined model that allows both types of queries, and we propose a new, stronger, query model, related to the field of Group Testing. We give upper and lower bounds on the query complexity for one-sided error in all the models, where the bounds are nearly tight for three of the models. In some of the cases, our lower bounds extend to two-sided error algorithms. The problem of testing k-colorability was previously studied in the contexts of dense graphs and of sparse graphs, and in our proofs we unify approaches from those cases, and also provide some new tools and techniques that may be of independent interest.  相似文献   

3.
Query processing in the uncertain database has become increasingly important due to the wide existence of uncertain data in many real applications. Different from handling precise data, the uncertain query processing needs to consider the data uncertainty and answer queries with confidence guarantees. In this paper, we formulate and tackle an important query, namely probabilistic inverse ranking (PIR) query, which retrieves possible ranks of a given query object in an uncertain database with confidence above a probability threshold. We present effective pruning methods to reduce the PIR search space, which can be seamlessly integrated into an efficient query procedure. Moreover, we tackle the problem of PIR query processing in high dimensional spaces, which reduces high dimensional uncertain data to a lower dimensional space. Furthermore, we study three interesting and useful aggregate PIR queries, that is, MAX, top-m, and AVG? PIRs. Moreover, we also study an important query type, PIR with uncertain query object (namely UQ-PIR), and design specific rules to facilitate the pruning. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches over both real and synthetic data sets, under various experimental settings.  相似文献   

4.
Given a set D of trajectories, a query object q, and a query time extent Γ, a mutual (i.e., symmetric) nearest neighbor (MNN) query over trajectories finds from D, the set of trajectories that are among the k1 nearest neighbors (NNs) of q within Γ, and meanwhile, have q as one of their k2 NNs. This type of queries is useful in many applications such as decision making, data mining, and pattern recognition, as it considers both the proximity of the trajectories to q and the proximity of q to the trajectories. In this paper, we first formalize MNN search and identify its characteristics, and then develop several algorithms for processing MNN queries efficiently. In particular, we investigate two classes of MNN queries, i.e., MNNP and MNNT queries, which are defined with respect to stationary query points and moving query trajectories, respectively. Our methods utilize the batch processing and reusing technology to reduce the I/O cost (i.e., number of node/page accesses) and CPU time significantly. In addition, we extend our techniques to tackle historical continuous MNN (HCMNN) search for moving object trajectories, which returns the mutual nearest neighbors of q (for a specified k1 and k2) at any time instance of Γ. Extensive experiments with real and synthetic datasets demonstrate the performance of our proposed algorithms in terms of efficiency and scalability.  相似文献   

5.
Due to the inherent existence of uncertainty in many real-world applications, in this paper, we investigate an important query in uncertain databases, namely probabilistic least influenced set (PLIS) query, which retrieves all the uncertain objects in an uncertain database such that they are the least affected by a given query object with high probabilities. Such a PLIS query is useful in applications such as business planning. We propose and tackle both monochromatic and bichromatic versions (i.e. M-PLIS and B-PLIS, respectively) of the PLIS query. In order to efficiently answer PLIS queries, we present three pruning methods, MINMAX, Regional, and Candidate pruning, which can effectively reduce the PLIS search space. The proposed pruning methods can be seamlessly integrated into efficient query procedures. Moreover, we also study important variants of PLIS query with uncertain query object (i.e. UQ-PLIS). Furthermore, we formulate and tackle the PLIS problem on uncertain moving objects (i.e. UMOD-PLIS). Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches under various settings.  相似文献   

6.
It is well known that, for fixedk, to find thek-th largest ofn elementsn+(k?1)log2 n+Θ(1) comparisons are necessary and sufficient. But do the same bounds apply if we use a different type of query? We show that the arity of the queries is relevant. In particular, we present upper and lower bounds for finding the maximum using 3-ary or 4-ary Boolean (YES/NO answers) queries. We also study general (e.g.,max, sort) 3-ary queries, and show bounds for finding the maximum and the second largest. For sort queries we show matching upper and lower bounds.  相似文献   

7.
A survey of queries over uncertain data   总被引:1,自引:1,他引:0  
Uncertain data have already widely existed in many practical applications recently, such as sensor networks, RFID networks, location-based services, and mobile object management. Query processing over uncertain data as an important aspect of uncertain data management has received increasing attention in the field of database. Uncertain query processing poses inherent challenges and demands non-traditional techniques, due to the data uncertainty. This paper surveys this interesting and still evolving research area in current database community, so that readers can easily obtain an overview of the state-of-the-art techniques. We first provide an overview of data uncertainty, including uncertainty types, probability representation models, and sources of probabilities. We next outline the current major types of uncertain queries and summarize the main features of uncertain queries. Particularly, we present and analyze several typical uncertain queries in detail, such as skyline queries, top- $k$ queries, nearest-neighbor queries, aggregate queries, join queries, range queries, and threshold queries over uncertain data. Finally, we present many interesting research topics on uncertain queries that have not yet been explored.  相似文献   

8.
In this paper we propose a fundamental approach to perform the class of Range and Nearest Neighbor (NN) queries, the core class of spatial queries used in location-based services, without revealing any location information about the query in order to preserve users’ private location information. The idea behind our approach is to utilize the power of one-way transformations to map the space of all objects and queries to another space and resolve spatial queries blindly in the transformed space. Traditional encryption based techniques, solutions based on the theory of private information retrieval, or the recently proposed anonymity and cloaking based approaches cannot provide stringent privacy guarantees without incurring costly computation and/or communication overhead. In contrast, we propose efficient algorithms to evaluate KNN and range queries privately in the Hilbert transformed space. We also propose a dual curve query resolution technique which further reduces the costs of performing range and KNN queries using a single Hilbert curve. We experimentally evaluate the performance of our proposed range and KNN query processing techniques and verify the strong level of privacy achieved with acceptable computation and communication overhead.  相似文献   

9.
10.
Recommendation systems aim to recommend items or packages of items that are likely to be of interest to users. Previous work on recommendation systems has mostly focused on recommending points of interest (POI), to identify and suggest top-k items or packages that meet selection criteria and satisfy compatibility constraints on items in a package, where the (packages of) items are ranked by their usefulness to the users. As opposed to prior work, this paper investigates two issues beyond POI recommendation that are also important to recommendation systems. When there exist no sufficiently many POI that can be recommended, we propose (1) query relaxation recommendation to help users revise their selection criteria, or (2) adjustment recommendation to guide recommendation systems to modify their item collections, such that the users׳ requirements can be satisfied.We study two related problems, to decide (1) whether the query expressing the selection criteria can be relaxed to a limited extent, and (2) whether we can update a bounded number of items, such that the users can get desired recommendations. We establish the upper and lower bounds of these problems, all matching, for both combined and data complexity, when selection criteria and compatibility constraints are expressed in a variety of query languages, for both item recommendation and package recommendation. To understand where the complexity comes from, we also study the impact of variable sizes of packages, compatibility constraints and selection criteria on the analyses of these problems. Our results indicate that in most cases the complexity bounds of query relaxation and adjustment recommendation are comparable to their counterparts of the basic recommendation problem for testing whether a given set of (resp. packages of) items makes top-k items (resp. packages). In other words, extending recommendation systems with the query relaxation and adjustment recommendation functionalities typically does not incur extra overhead.  相似文献   

11.
We study the power of nonadaptive quantum query algorithms, which are algorithms whose queries to the input do not depend on the result of previous queries. First, we show that any bounded-error nonadaptive quantum query algorithm that computes a total boolean function depending on n variables must make Ω(n) queries to the input in total. Second, we show that, if there exists a quantum algorithm that uses k nonadaptive oracle queries to learn which one of a set of m boolean functions it has been given, there exists a nonadaptive classical algorithm using queries to solve the same problem. Thus, in the nonadaptive setting, quantum algorithms for these tasks can achieve at most a very limited speed-up over classical query algorithms.  相似文献   

12.
Uncertain data is inherent in a few important applications. It is far from trivial to extend ranking queries (also known as top-k queries), a popular type of queries on certain data, to uncertain data. In this paper, we cast ranking queries on uncertain data using three parameters: rank threshold k, probability threshold p, and answer set size threshold l. Systematically, we identify four types of ranking queries on uncertain data. First, a probability threshold top-k query computes the uncertain records taking a probability of at least p to be in the top-k list. Second, a top-(k, l) query returns the top-l uncertain records whose probabilities of being ranked among top-k are the largest. Third, the p-rank of an uncertain record is the smallest number k such that the record takes a probability of at least p to be ranked in the top-k list. A rank threshold top-k query retrieves the records whose p-ranks are at most k. Last, a top-(p, l) query returns the top-l uncertain records with the smallest p-ranks. To answer such ranking queries, we present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation-based algorithm. To answer top-(k, l) queries and top-(p, l) queries, we propose PRist+, a compact index. An efficient index construction algorithm and efficacious query answering methods are developed for PRist+. An empirical study using real and synthetic data sets verifies the effectiveness of the probabilistic ranking queries and the efficiency of our methods.  相似文献   

13.
Numerous frameworks have been proposed in recent years for deductive databases with uncertainty. On the basis of how uncertainty is associated with the facts and rules in a program, we classify these frameworks into implication-based (IB) and annotation-based (AB) frameworks. We take the IB approach and propose a generic framework, called the parametric framework, as a unifying umbrella for IB frameworks. We develop the declarative, fixpoint, and proof-theoretic semantics of programs in our framework and show their equivalence. Using the framework as a basis, we then study the query optimization problem of containment of conjunctive queries in this framework and establish necessary and sufficient conditions for containment for several classes of parametric conjunctive queries. Our results yield tools for use in the query optimization for large classes of query programs in IB deductive databases with uncertainty  相似文献   

14.
The recently introduced Datalog+?/?? family of ontology languages is especially useful for representing and reasoning over lightweight ontologies, and is set to play a central role in the context of query answering and information extraction for the Semantic Web. Recently, it has become apparent that it is necessary to develop a principled way to handle uncertainty in this domain. In addition to uncertainty as an inherent aspect of the Web, one must also deal with forms of uncertainty due to inconsistency and incompleteness, uncertainty resulting from automatically processing Web data, as well as uncertainty stemming from the integration of multiple heterogeneous data sources. In this paper, we take an important step in this direction by developing a probabilistic extension of Datalog+?/??. This extension uses Markov logic networks as the underlying probabilistic semantics. Here, we focus especially on scalable algorithms for answering threshold queries, which correspond to the question “what is the set of all ground atoms that are inferred from a given probabilistic ontology with a probability of at least p?”. These queries are especially relevant to Web information extraction, since uncertain rules lead to uncertain facts, and only information with a certain minimum confidence is desired. We present several algorithms, namely a basic approach, an anytime one, and one based on heuristics, which is guaranteed to return sound results. Furthermore, we also study inconsistency in probabilistic Datalog+?/?? ontologies. We propose two approaches for computing preferred repairs based on two different notions of distance between repairs, namely symmetric and score-based distance. We also study the complexity of the decision problems corresponding to computing such repairs, which turn out to be polynomial and NP-complete in the data complexity, respectively.  相似文献   

15.
A Business Process (BP for short) consists of a set of activities which, combined in a flow, achieve some business goal. A given BP may have a large, possibly infinite, number of possible execution flows (EX-flows for short), each having some probability to occur at run time. This paper studies query evaluation over such probabilistic BPs. We focus on two important classes of queries, namely boolean queries that compute the probability that a random EX-flow of a BP satisfies a given property, and projection queries focusing on portions of EX-flows that are of interest to the user. For the latter queries the answer consists of the top-k instances of these portions that are most likely to occur at run-time. We study the complexity of query evaluation for both kinds of queries, showing in particular that projection queries may be harder to evaluate than boolean queries. We present a picture of which combinations of BP classes and query features lead to PTIME algorithms and which to NP-hard or infeasible problems.  相似文献   

16.
In complex search tasks, it is often required to pose several basic search queries, join the answers to these queries, where each answer is given as a ranked list of items, and return a ranked list of combinations. However, the join result may include too many repetitions of items, and hence, frequently the entire join is too large to be useful. This can be solved by choosing a small subset of the join result. The focus of this paper is on how to choose this subset. We propose two measures for estimating the quality of result sets, namely, coverage and optimality ratio. Intuitively, maximizing the coverage aims at including in the result as many as possible appearances of items in their optimal combination, and maximizing the optimality ratio means striving to have each item appearing only in its optimal combination, i.e., only in the most highly ranked combination that contains it. One of the difficulties, when choosing the subset of the join in a complex search, is that there is a conflict between maximizing the coverage and maximizing the optimality ratio. In this paper, we introduce the measures coverage and optimality ratio. We present new semantics for complex search queries, aiming at providing high coverage and high optimality ratio. We examine the quality of the results of existing and the novel semantics, according to these two measures, and we provide algorithms for answering complex search queries under the new semantics. Finally, we present an experimental study, using Yahoo! Local Search Web Services, of the efficiency and the scalability of our algorithms, showing that complex search queries can be evaluated effectively under the proposed semantics.  相似文献   

17.
This article addresses the problem of performing Nearest Neighbor (NN) queries on uncertain trajectories. The answer to an NN query for certain trajectories is time parameterized due to the continuous nature of the motion. As a consequence of uncertainty, there may be several objects that have a non-zero probability of being a nearest neighbor to a given querying object, and the continuous nature further complicates the semantics of the answer. We capture the impact that the uncertainty of the trajectories has on the semantics of the answer to continuous NN queries and we propose a tree structure for representing the answers, along with efficient algorithms to compute them. We also address the issue of performing NN queries when the motion of the objects is restricted to road networks. Finally, we formally define and show how to efficiently execute several variants of continuous NN queries. Our experiments demonstrate that the proposed algorithms yield significant performance improvements when compared with the corresponding naïve approaches.  相似文献   

18.
In this paper, we propose an efficient solution for processing continuous range spatial keyword queries over moving spatio-textual objects (namely, CRSK-mo queries). Major challenges in efficient processing of CRSK-mo queries are as follows: (i) the query range is determined based on both spatial proximity and textual similarity; thus a straightforward spatial proximity based pruning of the search space is not applicable as any object far from a query location with a high textual similarity score can still be the answer (and vice versa), (ii) frequent location updates may invalidate a query result, and thus require frequent re-computing of the result set for any object updates. To address these challenges, the key idea of our approach is to exploit the spatial and textual upper bounds between queries and objects to form safe zones (at the client-side) and buffer regions (at the server-side), and then use these bounds to quickly prune objects and queries through smart in-memory data structures. We conduct extensive experiments with a synthetic dataset that verify the effectiveness and efficiency of our proposed algorithm.  相似文献   

19.
Top-k query in a wireless sensor network is to find the k sensor nodes with the highest sensing values. To evaluate the top-k query in such an energy-constrained network poses great challenges, due to the unique characteristics imposed on its sensors. Existing solutions for top-k query in the literature mainly focused on energy efficiency but little attention has been paid to the query response time and its effect on the network lifetime. In this paper we address the query response time and its effect on the network lifetime through the study of the top-k query problem in sensor networks with the response time constraint. We aim at finding an energy-efficient routing tree and evaluating top-k queries on the tree such that the network lifetime is significantly prolonged, provided that the query response time constraint is met too. To do so, we first present a cost model of energy consumption for answering top-k queries and introduce the query response time definition. We then propose a novel joint query optimization framework, which consists of finding a routing tree in the network and devising a filter-based evaluation algorithm for top-k query evaluation on the tree. We finally conduct extensive experiments by simulation to evaluate the performance of the proposed algorithms, in terms of the total energy consumption, the maximum energy consumption among nodes, the query response time, and the network lifetime. The experimental results showed that there is a non-trivial tradeoff between the query response time and the network lifetime, and the joint query optimization framework can prolong the network lifetime significantly under a specified query response time constraint.  相似文献   

20.
Abstract. In meta-searchers accessing distributed Web-based information repositories, performance is a major issue. Efficient query processing requires an appropriate caching mechanism. Unfortunately, standard page-based as well as tuple-based caching mechanisms designed for conventional databases are not efficient on the Web, where keyword-based querying is often the only way to retrieve data. In this work, we study the problem of semantic caching of Web queries and develop a caching mechanism for conjunctive Web queries based on signature files. Our algorithms cope with both relations of semantic containment and intersection between a query and the corresponding cache items. We also develop the cache replacement strategy to treat situations when cached items differ in size and contribution when providing partial query answers. We report results of experiments and show how the caching mechanism is realized in the Knowledge Broker system. Received June 15, 1999 / Accepted December 24, 1999  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号