共查询到20条相似文献,搜索用时 31 毫秒
1.
Christopher Ré Dan Suciu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(5):1091-1116
We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to
HAVING in SQL) on probabilistic databases. More precisely, we study conjunctive queries with predicate aggregates on probabilistic
databases where the aggregation function is one of MIN, MAX, EXISTS, COUNT, SUM, AVG, or COUNT(DISTINCT) and the comparison function is one of =, ≠,≥,>,≤, or <. The complexity of evaluating a HAVING query depends on the aggregation function, α, and the comparison function, θ. In this paper, we establish a set of trichotomy results for conjunctive queries with HAVING predicates parametrized by (α, θ). For such queries (without self-joins), one of the following three statements is true: (1) the exact evaluation problem
has P{\mathcal P} -time data complexity. In this case, we call the query safe. (2) The exact evaluation problem is
\sharpP{{\sharp{\mathcal P}}} -hard, but the approximate evaluation problem has (randomized) P{{\mathcal P}} -time data complexity. More precisely, there exists an FPTRAS for the query. In this case, we call the query apx-safe. (3) The exact evaluation problem is
\sharpP{{\sharp{\mathcal P}}} -hard, and the approximate evaluation problem is also hard. We call these queries hazardous. The precise definition of each class depends on the aggregate considered and the comparison function. Thus, we have queries
that are (MAX,≥ )-safe, (COUNT,≤ )-apx-safe, (SUM,=)-hazardous, etc. Our trichotomy result is a significant extension of a previous dichotomy result for Boolean conjunctive
queries into safe and not safe. For each of the three classes we present novel techniques. For safe queries, we describe an
evaluation algorithm that uses random variables over semirings. For apx-safe queries, we describe an FPTRAS that relies on
a novel algorithm for generating a random possible world satisfying a given condition. Finally, for hazardous queries we give
novel proofs of hardness of approximation. The results for safe queries were previously announced (in Ré, C., Suciu, D. Efficient
evaluation of. In: DBPL, pp. 186–200, 2007), but all other results are new. 相似文献
2.
Jorge-Arnulfo Quiané-Ruiz Philippe Lamarre Patrick Valduriez 《Distributed and Parallel Databases》2012,30(1):1-26
In large-scale Internet-based distributed systems, participants (consumers and providers) are typically autonomous, i.e. they
may have special interests towards queries and other participants. In this context, a way to avoid a participant to voluntarily
leave the system is satisfying its interests when allocating queries. However, participants satisfaction may also be negatively
affected by the failures of other participants. Query replication is a solution to deal with providers failures, but, it is
challenging because of autonomy: it cannot only quickly overload the system, but also it can dissatisfy participants with
uninteresting queries. Thus, a natural question arises: should queries be replicated? If so, which ones? and how many times? In this paper, we answer these questions by revisiting query replication from a satisfaction and probabilistic point of
view. We propose a new algorithm, called S
b
QR, that decides on-the-fly whether a query should be replicated and at which rate. As replicating a large number of queries
might overload the system, we propose a variant of our algorithm, called S
b
QR+. The idea is to voluntarily fail to allocate as many replicas as required by consumers for low critical queries so as to
keep resources for high critical queries during query-intensive periods. Our experimental results demonstrate that our algorithms
significantly outperform the baseline algorithms from both the performance and satisfaction points of view. We also show that
our algorithms automatically adapt to the criticality of queries and different rates of participant failures. 相似文献
3.
In this paper, we propose CYBER, a CommunitY
Based sEaRch engine, for information retrieval utilizing community feedback information in a DHT network. In CYBER, each user is associated
with a set of user profiles that capture his/her interests. Likewise, a document is associated with a set of profiles—one
for each indexed term. A document profile is updated by users who query on the term and consider the document as a relevant
answer. Thus, the profile acts as a consolidation of users feedback from the same community, and reflects their interests.
In this way, as one user finds a document to be relevant, another user in the same community issuing a similar query will
benefit from the feedback provided by the earlier user. Hence, the search quality in terms of both precision and recall is
improved. Moreover, we further improve the effectiveness of CYBER by introducing an index tuning technique. By choosing the
indexing terms more carefully, community-based relevance feedback is utilized in both building/refining indices and re-evaluating
queries. We first propose a naive scheme, CYBER+, which involves an index tuning technique based on past queries only, and
then re-evaluates queries in a separate step. We then propose a more complex scheme, CYBER+ +, which refines its index based
on both past queries and relevance feedback. As the index is built with more selective and accurate terms, the search performance
is further improved. We conduct a comprehensive experimental study and the results show the effectiveness of our schemes. 相似文献
4.
Wanyu CHEN Fei CAI Honghui CHEN Maarten DE RIJKE 《Frontiers of Computer Science》2020,14(3):143602-141
Query suggestions help users refine their queries after they input an initial query.Previous work on query suggestion has mainly concentrated on approaches that are similarity-based or context-based,developing models that either focus on adapting to a specific user(personalization)or on diversifying query aspects in order to maximize the probability of the user being satisfied(diversification).We consider the task of generating query suggestions that are both personalized and diversified.We propose a personalized query suggestion diversification(PQSD)model,where a user's long-term search behavior is injected into a basic greedy query suggestion diversification model that considers a user's search context in their current session.Query aspects are identified through clicked documents based on the open directory project(ODP)with a latent dirichlet allocation(LDA)topic model.We quantify the improvement of our proposed PQSD model against a state-of-the-art baseline using the public america online(AOL)query log and show that it beats the baseline in terms of metrics used in query suggestion ranking and diversification.The experimental results show that PQSD achieves its best performance when only queries with clicked documents are taken as search context rather than all queries,especially when more query suggestions are returned in the list. 相似文献
5.
An important query for spatio-temporal databases is to find nearest trajectories of moving objects. Existing work on this
topic focuses on the closest trajectories in the whole data space. In this paper, we introduce and solve constrained k-nearest neighbor (CkNN) queries and historical continuous CkNN (HCCkNN) queries on R-tree-like structures storing historical information about moving object trajectories. Given a trajectory
set D, a query object (point or trajectory) q, a temporal extent T, and a constrained region CR, (i) a CkNN query over trajectories retrieves from D within T, the k (≥ 1) trajectories that lie closest to q and intersect (or are enclosed by) CR; and (ii) an HCCkNN query on trajectories retrieves the constrained k nearest neighbors (CkNNs) of q at any time instance of T. We propose a suite of algorithms for processing CkNN queries and HCCkNN queries respectively, with different properties and advantages. In particular, we thoroughly investigate two types of CkNN queries, i.e., CkNNP and CkNNT, which are defined with respect to stationary query points and moving query trajectories, respectively; and two types of
HCCkNN queries, namely, HCCkNNP and HCCkNNT, which are continuous counterparts of CkNNP and CkNNT, respectively. Our methods utilize an existing data-partitioning index for trajectory data (i.e., TB-tree) to achieve low
I/O and CPU cost. Extensive experiments with both real and synthetic datasets demonstrate the performance of the proposed
algorithms in terms of efficiency and scalability. 相似文献
6.
XMin: Minimizing Tree Pattern Queries with Minimality Guarantee 总被引:1,自引:0,他引:1
Due to wide use of XPath, the problem of efficiently processing XPath queries has recently received a lot of attention. In
particular, a considerable effort has been devoted to minimizing XPath queries since the efficiency of query processing greatly
depends on the size of the query. Research work in this area can be classified into two categories: constraint-independent
minimization and constraint-dependent minimization. The former minimizes queries in the absence of integrity constraints while
the latter in the presence of them. For a linear path query, which is an XPath query without branching predicates, existing constraint-independent minimization methods are generally
known to be unable to minimize the query without processing the query itself. Most recently, however, by using the DataGuide, a representative structural summary of XML data, a constraint-independent method that minimizes linear path queries in a
top-down fashion has been proposed. Nevertheless, this method can fail to find a minimal query since it minimizes a query
by merely erasing labels from the original query whereas a minimal query could include labels that are not present in the
original query. In this paper, we propose a bottom-up approach called XMin that guarantees finding a minimal query for a given tree pattern query by using the DataGuide without processing the query itself. For the
linear path query, we first show that the sequence of labels occurring in the minimal query is a subsequence of every schema label sequence that matches the original query. Here, the schema label sequence for a node is the sequence of labels from the root of XML
data to the node. We then propose iterative subsequence generation that iteratively generates subsequences from the shortest schema label sequence matching the original query in a bottom-up
fashion and tests query equivalence. Using iterative subsequence generation, we can always find a minimal query and we formally
prove this guarantee. We also propose an extended algorithm that guarantees the minimality for the tree pattern query, which is a linear path query with branching predicates. These methods have been prototyped in a full-fledged object-relational
DBMS. The experimental results using real and synthetic data sets show the practicality of our method. 相似文献
7.
Xiang Lian Lei Chen 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(6):819-840
Query processing in the uncertain database has played an important role in many real-world applications due to the wide existence
of uncertain data. Although many previous techniques can correctly handle precise data, they are not directly applicable to
the uncertain scenario. In this article, we investigate and propose a novel query, namely probabilistic top-k star (PTkS) query, which aims to retrieve k objects in an uncertain database that are “closest” to a static/dynamic query point, considering both distance and probability
aspects. In order to efficiently answer PTkS queries with a static/moving query point, we propose effective pruning methods to reduce the PTkS search space, which can be seamlessly integrated into an efficient query procedure. Finally, extensive experiments have
demonstrated the efficiency and effectiveness of our proposed PTkS approaches on both real and synthetic data sets, under various parameter settings. 相似文献
8.
Jing Yang Gabriel Pui Cheong Fung Wei Lu Xiaofang Zhou Hong Chen Xiaoyong Du 《World Wide Web》2012,15(1):33-60
In a typical Web recommendation system, objects are often described by many attributes. It also needs to serve many users
with a diversified range of preferences. In other words, it must be capable to efficiently support high dimensional preference
queries that allow the user to explore the data space effectively without imposing specific preference weightings for each
dimension. The skyline query, which can produce a set of objects guaranteed to contain all top ranked objects for any linear
attribute preference combination, has been proposed to support this type of recommendation applications. However, it suffers
from the problem known as ‘dimensionality curse’ as the size of skyline query result set can grow exponentially with the number
of dimensions. Therefore, when the dimensionality is high, a large percentage of objects can become skyline points. This problem
makes such a recommendation system less usable for users. In this paper, we propose a stronger type of skyline query, called
core skyline query, that adopts a new quality measure called vertical dominance to return only an interesting subset of the traditional skyline points. An efficient query processing method is proposed to find core skyline points using
a novel indexing structure called Linked Multiple B’-trees (LMB). Our approach can find such superior skyline points progressively without the need of computing the entire set of skyline
points first. 相似文献
9.
Martin Grohe Yuri Gurevich Dirk Leinders Nicole Schweikardt Jerzy Tyszkiewicz Jan Van den Bussche 《Theory of Computing Systems》2009,44(4):533-560
We introduce a new abstract model of database query processing, finite cursor machines, that incorporates certain data streaming aspects. The model describes quite faithfully what happens in so-called “one-pass”
and “two-pass query processing”. Technically, the model is described in the framework of abstract state machines. Our main
results are upper and lower bounds for processing relational algebra queries in this model, specifically, queries of the semijoin
fragment of the relational algebra. 相似文献
10.
Ming Hua Jian Pei Xuemin Lin 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(1):129-153
Uncertain data is inherent in a few important applications. It is far from trivial to extend ranking queries (also known as top-k queries), a popular type of queries on certain data, to uncertain data. In this paper, we cast ranking queries on uncertain data using three parameters: rank threshold k, probability threshold p, and answer set size threshold l. Systematically, we identify four types of ranking queries on uncertain data. First, a probability threshold top-k query computes the uncertain records taking a probability of at least p to be in the top-k list. Second, a top-(k, l) query returns the top-l uncertain records whose probabilities of being ranked among top-k are the largest. Third, the p-rank of an uncertain record is the smallest number k such that the record takes a probability of at least p to be ranked in the top-k list. A rank threshold top-k query retrieves the records whose p-ranks are at most k. Last, a top-(p, l) query returns the top-l uncertain records with the smallest p-ranks. To answer such ranking queries, we present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation-based algorithm. To answer top-(k, l) queries and top-(p, l) queries, we propose PRist+, a compact index. An efficient index construction algorithm and efficacious query answering methods are developed for PRist+. An empirical study using real and synthetic data sets verifies the effectiveness of the probabilistic ranking queries and the efficiency of our methods. 相似文献
11.
On Similarity Measures for Multimedia Database Applications 总被引:1,自引:1,他引:0
A multimedia database query consists of a set of fuzzy and boolean (or crisp) predicates, constants, variables, and conjunction,
disjunction, and negation operators. The fuzzy predicates are evaluated based on different media criteria, such as color,
shape, layout, keyword. Since media-based evaluation yields similarity values, results to such a query is defined as an ordered
set. Since many multimedia applications require partial matches, query results also include tuples which do not satisfy all
predicates. Hence, any fuzzy semantics which extends the boolean semantics of conjunction in a straight forward manner may
not be desirable for multimedia databases. In this paper, we focus on the problem of ‘given a multimedia query which consists of multiple fuzzy and crisp predicates, how to provide the user with a meaningful
overall ranking.’ More specifically, we study the problem of merging similarity values in queries with multiple fuzzy predicates. We describe
the essential multimedia retrieval semantics, compare these with the known approaches, and propose a semantics which captures
the retrieval requirements in multimedia databases.
Received 13 August 1999 / Revised 13 May 2000 / Accepted in revised form 26 July 2000 相似文献
12.
In spite of significant improvements in video data retrieval, a system has not yet been developed that can adequately respond
to a user’s query. Typically, the user has to refine the query many times and view query results until eventually the expected
videos are retrieved from the database. The complexity of video data and questionable query structuring by the user aggravates
the retrieval process. Most previous research in this area has focused on retrieval based on low-level features. Managing
imprecise queries using semantic (high-level) content is no easier than queries based on low-level features due to the absence
of a proper continuous distance function. We provide a method to help users search for clips and videos of interest in video
databases. The video clips are classified as interesting and uninteresting based on user browsing. The attribute values of clips are classified by commonality, presence, and frequency within each
of the two groups to be used in computing the relevance of each clip to the user’s query. In this paper, we provide an intelligent
query structuring system, called I-Quest, to rank clips based on user browsing feedback, where a template generation from the set of interesting and uninteresting
sets is impossible or yields poor results.
相似文献
Ramazan Savaş Aygün (Corresponding author)Email: |
13.
Ali Şaman Tosun 《Distributed and Parallel Databases》2006,19(2-3):107-124
Declustering is a common technique used to reduce query response times. Data is declustered over multiple disks and query
retrieval can be parallelized. Most of the research on declustering is targeted at spatial range queries and investigates
schemes with low additive error. Recently, declustering using replication has been proposed to reduce the additive overhead.
Replication significantly reduces retrieval cost of arbitrary queries. In this paper, we propose a disk allocation and retrieval
mechanism for arbitrary queries based on design theory. Using the proposed c-copy replicated declustering scheme,
buckets can be retrieved using at most k disk accesses. Retrieval algorithm is very efficient and is asymptotically optimal with
complexity for a query Q. In addition to the deterministic worst-case bound and efficient retrieval, proposed algorithm handles nonuniform data, high
dimensions, supports incremental declustering and has good fault-tolerance property. Experimental results show the feasibility
of the algorithm.
Recommended by: Sunil Prabhakar 相似文献
14.
Sometime Query Answering Systems (QAS) for a Distributed Autonomous Information System (DAIS) may fail by returning the empty set of objects as an answer for a query q. Systems in DAIS can be incomplete, have hierarchical attributes, and the semantics of attributes and their values may differ between sites.
Also, if there are no objects in S matching q, the query may fail when submitted to S. Alternatively, QAS for S may try to relax the query q as it was proposed in T. Gaasterland (IEEE Expert, 12(5), 1997, 48–59), P. Godfrey (International Journal of Cooperative Information Systems, 6(2), 1997, 95–149) and W. Chu et al. (Journal of Intelligent Information Systems, 6(2/3), 1996, 223–259). It means that q can be replaced by a new more general query. Clearly, the goal is to find possibly the smallest generalization of q which will not fail in S. Smaller generalizations guarantee higher confidence in objects returned by QAS. Such QAS is called cooperative (only one site is involved). Queries may also fail in S when some of the attributes listed in q are outside the domain of S. To resolve this type of queries, assuming that S is a part of DAIS, we may extract definitions of such attributes from information systems residing at some of the remote sites for S and next use them to approximate q in S. In order to do that successfully, we assume that all involved systems have to agree on the ontology of some of their common
attributes Z.W. Raś and A. Dardzińska (Information Systems International Journal, 29(1), 2004, 47–58; Proceedings of FQAS 2004 Conference, LNCS/LNAI No. 3055, 2004, pp. 125–136); Z.W. Raś and S. Joshi, Fundamenta Informaticae Journal, 30(3/4), 1997, 313–324. QAS based on the above strategy is called collaborative (minimum two sites are involved). Similarly, a query may fail in S when the granularity of an attribute used in q is finer than the granularity of the same attribute in S. This paper shows how to use collaboration and cooperation approach to solve failing queries in DAIS assuming that attributes are hierarchical. Some aspects of a collaboration strategy dealing with failing query problem for
non-hierarchical attributes have been presented in Z.W. Raś and A. Dardzińska (Information Systems International Journal, 29(1), 2004, 47–58; Proceedings of FQAS 2004 Conference, LNCS/LNAI No. 3055, 2004, pp. 125–136). 相似文献
15.
Mariam Daoud Lynda-Tamine Lechani Mohand Boughanem 《Knowledge and Information Systems》2009,21(3):365-398
Most Web search engines use the content of the Web documents and their link structures to assess the relevance of the document
to the user’s query. With the growth of the information available on the web, it becomes difficult for such Web search engines
to satisfy the user information need expressed by few keywords. First, personalized information retrieval is a promising way
to resolve this problem by modeling the user profile by his general interests and then integrating it in a personalized document
ranking model. In this paper, we present a personalized search approach that involves a graph-based representation of the
user profile. The user profile refers to the user interest in a specific search session defined as a sequence of related queries.
It is built by means of score propagation that allows activating a set of semantically related concepts of reference ontology,
namely the ODP. The user profile is maintained across related search activities using a graph-based merging strategy. For the purpose of
detecting related search activities, we define a session boundary recognition mechanism based on the Kendall rank correlation measure that tracks changes in the dominant concepts held by the user profile relatively to a new submitted
query. Personalization is performed by re-ranking the search results of related queries using the user profile. Our experimental
evaluation is carried out using the HARD 2003 TREC collection and showed that our session boundary recognition mechanism based
on the Kendall measure provides a significant precision comparatively to other non-ranking based measures like the cosine and the WebJaccard similarity measures. Moreover, results proved that the graph-based search personalization is effective for improving the
search accuracy. 相似文献
16.
The two dimensional range minimum query problem is to preprocess a static m by n matrix (two dimensional array) A of size N=m⋅n, such that subsequent queries, asking for the position of the minimum element in a rectangular range within A, can be answered efficiently. We study the trade-off between the space and query time of the problem. We show that every
algorithm enabled to access A during the query and using a data structure of size O(N/c) bits requires Ω(c) query time, for any c where 1≤c≤N. This lower bound holds for arrays of any dimension. In particular, for the one dimensional version of the problem, the lower
bound is tight up to a constant factor. In two dimensions, we complement the lower bound with an indexing data structure of
size O(N/c) bits which can be preprocessed in O(N) time to support O(clog 2
c) query time. For c=O(1), this is the first O(1) query time algorithm using a data structure of optimal size O(N) bits. For the case where queries can not probe A, we give a data structure of size O(N⋅min {m,log n}) bits with O(1) query time, assuming m≤n. This leaves a gap to the space lower bound of Ω(Nlog m) bits for this version of the problem. 相似文献
17.
The in–network aggregation paradigm in sensor networks provides a versatile approach for evaluating aggregate queries. Traditional
approaches need a separate aggregate to be computed and communicated for each query and hence do not scale well with the number
of queries. Since approximate query results are sufficient for many applications, we use an alternate approach based on summary
data–structures. We consider two kinds of aggregate queries: location range queries that compute the sum of values reported by sensors in a given location range, and value range queries that compute the number of sensors that report values in a given range. We construct summary data–structures called linear sketches, over the sensor data using in–network aggregation and use them to answer aggregate queries in an approximate manner at the
base–station. There is a trade–off between accuracy of the query results and lifetime of the sensor network that can be exploited
to achieve increased lifetimes for a small loss in accuracy. Most commonly occurring sets of range queries are highly correlated
and display rich algebraic structure. Our approach takes full advantage of this by constructing linear sketches that depend
on queries. Experimental results show that linear sketching achieves significant improvements in lifetime of sensor networks
for only a small loss in accuracy of the queries. Further, our approach achieves more accurate query results than the other
classical techniques using Discrete Fourier Transform and Discrete Wavelet Transform.
This work was supported in part by NASA under Cooperative Agreement NCC5–315. 相似文献
18.
Efficient evaluation of continuous spatio-temporal queries on moving objects with uncertain velocity 总被引:1,自引:0,他引:1
Continuous Range (CR) query and Continuous K-Nearest Neighbor (CKNN) query are two important types of spatio-temporal queries. Given a time interval [t
s
, t
e
] and a moving query object q, a CR query is to find the moving objects whose Euclidean distances to q are within a user-given distance at each time instant within [t
s
, t
e
]. A CKNN query is to retrieve the K-Nearest Neighbors (KNNs) of this query object q at each time instant within [t
s
, t
e
]. In this paper, we investigate how to process these spatio-temporal queries efficiently under the situation that the velocity
of each object is not fixed. This uncertainty on the velocity of object inevitably results in high complexity in processing
spatio-temporal queries. We will discuss the complications incurred by this uncertainty and propose two algorithms, namely
the Possibility-based possible within objects searching algorithm and the Possibility-based possible KNN searching algorithm, for the CR query and the CKNN query, respectively. A Possibility-based model is designed accordingly to quantify the possibility of each object being
the result of a CR query or a CKNN query. Comprehensive experiments are performed to demonstrate the effectiveness and the efficiency of the proposed approaches. 相似文献
19.
We give a general framework for approximate query processing in semistructured databases. We focus on regular path queries,
which are the integral part of most of the query languages for semistructured databases. To enable approximations, we allow
the regular path queries to be distorted. The distortions are expressed in the system by using weighted regular expressions, which correspond to weighted regular
transducers. After defining the notion of weighted approximate answers we show how to compute them in order of their proximity
to the query. In the new approximate setting, query containment has to be redefined in order to take into account the quantitative
proximity information in the query answers. For this, we define the approximate containment, and its variants k-containment and reliable contain-ment. Then, we give an optimal algorithm for deciding the k-containment. Regarding the reliable approximate containment, we show that it is polynomial time equivalent to the notorious
limitedness problem in distance automata. 相似文献
20.
The result quality of queries incorporating impreciseness can be improved by the specification of user-defined weights. Existing
approaches evaluate weighted queries by applying arithmetic evaluations on top of the query’s intrinsic logic. This complicates the usage of logic-based optimization. Therefore, we
suggest a weighting approach that is completely embedded in a logic.
In order to facilitate the user interaction with the system, we exploit the intuitively comprehensible concept of preferences.
In addition, we use a machine-based learning algorithm to learn weighting values in correspondence to the user’s intended
semantics of a posed query. Experiments show the utility of our approach. 相似文献