首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 140 毫秒
1.
In data applications such as information integration, there can be limited access patterns to relations, i.e., binding patterns require values to be specified for certain attributes in order to retrieve data from a relation. As a consequence, we cannot retrieve all tuples from these relations. In this article we study the problem of computing the complete answer to a query, i.e., the answer that could be computed if all the tuples could be retrieved. A query is stable if for any instance of the relations in the query, its complete answer can be computed using the access patterns permitted by the relations. We study the problem of testing stability of various classes of queries, including conjunctive queries, unions of conjunctive queries, and conjunctive queries with arithmetic comparisons. We give algorithms and complexity results for these classes of queries. We show that stability of datalog programs is undecidable, and give a sufficient condition for stability of datalog queries. Finally, we study data-dependent computability of the complete answer to a nonstable query, and propose a decision tree for guiding the process to compute the complete answer.Received: 6 December 2001, Accepted: 25 November 2002, Published online: 3 April 2003Chen Li: This article combines and integrates some content in the technical report at Stanford University [25] and the paper presented in the 8th International Conference on Database Theory (ICDT), London, UK, January, 2001 [28]. In addition to the prior materials, this article contains more results and complete proofs that were not included in the original reports.  相似文献   

2.
We study containment and equivalence of (unions of) conjunctive queries on relations annotated with elements of a commutative semiring. Such relations and the semantics of positive relational queries on them were introduced in a recent paper as a generalization of set semantics, bag semantics, incomplete databases, and databases annotated with various kinds of provenance information. We obtain positive decidability results and complexity characterizations for databases with lineage, why-provenance, and provenance polynomial annotations, for both conjunctive queries and unions of conjunctive queries. At least one of these results is surprising given that provenance polynomial annotations seem “more expressive” than bag semantics and under the latter, containment of unions of conjunctive queries is known to be undecidable. The decision procedures rely on interesting variations on the notion of containment mappings. We also show that for any positive semiring (a very large class) and conjunctive queries without self-joins, equivalence is the same as isomorphism.  相似文献   

3.
Negative Results for Equivalence Queries   总被引:1,自引:5,他引:1  
Angluin  Dana 《Machine Learning》1990,5(2):121-150
We consider the problem of exact identification of classes of concepts using only equivalence queries. We define a combinatorial property,approximate fingerprints, of classes of concepts and show that no class with this property can be exactly identified in polynomial time using only equivalence queries. As applications of this general theorem, we show that there is no polynomial time algorithm using only equivalence queries that exactly identifies deterministic or nondeterministic finite state acceptors, context free grammars, or disjunctive or conjunctive normal form boolean formulas.  相似文献   

4.
Answering queries using views is the problem which examines how to derive the answers to a query when we only have the answers to a set of views. Constructing rewritings is a widely studied technique to derive those answers. In this paper we consider the problem of the existence of rewritings in the case where the answers to the views uniquely determine the answers to the query. Specifically, we say that a view set Vdetermines a query Q if for any two databases D1,D2 it holds: V(D1)=V(D2) implies Q(D1)=Q(D2). We consider the case where query and views are defined by conjunctive queries and investigate the question: If a view set V determines a query Q, is there an equivalent rewriting of Q using V? We present here interesting cases where there are such rewritings in the language of conjunctive queries. Interestingly, we identify a class of conjunctive queries, CQpath, for which a view set can produce equivalent rewritings for “almost all” queries which are determined by this view set. We introduce a problem which relates determinacy to query equivalence. We show that there are cases where restricted results can carry over to broader classes of queries.  相似文献   

5.
6.
There is much current interest in publishing and viewing databases as XML documents. The general benefits of this approach follow from the popularity of XML and the tool set available for visualizing and processing information encoded in this universal standard. In this paper, we explore the additional and unique benefits achieved by this approach on temporal database applications. We show that XML with XQuery can provide surprisingly effective solutions to the problem of supporting historical queries on past content of database relations and their evolution. Indeed, using XML, the histories of database relations can be naturally represented by temporally grouped data models. Thus, we identify mappings from relations to XML that are most conducive to modeling and querying database histories, and show that temporal queries that would be difficult to express in SQL can be easily expressed in standard XQuery. This approach is very general, insofar as it can be used to store the version history of arbitrary documents and, for relational databases, it also supports queries on the evolution of their schema. Then, we turn to the problem of supporting efficiently the storage and the querying of relational table histories. We present an experimental study of the pros and cons of using native XML databases, versus using traditional databases, where the XML-represented histories are supported as views on the historical tables.  相似文献   

7.
We consider a workload of aggregate queries and investigate the problem of selecting materialized views that (1) provide equivalent rewritings for all the queries, and (2) are optimal, in that the cost of evaluating the query workload is minimized. We consider conjunctive views and rewritings, with or without aggregation; in each rewriting, only one view contributes to computing the aggregated query output. We look at query rewriting using existing views and at view selection. In the query-rewriting problem, we give sufficient and necessary conditions for a rewriting to exist. For view selection, we prove complexity results. Finally, we give algorithms for obtaining rewritings and selecting views.  相似文献   

8.
We present an approach for mining frequent conjunctive in arbitrary relational databases. Our pattern class is the simple, but appealing subclass of simple conjunctive queries. Our algorithm, called Conqueror $^+$ , is capable of detecting previously unknown functional and inclusion dependencies that hold on the database relations as well as on joins of relations. These newly detected dependencies are then used to prune redundant queries. We propose an efficient database-oriented implementation of our algorithm using SQL and provide several promising experimental results.  相似文献   

9.
In this paper, we study the following problem. Given a database and a set of queries, we want to find a set of views that can compute the answers to the queries, such that the amount of space, in bytes, required to store the viewset is minimum on the given database. (We also handle problem instances where the input has a set of database instances, as described by an oracle that returns the sizes of view relations for given view definitions.) This problem is important for applications such as distributed databases, data warehousing, and data integration. We explore the decidability and complexity of the problem for workloads of conjunctive queries. We show that results differ significantly depending on whether the workload queries have self-joins. Further, for queries without self-joins we describe a very compact search space of views, which contains all views in at least one optimal viewset. We present techniques for finding a minimum-size viewset for a single query without self-joins by using the shape of the query and its constraints, and validate the approach by extensive experiments. Part of this article was published elsewhere [Chirkova, R., Li, C.: Materializing views with minimal size to answer queries. PODS (2003)]. In addition to the prior materials, this article contains new theoretical results, as well as new results on how to efficiently implement the proposed techniques (Sects. 5 and 5.4)  相似文献   

10.
The problem of finding contained rewritings of queries using views is of great importance in mediated data integration systems. In this paper, we first present a general approach for finding contained rewritings of unions of conjunctive queries with arbitrary built-in predicates. Our approach is based on an improved method for testing conjunctive query containment in this context. Although conceptually simple, our approach generalizes previous methods for finding contained rewritings of conjunctive queries and is more powerful in the sense that many rewritings that can not be found using existing methods can be found by our approach. Furthermore, implication constraints (Zhang, X., & Ozsoyoglu, Z.M. (1997). Implication and referential constraints: A new formal resaoning. IEEE TKDE, 9(6):894–910, Now/Dec.) over the base relations can be easily handled. We then present a simplified approach which is less complete, but is much faster than the general approach, and it still finds maximum rewritings in several special cases. Our general approach finds more rewritings than previous algorithms such as the Bucket and the resolution-based algorithms. Our simplified approach generalizes the U-join and the MiniCon algorithms with no loss of efficiency.  相似文献   

11.
Keyword search in XML documents has recently gained a lot of research attention. Given a keyword query, existing approaches first compute the lowest common ancestors (LCAs) or their variants of XML elements that contain the input keywords, and then identify the subtrees rooted at the LCAs as the answer. In this the paper we study how to use the rich structural relationships embedded in XML documents to facilitate the processing of keyword queries. We develop a novel method, called SAIL, to index such structural relationships for efficient XML keyword search. We propose the concept of minimal-cost trees to answer keyword queries and devise structure-aware indices to maintain the structural relationships for efficiently identifying the minimal-cost trees. For effectively and progressively identifying the top-k answers, we develop techniques using link-based relevance ranking and keyword-pair-based ranking. To reduce the index size, we incorporate a numbering scheme, namely schema-aware dewey code, into our structure-aware indices. Experimental results on real data sets show that our method outperforms state-of-the-art approaches significantly, in both answer quality and search efficiency.  相似文献   

12.
In this paper, we investigate the problem of view selection for workloads of conjunctive queries under bag and bag-set semantics. In particular, for both semantics we aim to limit the search space of candidate viewsets. We also start delineating boundaries between query workloads for which certain even more restricted search spaces suffice. They suffice in the sense that they do not compromise optimality in that they contain at least one of the optimal solutions. We start with the general case for both bag and bag-set semantics, where we give a tight condition that candidate views can satisfy and still the search space (thus limited) does contain at least one optimal solution. We show that these results, for both semantics, reduce the size of the search space significantly. Further on, due to this analysis for both semantics, a delineation of the space of viewsets and the space of the corresponding equivalent rewritings for a certain conjunctive query workload is given. We show that for chain query workloads under both bag and bag-set semantics, taking only chain views may miss optimal solutions, whereas, if we further limit the queries to be path-queries (i.e., chain queries over a single binary relation), then, under bag semantics, path-views suffice. Concentrating to bag-set semantics, we show that the path-viewsets do not suffice for every path-query workload.  相似文献   

13.
The Web is a source of valuable information, but the process of collecting, organizing, and effectively utilizing the resources it contains is difficult. We describe CorpusBuilder, an approach for automatically generating Web search queries for collecting documents matching a minority concept. The concept used for this paper is that of text documents belonging to a minority natural language on the Web. Individual documents are automatically labeled as relevant or nonrelevant using a language filter, and the feedback is used to learn what query lengths and inclusion/exclusion term-selection methods are helpful for finding previously unseen documents in the target language. Our system learns to select good query terms using a variety of term scoring methods. Using odds ratio scores calculated over the documents acquired was one of the most consistently accurate query-generation methods. To reduce the number of estimated parameters, we parameterize the query length using a Gamma distribution and present empirical results with learning methods that vary the time horizon used when learning from the results of past queries. We find that our system performs well whether we initialize it with a whole document or with a handful of words elicited from a user. Experiments applying the same approach to multiple languages are also presented showing that our approach generalizes well across several languages regardless of the initial conditions.  相似文献   

14.
网页搜索中的查询时效性是指查询对新闻网页的需求。这种时间相关的因素,在网页排序过程中用于平衡其他非时间性因素,使排序更好地满足用户体验。为此该文提出了一种查询时效性的实时计算模型从用户搜索和媒体报道两个角度,分别对时效性建模,然后这两种不同来源的时效性相互补充,综合计算某个时刻用户搜索某个查询时,其综合时效性得分。这个量化得分在网页排序阶段用于提高或抑制新闻网页的露出;同时也为网页搜索结果中展现新闻直达区提供依据。在人工评测以及用户点击通过率统计上,该模型均取得了不错的实际效果。  相似文献   

15.
MiniCon: A scalable algorithm for answering queries using views   总被引:5,自引:0,他引:5  
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously materialized views over the database, rather than accessing the database relations. The problem has received significant attention because of its relevance to a wide variety of data management problems, such as data integration, query optimization, and the maintenance of physical data independence. To date, the performance of proposed algorithms has received very little attention, and in particular, their scale up in the presence of a large number of views is unknown. We first analyze two previous algorithms, the bucket algorithm and the inverse-rules, and show their deficiencies. We then describe the MiniCon, a novel algorithm for finding the maximally-contained rewriting of a conjunctive query using a set of conjunctive views. We present the first experimental study of algorithms for answering queries using views. The study shows that the MiniCon scales up well and significantly outperforms the previous algorithms. We describe an extension of the MiniCon to handle comparison predicates, and show its performance experimentally. Finally, we describe how the MiniCon can be extended to the context of query optimization. Received: 15 October 2000 / Accepted: 15 April 2001 Published online: 28 June 2001  相似文献   

16.
Despite the importance of ranked queries in numerous applications involving multi-criteria decision making, they are not efficiently supported by traditional database systems. In this paper, we propose a simple yet powerful technique for processing such queries based on multi-dimensional access methods and branch-and-bound search. The advantages of the proposed methodology are: (i) it is space efficient, requiring only a single index on the given relation (storing each tuple at most once), (ii) it achieves significant (i.e., orders of magnitude) performance gains with respect to the current state-of-the-art, (iii) it can efficiently handle data updates, and (iv) it is applicable to other important variations of ranked search (including the support for non-monotone preference functions), at no extra space overhead. We confirm the superiority of the proposed methods with a detailed experimental study.  相似文献   

17.
Nonrecursive incremental evaluation of Datalog queries   总被引:1,自引:0,他引:1  
We consider the problem of repeatedly evaluating the same (computationally expensive) query to a database that is being updated between successive query requests. In this situation, it should be possible to use the difference between successive database states and the answer to the query in one state to reduce the cost of evaluating the query in the next state. We use nonrecursive Datalog (which are unions of conjunctive queries) to compute the differences, and call this process incremental query evaluation using conjunctive queries. After formalizing the notion of incremental query evaluation using conjunctive queries, we give an algorithm that constructs, for each regular chain query (including transitive closure as a special case), a nonrecursive Datalog program to compute the difference between the answer after an update and the answer before the update. We then extend this result to weakly regular queries, which are regular chain programs augmented with conjunctive queries having the so-called Cartesian-closed increment property, and to the case of unbounded-set insertions where the sets are binary Cartesian products. Finally, we show that the class of conjunctive queries with the Cartesian-closed increment property is decidable.Parts of the results in this paper appeared as extended abstracts in theProceedings of the 1992 International Conference on Database Theory (LNCS 646, Springer-Verlag), and in theProceedings of the 1993 International Workshop on Database Programming Languages (Workshops in Computing, Springer-Verlag).Guozhu Dong gratefully acknowledges support of the Australian Research Council through research grants, and the Centre for Intelligen Decision Systems.Work by Jianwen Su was supported in part by NSF Grants IRI-9109520 and IRI-9117094.  相似文献   

18.
19.
Although using domain specific knowledge sources for information retrieval yields more accurate results compared to pure keyword-based methods, more improvements can be achieved by considering both relations between concepts in an ontology and also their statistical dependencies over the corpus. In this paper, an innovative approach named concept-based pseudo-relevance feedback is introduced for improving accuracy of biomedical retrieval systems. Proposed method uses a hybrid retrieval algorithm for discovering relevancy between queries and documents which is based on a combination of keyword- and concept-based approaches. It also uses a pseudo-relevance feedback mechanism for expanding initial queries with auxiliary biomedical concepts extracted from top-ranked results of hybrid information retrieval. Using concept-based similarities makes it possible for the system to detect related documents to users’ queries, which are semantically close to each other while not necessarily sharing common keywords. In addition, expanding initial queries with concepts introduced by pseudo-relevance feedback captures those relations between queries and documents, which rely on statistical dependencies between concepts they contain. As a matter of fact, these relations may remain undetected, examining merely existing links between concepts in an external knowledge source. Proposed approach is evaluated using OHSUMED test collection and standard evaluation methods from text retrieval conference (TREC). Experimental results on MEDLINE documents (in OHSUMED collection) show 21% improvement over keyword-based approach in terms of mean average precision, which is a noticeable gain.  相似文献   

20.
Searchable symmetric encryption (SSE) has been introduced for secure outsourcing the encrypted database to cloud storage, while maintaining searchable features. Of various SSE schemes, most of them assume the server is honest but curious, while the server may be trustless in the real world. Considering a malicious server not honestly performing the queries, verifiable SSE (VSSE) schemes are constructed to ensure the verifiability of the search results. However, existing VSSE constructions only focus on single-keyword search or incur heavy computational cost during verification. To address this challenge, we present an efficient VSSE scheme, built on OXT protocol (Cash et al., CRYPTO 2013), for conjunctive keyword queries with sublinear search overhead. The proposed VSSE scheme is based on a privacy-preserving hash-based accumulator, by leveraging a well-established cryptographic primitive, Symmetric Hidden Vector Encryption (SHVE). Our VSSE scheme enables both correctness and completeness verifiability for the result without pairing operations, thus greatly reducing the computational cost in the verification process. Besides, the proposed VSSE scheme can still provide a proof when the search result is empty. Finally, the security analysis and experimental evaluation are given to demonstrate the security and practicality of the proposed scheme.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号