1.

XML with data values: typechecking revisited

Noga Alon 《Journal of Computer and System Sciences》2003,66(4):688-727

We investigate the typechecking problem for XML queries: statically verifying that every answer to a query conforms to a given output DTD, for inputs satisfying a given input DTD. This problem had been studied by a subset of the authors in a simplified framework that captured the structure of XML documents but ignored data values. We revisit here the typechecking problem in the more realistic case when data values are present in documents and tested by queries. In this extended framework, typechecking quickly becomes undecidable. However, it remains decidable for large classes of queries and DTDs of practical interest. The main contribution of the present paper is to trace a fairly tight boundary of decidability for typechecking with data values. The complexity of typechecking in the decidable cases is also considered.

2.

Querying Incomplete Information in Semistructured Data

《Journal of Computer and System Sciences》2002,64(3):655-693

Semistructured data occur in situations where information lacks a homogeneous structure and is incomplete. Yet, up to now the incompleteness of information has not been reflected by special features of query languages. Our goal is to investigate the principles of queries that allow for incomplete answers. We do not present, however, a concrete query language. Queries over classical structured data models contain a number of variables and constraints on these variables. An answer is a binding of the variables by elements of the database such that the constraints are satisfied. In the present paper, we loosen this concept insofar as we also allow partial answers; that is, not all variables in the query are bound by such an answer. Partial answers make it necessary to refine the model of query evaluation. The first modification relates to the satisfaction of constraints: in some circumstances we consider constraints involving unbound variables as satisfied. Second, in order to prevent a proliferation of answers, we only accept answers that are maximal in the sense that there are no assignments that bind more variables and satisfy the constraints of the query. Our model of query evaluation consists of two phases, a search phase and a filter phase. Semistructured databases are essentially labeled directed graphs. In the search phase, we use a query graph containing variables to match a maximal portion of the database graph. We investigate three different semantics for query graphs, which give rise to three variants of matching. For each variant, we provide algorithms and complexity results. In the filter phase, the maximal matchings resulting from the search phase are subjected to constraints, which may be weak or strong. Strong constraints require all their variables to be bound, while weak constraints do not.
We describe a polynomial-time algorithm for evaluating a special type of queries with filter constraints, and assess the complexity of evaluating other queries for several kinds of constraints. In the final part, we investigate the containment problem for queries consisting only of search constraints under the different semantics.
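The maximality condition described above can be sketched as a filter over candidate answers, assuming answers are represented as partial variable-to-node bindings and that one answer "binds more variables" than another when it agrees with all of the other's bindings and adds at least one more. The type alias and function names are hypothetical, and the constraint checks of the filter phase are omitted.

```python
from typing import Dict, List

# Hypothetical encoding: a partial answer maps query variables to database nodes.
Answer = Dict[str, str]


def strictly_extends(a: Answer, b: Answer) -> bool:
    """True if `a` agrees with every binding of `b` and binds at least one more variable."""
    return len(a) > len(b) and all(a.get(var) == node for var, node in b.items())


def maximal_answers(answers: List[Answer]) -> List[Answer]:
    """Keep only the answers that no other candidate strictly extends."""
    return [b for b in answers if not any(strictly_extends(a, b) for a in answers)]
```

For example, among `{"x": "n1"}`, `{"x": "n1", "y": "n2"}` and `{"x": "n3"}`, the first is dropped because the second extends it, while the third survives since it binds `x` differently.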

3.

Nonrecursive incremental evaluation of Datalog queries

Guozhu Dong Jianwen Su Rodney Topor 《Annals of Mathematics and Artificial Intelligence》1995,14(2-4):187-223

We consider the problem of repeatedly evaluating the same (computationally expensive) query on a database that is updated between successive query requests. In this situation, it should be possible to use the difference between successive database states, together with the answer to the query in one state, to reduce the cost of evaluating the query in the next state. We use nonrecursive Datalog programs (which are equivalent to unions of conjunctive queries) to compute the differences, and call this process incremental query evaluation using conjunctive queries. After formalizing this notion, we give an algorithm that constructs, for each regular chain query (including transitive closure as a special case), a nonrecursive Datalog program that computes the difference between the answer after an update and the answer before the update. We then extend this result to weakly regular queries, which are regular chain programs augmented with conjunctive queries having the so-called Cartesian-closed increment property, and to the case of unbounded-set insertions where the sets are binary Cartesian products. Finally, we show that it is decidable whether a conjunctive query has the Cartesian-closed increment property. Parts of the results in this paper appeared as extended abstracts in the Proceedings of the 1992 International Conference on Database Theory (LNCS 646, Springer-Verlag) and in the Proceedings of the 1993 International Workshop on Database Programming Languages (Workshops in Computing, Springer-Verlag). Guozhu Dong gratefully acknowledges support of the Australian Research Council through research grants, and the Centre for Intelligent Decision Systems. Work by Jianwen Su was supported in part by NSF Grants IRI-9109520 and IRI-9117094.
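For the special case the abstract mentions, transitive closure under the insertion of a single edge (a, b), the difference can be computed without recursion from the old closure alone. Every new pair (x, y) has x = a or (x, a) in the old closure, and y = b or (b, y) in the old closure. The sketch below illustrates only that classic case, not the paper's general construction for regular chain queries; the function name and the pair encoding are assumptions.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (x, y) means there is a non-empty path from x to y


def tc_delta_on_insert(tc: Set[Pair], a: str, b: str) -> Set[Pair]:
    """Pairs added to the transitive closure `tc` when edge (a, b) is inserted.

    Any new path has the shape x ->* a -> b ->* y, where both the prefix before
    the first use of (a, b) and the suffix after its last use exist in the old
    graph. Only the old closure is read, so no recursion or fixpoint is needed.
    """
    sources = {a} | {x for (x, y) in tc if y == a}
    targets = {b} | {y for (x, y) in tc if x == b}
    return {(x, y) for x in sources for y in targets} - tc
```

With the closure `{("u", "v")}`, inserting `("v", "w")` adds `("v", "w")` and `("u", "w")`; inserting `("v", "u")` instead closes a cycle and adds `("v", "u")`, `("u", "u")` and `("v", "v")`.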

4.

Query containment for data integration systems

《Journal of Computer and System Sciences》2003,66(1):20-39


5.

Delayed Binary Search, or Playing Twenty Questions with a Procrastinator

Ambainis Bloch Schweizer 《Algorithmica》2002,32(4):641-651

We study the classic binary search problem, with a delay between query and answer. For all constant delays, we give matching upper and lower bounds on the number of queries.

6.

An order-sorted query system for sort, predicate, and meta-predicate hierarchies

Ken Kaneiwa Philip H. P. Nguyen 《Knowledge and Information Systems》2013,35(2):463-492

This paper presents a decidable order-sorted query system for reasoning between ontologies and rules. We describe order-sorted logic programming with sort, predicate, and meta-predicate hierarchies (OSL_3h), which derives predicate and meta-predicate assertions. Meta-level predicates (predicates of predicates) are useful for representing relationships between predicate formulas, and further, they conceptually yield a hierarchy similar to the hierarchies of sorts and predicates. By extending the order-sorted Horn-clause calculus, we develop a query-answering system in OSL_3h that can answer queries such as atoms and meta-atoms generalized by containing predicate variables. We show that the expressive query-answering system computes every generalized query in single exponential time, that is, the complexity of our query system is equal to that of DATALOG.

7.

Answering reachability queries with ordered label constraints over labeled graphs

Daoliang HE Pingpeng YUAN Hai JIN 《Frontiers of Computer Science》2024,18(1):181601

Reachability queries play a vital role in many graph analysis tasks. Previous research has proposed many methods for efficiently answering reachability queries between vertex pairs. Since many real graphs are labeled, there is strong demand for Label-Constrained Reachability (LCR) queries, whose constraint includes a set of labels in addition to a vertex pair. Recent work has proposed several methods for answering LCR queries that require certain labels specified in the constraint to appear on the path. Beyond label sets, a query constraint may also be a sequence of ordered labels, giving rise to OLCR (Ordered-Label-Constrained Reachability) queries, which retrieve paths matching a sequence of labels. Currently, no solutions are available for OLCR. Here, we propose DHL, a novel Bloom-filter-based indexing technique for answering OLCR queries. DHL can be used to check reachability between vertex pairs; if it does not return a definite "no", a constrained DFS is performed. We therefore answer OLCR queries by applying DHL followed by a constrained DFS. We show that DHL has a bounded false positive rate and substantially reduces indexing time and space. Extensive experiments on 10 real-life graphs and 12 synthetic graphs demonstrate that DHL achieves about 4.8–22.5 times smaller index space and 4.6–114 times less index construction time than two state-of-the-art techniques for LCR queries, while achieving comparable query response time. The results also show that our algorithm can answer OLCR queries effectively.
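The abstract leaves the exact matching semantics open. Assuming a path "matches" an ordered label sequence when its edge labels contain that sequence as an ordered subsequence, the constrained-search step can be sketched as a search over (vertex, labels-matched-so-far) states. This is breadth-first here (a DFS visits the same states); it covers only the search that follows the index check and is not DHL itself. The graph encoding and function name are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: vertex -> list of (edge label, successor vertex).
Graph = Dict[str, List[Tuple[str, str]]]


def olcr_reachable(g: Graph, s: str, t: str, labels: Sequence[str]) -> bool:
    """Is there a path s ->* t whose edge labels contain `labels` in order?

    States are (vertex, i), where i counts how many labels of the sequence have
    been matched. Matching greedily is safe: having matched more labels at the
    same vertex never makes the remaining requirement harder.
    """
    k = len(labels)
    start = (s, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        v, i = queue.popleft()
        if v == t and i == k:
            return True
        for label, w in g.get(v, []):
            j = i + 1 if i < k and label == labels[i] else i
            if (w, j) not in seen:
                seen.add((w, j))
                queue.append((w, j))
    return False
```

On the chain `a -x-> b -y-> c`, the sequence `["x", "y"]` is satisfied from `a` to `c`, while `["y", "x"]` is not, because the labels occur in the wrong order.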

9.

A simple characterization of uniform boundedness for a class of recursions

《The Journal of Logic Programming》1991,10(3-4):233-253

Detecting bounded recursions is a powerful optimization technique for recursive database query languages, as bounded recursions can be replaced by equivalent nonrecursive definitions. The problem is also of theoretical interest in that varying the class of recursions considered generates problem instances that vary from linearly decidable to NP-hard to undecidable. In this paper we review and clarify the existing definitions of boundedness. We then specify a class of recursions C such that membership in C guarantees that a certain simple condition is necessary and sufficient for boundedness. We use the notion of membership in C to unify and extend previous work on determining decidable classes of bounded recursions.

10.

Weighted path queries on semistructured databases

《Information and Computation》2006,204(5):679-696

Path queries have been extensively used to query semistructured data, such as the Web and XML documents. In this paper we introduce weighted path queries, an extension of path queries that enables several classes of optimization problems (such as the computation of shortest paths) to be expressed easily. Weighted path queries are based on the notion of a weighted regular expression, i.e., a regular expression whose symbols are associated with weights. We characterize the problem of answering weighted path queries and provide an algorithm for computing their answers. We also show how weighted path queries can be effectively embedded into query languages for XML data to express several meaningful research problems in a simple and compact form.
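One common way to evaluate such a query, assumed here rather than taken from the paper, is to compile the weighted regular expression into an automaton whose transitions carry the symbol weights and run Dijkstra's algorithm over the product of the data graph and the automaton. The graph and automaton encodings and the function name are hypothetical, and weights are assumed to be non-negative.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, str]]]                      # node -> [(symbol, successor)]
WeightedNFA = Dict[Tuple[int, str], List[Tuple[int, float]]]  # (state, symbol) -> [(next state, weight)]


def weighted_path_cost(g: Graph, nfa: WeightedNFA, start: int, finals: Set[int],
                       src: str, dst: str) -> float:
    """Minimum total weight of a src -> dst path whose label word the automaton accepts.

    Dijkstra over (graph node, automaton state) pairs; returns math.inf if no
    accepted path exists. Assumes all weights are non-negative.
    """
    dist: Dict[Tuple[str, int], float] = {(src, start): 0.0}
    heap: List[Tuple[float, str, int]] = [(0.0, src, start)]
    while heap:
        d, node, state = heapq.heappop(heap)
        if d > dist.get((node, state), math.inf):
            continue  # stale entry: a cheaper route to this pair was already found
        if node == dst and state in finals:
            return d  # first accepted pop is optimal with non-negative weights
        for symbol, succ in g.get(node, []):
            for nxt, weight in nfa.get((state, symbol), []):
                nd = d + weight
                if nd < dist.get((succ, nxt), math.inf):
                    dist[(succ, nxt)] = nd
                    heapq.heappush(heap, (nd, succ, nxt))
    return math.inf
```

For instance, with a one-state automaton for `(road | ferry)*` where `road` weighs 1.0 and `ferry` weighs 5.0, the cheapest route from `A` to `C` in a graph with edges `A -road-> B -road-> C` and `A -ferry-> C` costs 2.0.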

11.

Buffer queries

Chan E.P.F. 《IEEE Transactions on Knowledge and Data Engineering》2003,15(4):895-910

A class of queries commonly asked in spatial databases is known as buffer queries. An example of such a query is to "find house-power line pairs that are within 50 meters of each other." A buffer query involves two spatial data sets and a distance d. The answer to this query is the set of pairs of objects, one from each input set, that are within distance d of each other. Given nonpoint spatial objects, evaluating buffer queries can be costly, even when the input data sets contain relatively few objects. This paper addresses the problem of how to evaluate this class of queries efficiently. A fundamental problem in buffer query evaluation is finding an efficient algorithm for the minimum distance (minDist) problem for lines and regions. An efficient minDist algorithm, which requires only a subsequence of segments from each object to be examined, is derived. Finding a fast minDist algorithm is the first step in evaluating a buffer query efficiently. It is observed that many, and sometimes even most, candidates can be proven to be in the answer without resorting to the relatively expensive minDist operation. A candidate is first evaluated with the least expensive technique, called 0-object filtering. If it fails, a more costly operation, called 1-object filtering, is applied. Finally, if both filterings fail, the most expensive minDist algorithm is invoked. To show the effectiveness of these techniques, they are incorporated into the well-known tree join algorithm and tested with real-life as well as artificial data sets. Extensive experiments show that the proposed algorithm outperforms existing techniques by a wide margin in both execution time and I/O accesses. More importantly, the performance gain increases drastically as the distance value grows.
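As a point of reference for the minDist step, the sketch below computes the exact minimum distance between two polylines by examining every segment pair, then applies the buffer-query test. It is a brute-force baseline under stated assumptions (2-D polylines with at least two points each; regions and the 0-object/1-object filters are not modeled), not the paper's algorithm, which examines only a subsequence of segments. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_dist(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0.0:  # degenerate segment: a == b
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (positive means counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segment_dist(a: Point, b: Point, c: Point, d: Point) -> float:
    """Minimum distance between segments ab and cd in the plane."""
    # Proper crossing: each segment's endpoints lie strictly on opposite sides of the other.
    if orient(c, d, a) * orient(c, d, b) < 0 and orient(a, b, c) * orient(a, b, d) < 0:
        return 0.0
    # Otherwise the minimum is attained at an endpoint (touching cases yield 0 here).
    return min(point_segment_dist(a, c, d), point_segment_dist(b, c, d),
               point_segment_dist(c, a, b), point_segment_dist(d, a, b))


def min_dist(line1: List[Point], line2: List[Point]) -> float:
    """Brute-force minDist between two polylines: every segment pair is examined."""
    return min(segment_dist(line1[i], line1[i + 1], line2[j], line2[j + 1])
               for i in range(len(line1) - 1)
               for j in range(len(line2) - 1))


def in_buffer_answer(line1: List[Point], line2: List[Point], d: float) -> bool:
    """True if the pair belongs to the answer of a buffer query with distance d."""
    return min_dist(line1, line2) <= d
```

The cost of this baseline grows with the product of the segment counts, which is what the paper's subsequence-based minDist and its cheaper filters are designed to avoid.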

12.

Adaptive relaxation for querying heterogeneous XML data sources

Chengfei Liu Jianxin Li Jeffrey Xu Yu Rui Zhou 《Information Systems》2010

Searching XML data with a structured XML query can improve the precision of results compared with keyword search. However, the structural heterogeneity of the large number of XML data sources makes it difficult to answer a structured query exactly, so query relaxation is necessary. Previous work on XML query relaxation suffers from the unnecessary computation of a large number of unqualified relaxed queries. To address this issue, we propose an adaptive relaxation approach that relaxes a query differently against different data sources, based on the schemas they conform to. In this paper, we present a set of techniques that support this approach, including schema-aware relaxation rules for relaxing a query adaptively, a weighted model for ranking relaxed queries, and algorithms for adaptive query relaxation and top-k query processing. We discuss results from a comprehensive set of experiments that show the effectiveness and efficiency of our approach.

13.

Vague continuous K-nearest neighbor queries over moving objects with uncertain velocity in road networks

Ping Fan Guohui Li Ling Yuan Yanhong Li 《Information Systems》2012,37(1):13-32

Recent research has focused on Continuous K Nearest Neighbor (CKNN) queries in road networks, where both the queries and the data objects are moving. Most existing approaches assume that moving objects have a fixed velocity. Relaxing this fixed-velocity assumption slows query processing because of significant repetitive query cost. In this paper, we study CKNN queries over moving objects with uncertain velocity in road networks. A Distance Interval Model (DIM) is designed to calculate the minimal and maximal road network distances between moving objects and the query point. Furthermore, we propose a novel Possibility-based Vague KNN (PVKNN) algorithm to process the query efficiently; by applying vague set theory, it determines the CKNN query results, together with their possibilities, within each subinterval of the given time interval. In the PVKNN algorithm, query efficiency is improved significantly by the pruning, distilling and possibility-ranking phases. With these phases, the set of candidate objects is scaled down and the given time interval is divided into subintervals to reduce the repetitive query cost. In addition, an index structure, the TPR^uv-Tree, is designed to efficiently index moving objects with uncertain velocity in road networks by incorporating edge connection and moving object information. Experiments with simulation and comparison show that our proposed algorithms achieve significant improvements in efficiency.

14.

A formal perspective on the view selection problem

Rada Chirkova Alon Y. Halevy Dan Suciu 《The VLDB Journal: The International Journal on Very Large Data Bases》2002,11(3):216-237

The view selection problem is to choose a set of views to materialize over a database schema, such that the cost of evaluating a set of workload queries is minimized and such that the views fit into a prespecified storage constraint. The two main applications of the view selection problem are materializing views in a database to speed up query processing, and selecting views to materialize in a data warehouse to answer decision support queries. In addition, view selection is a core problem for intelligent data placement over a wide-area network for data integration applications and data management for ubiquitous computing. We describe several fundamental results concerning the view selection problem. We consider the problem for views and workloads that consist of equality-selection, project and join queries, and show that the complexity of the problem depends crucially on the quality of the estimates that a query optimizer has on the size of the views it is considering to materialize. When a query optimizer has good estimates of the sizes of the views, we show a somewhat surprising result, namely, that an optimal choice of views may involve a number of views that is exponential in the size of the database schema. On the other hand, when an optimizer uses standard estimation heuristics, we show that the number of necessary views and the expression size of each view are polynomially bounded. Received: November 20, 2001 / Accepted: May 30, 2002 / Published online: September 25, 2002

15.

Decidable containment of recursive queries

《Theoretical computer science》2005,336(1):33-56

One of the most important reasoning tasks on queries is checking containment, i.e., verifying whether the result of one query is necessarily a subset of the result of another. Query containment is crucial in several contexts, such as query optimization, query reformulation, knowledge-base verification, information integration, integrity checking, and cooperative answering. Containment is undecidable in general for Datalog, the fundamental language for expressing recursive queries. On the other hand, it is known that containment between monadic Datalog queries, and between Datalog queries and unions of conjunctive queries, is decidable. It is also known that containment between unions of conjunctive two-way regular path queries is decidable; these queries are used in the context of semistructured data models and contain a limited form of recursion in the form of transitive closure. In this paper, we combine the automata-theoretic techniques underlying these two decidability results to show that containment of Datalog in unions of conjunctive two-way regular path queries is decidable in 2EXPTIME. By sharpening a known lower bound for containment of Datalog in unions of conjunctive queries, we also show a matching lower bound.

16.

Efficient Range Query Processing in Peer-to-Peer Systems

Li Dongsheng Cao Jiannong Lu Xicheng Chen Keith C. C. 《IEEE Transactions on Knowledge and Data Engineering》2009,21(1):78-91

With the increasing popularity of the peer-to-peer (P2P) computing paradigm, many general range query schemes for distributed hash table (DHT)-based P2P systems have been proposed in recent years. Although those schemes can provide range query capability without modifying the underlying DHTs, their query delay depends on both the scale of the system and the size of the query space or the specific query, so they cannot guarantee to return query results within a bounded delay. In this paper, we propose Armada, an efficient range query processing scheme that supports delay-bounded single-attribute and multiple-attribute range queries. It is the first delay-bounded general range query scheme on constant-degree DHTs, and it can return the results of any range query within 2 log N hops in a P2P system with N peers. Results of analysis and simulations show that the average delay in Armada is less than log N, and that the average message cost of single-attribute range queries is about log N + 2n - 2 (where n is the number of peers that intersect with the query). These results are very close to the lower bounds on delay and message cost of range queries over constant-degree DHTs.

17.

Combining intensional with extensional query evaluation in tuple independent probabilistic databases

Biao Qin Shan Wang 《Information Sciences》2011,181(4):812-831

In this paper, we prove that a query plan is safe in tuple-independent probabilistic databases if and only if every answer tuple it produces is tree-structured in probabilistic graphical models. We classify hierarchical queries into core and non-core hierarchical queries and show that existing methods can generate safe plans only for core hierarchical queries. Inspired by the bucket elimination framework, we give necessary and sufficient conditions for the answer relation of every candidate sub-query to be used as a base relation. Finally, the proposed algorithm generates safe plans for extensional query evaluation on non-boolean hierarchical queries and invokes the SPROUT algorithm [24] for intensional query evaluation on boolean queries. A case study on the TPC-H benchmark reveals that the safe plans of Q7 and Q8 can be evaluated efficiently. Furthermore, extensive experiments show that the safe plans generated by the proposed algorithm scale well.
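To make extensional evaluation concrete, here is a minimal sketch of a safe plan for one hypothetical hierarchical Boolean query, q = ∃x∃y R(x) ∧ S(x, y), over a tuple-independent database. The relation names, the query, and the function name are illustrative assumptions; this is the textbook independent-project/independent-join computation, not the authors' plan-generation algorithm.

```python
from typing import Dict, Tuple


def prob_exists_r_s(r: Dict[str, float], s: Dict[Tuple[str, str], float]) -> float:
    """P(exists x, y: R(x) and S(x, y)) on a tuple-independent database.

    r maps each R-tuple x to its probability; s maps each S-tuple (x, y) to its
    probability. The plan is safe because the query is hierarchical:
      - for a fixed x, the tuples S(x, *) are independent, so
        P(exists y: S(x, y)) = 1 - prod_y (1 - p(S(x, y)))   (independent project)
      - R(x) is independent of them, so the two probabilities multiply (independent join)
      - different x values touch disjoint tuples, so combine with 1 - prod_x (1 - ...).
    """
    no_witness = 1.0  # probability that no x satisfies R(x) and exists y S(x, y)
    for x, p_r in r.items():
        none_y = 1.0  # probability that no S(x, y) tuple is present
        for (sx, _y), p_s in s.items():
            if sx == x:
                none_y *= 1.0 - p_s
        no_witness *= 1.0 - p_r * (1.0 - none_y)
    return 1.0 - no_witness
```

For r = {"a": 0.5} and two S-tuples ("a", "1") and ("a", "2") with probability 0.5 each, the plan yields 0.5 × (1 - 0.5 × 0.5) = 0.375.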

18.

Transactional support for adaptive indexing

Goetz Graefe Felix Halim Stratos Idreos Harumi Kuno Stefan Manegold Bernhard Seeger 《The VLDB Journal: The International Journal on Very Large Data Bases》2014,23(2):303-328

Adaptive indexing initializes and optimizes indexes incrementally, as a side effect of query processing. The goal is to achieve the benefits of indexes while hiding or minimizing the costs of index creation. However, index-optimizing side effects seem to turn read-only queries into update transactions that might, for example, create lock contention. This paper studies concurrency control and recovery in the context of adaptive indexing. We show that the design and implementation of adaptive indexing rigorously separates index structures from index contents; this relaxes constraints and requirements during adaptive indexing compared to those of traditional index updates. Our design adapts to the fact that an adaptive index is refined continuously and exploits any concurrency opportunities in a dynamic way. A detailed experimental analysis demonstrates that (a) adaptive indexing maintains its adaptive properties even when running concurrent queries, (b) adaptive indexing can exploit the opportunity for parallelism due to concurrent queries, (c) the number of concurrency conflicts and any concurrency administration overheads follow an adaptive behavior, decreasing as the workload evolves and adapting to the workload needs.
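Adaptive indexing is commonly illustrated with database cracking. The sketch below is a minimal, single-threaded cracker column, assuming integer values and half-open ranges [low, high); the class and method names are hypothetical. It deliberately omits the concurrency control and recovery that the paper studies and only shows how a read-only range query physically reorganizes the column, which is why such queries start to resemble update transactions.

```python
import bisect
from typing import List, Tuple


class CrackerColumn:
    """Single-threaded cracker column (adaptive indexing sketch).

    Each range query partitions ("cracks") only the piece that contains each of
    its bounds, so the column becomes more ordered as a side effect of queries.
    Invariant: for every (pivot, pos) in self.index,
    all of values[:pos] are < pivot and all of values[pos:] are >= pivot.
    """

    def __init__(self, values: List[int]):
        self.values = list(values)
        self.index: List[Tuple[int, int]] = []  # (pivot, split position), sorted by pivot

    def _crack(self, pivot: int) -> int:
        """Return the split position for `pivot`, partitioning one piece if needed."""
        keys = [k for k, _ in self.index]
        i = bisect.bisect_left(keys, pivot)
        if i < len(keys) and keys[i] == pivot:
            return self.index[i][1]  # already cracked on this value
        lo = self.index[i - 1][1] if i > 0 else 0
        hi = self.index[i][1] if i < len(keys) else len(self.values)
        # Only the piece values[lo:hi] can hold values on both sides of the pivot.
        piece = self.values[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        self.index.insert(i, (pivot, pos))
        return pos

    def range_query(self, low: int, high: int) -> List[int]:
        """Return all values v with low <= v < high (in column order)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]
```

After `range_query(3, 8)` on `[7, 1, 9, 4, 3, 8, 2]`, the column is split into pieces below 3, in [3, 8), and at or above 8, so a later query such as `range_query(2, 5)` only has to partition the two small pieces that contain its bounds.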

19.

Efficient range query processing over DHTs based on the balanced Kautz tree

Yiming Zhang Ling Liu Xicheng Lu Dongsheng Li 《Concurrency and Computation》2011,23(8):796-814

Distributed Hash Tables (DHTs) are scalable, self‐organizing, and adaptive to underlying topology changes, thus being a promising infrastructure for hosting large‐scale distributed applications. The ever‐wider use of DHT infrastructures has found more and more applications that require support for range queries. Recently, a number of DHT‐based range query schemes have been proposed. However, most of them suffer from high query delay or imbalanced load distribution. To address these problems, in this paper we first present an efficient indexing structure called Balanced Kautz (BK) tree that uniformly maps the m‐dimensional data space onto DHT nodes, and then propose a BK tree‐based range query scheme called ERQ that processes range queries in a parallel fashion and guarantees to return the results in a bounded delay. In a DHT with N nodes, ERQ can answer any range query in less than rmlog N(2loglog N + 1) hops in a load‐balanced manner, irrespective of the queried range, the whole space size, or the number of queried attributes. The effectiveness of our proposals is demonstrated through experiments. Copyright © 2010 John Wiley & Sons, Ltd.

20.

Querying Probabilistic Business Processes for Sub-Flows

Daniel Deutch 《Theory of Computing Systems》2013,52(3):367-402

A Business Process (BP for short) consists of a set of activities which, combined in a flow, achieve some business goal. A given BP may have a large, possibly infinite, number of possible execution flows (EX-flows for short), each having some probability to occur at run time. This paper studies query evaluation over such probabilistic BPs. We focus on two important classes of queries, namely boolean queries that compute the probability that a random EX-flow of a BP satisfies a given property, and projection queries focusing on portions of EX-flows that are of interest to the user. For the latter queries the answer consists of the top-k instances of these portions that are most likely to occur at run-time. We study the complexity of query evaluation for both kinds of queries, showing in particular that projection queries may be harder to evaluate than boolean queries. We present a picture of which combinations of BP classes and query features lead to PTIME algorithms and which to NP-hard or infeasible problems.