期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

On the complexity of tree pattern containment with arithmetic comparisons

Foto N. Afrati Gabriel Kuper 《Information Processing Letters》2011,111(15):754-760

In this paper we investigate the complexity of query containment problem for tree patterns (which express a fragment of XPath) with arbitrary arithmetic comparisons. We assume that attributes take values from a totally ordered domain and allow constraints that involve arithmetic comparisons. We show that the containment problem is -complete in the general case, but remains co-NP complete for tree patterns with left semi-interval (<, ?, =) or right semi-interval (>, ?, =) attribute constraints. 相似文献

2.

Efficient processing of partially specified twig pattern queries

LING TokWang 《中国科学F辑(英文版)》2009,52(10):1830-1847

As huge volumes of data are organized or exported in tree-structured form, it is quite necessary to extract useful information from these data collections using effective and efficient query processing methods. A natural way of retrieving desired information from XML documents is using twig pattern (TP), which is, actually, the core component of existing XML query languages. Twig pattern possesses the inherent feature that query nodes on the same path have concrete precedence relationships. It is this featu... 相似文献

3.

存在完整性约束时最小化树模式查询的算法

张凡熊志平胡运发《计算机工程》2006,32(10):66-67,70

树模式是查询树型结构数据如XML和LDAP的天然模型。在一个给定的数据库上进行查询,查询的效率很大程度上依赖于查询的大小。因此,在查询前删除查询中的冗余分支,使查询最小化是非常重要的。在树型结构数据库中,存在孩子必需、后代必需和子类3种完整性约束是十分普遍的。针对存在这3种完整性约束的情况,基于扩展的模拟概念提出了一种复杂度为O（n^2）的最小化树模式查询算法（n为树模式查询的节点数）。分析结果表明这个算法的效率要远高于同类算法。相似文献

4.

XML twig pattern matching using version tree

Xin Guiquan 《Data & Knowledge Engineering》2008,64(3):580-599

A common problem of XML query algorithms is that execution time and input size grows rapidly as the size of XML document increases. In this paper, we propose a version-labeling scheme and TwigVersion algorithm to address this problem. The version-labeling scheme is utilized to identify all repetitive structures in XML documents, and the Version Tree is constructed to hold such version information. To process a query, TwigVersion generates a filter through the created Version Tree, and the final answer to the query can be retrieved from the database easily through the filtering process. Both theoretical proof and experimental results reported in this paper demonstrate that the concise structure of Version Tree and the reduced input size make TwigVersion outperform the existing approaches. 相似文献

5.

Efficient processing of XPath queries using indexes

Sanjay Madria Yan Chen Kalpdrum Passi Sourav Bhowmick 《Information Systems》2007

A number of indexing techniques have been proposed in recent times for optimizing the queries on XML and other semi-structured data models. Most of the semi-structured models use tree-like structures and query languages (XPath, XQuery, etc.) which make use of regular path expressions to optimize the query processing. In this paper, we propose two algorithms called Entry-point algorithm (EPA) and Two-point Entry algorithms that exploit different types of indices to efficiently process XPath queries. We discuss and compare two approaches namely, Root-first and Bottom-first in implementing the EPA. We present the experimental results of the algorithms using XML benchmark queries and data and compare the results with that of traditional methods of query processing with and without the use of indexes, and ToXin indexing approach. Our algorithms show improved performance results than the traditional methods and Toxin indexing approach. 相似文献

6.

Efficient preprocessing of XML queries using structured signatures

Yon Dohn Chung 《Information Processing Letters》2003,87(5):257-264

The paper proposes a preprocessing scheme for efficient processing of XML queries in XML-based information retrieval systems. For the preprocessing, we use a signature-based approach. In the conventional (flat document-based) information retrieval systems, user queries consist of keywords and boolean operators, and thus signatures are structured in a flat manner. However, in XML-based information retrieval systems, the user queries have the form of path queries. Therefore, the flat signature cannot be effective for XML documents. In the paper, we propose two structured signature methods for XML documents. Through experiments, we evaluate the performance of the proposed methods. 相似文献

7.

Partitioned optimization of complex queries

Damianos Chatziantoniou Kenneth A. Ross 《Information Systems》2007

Performing complex analysis on top of massive data stores is essential to most modern enterprises and organizations and requires significant aggregation over different attribute sets (dimensions) of the participating relations. Such queries may take hours or days, a time period unacceptable in most cases. As a result, it is important to study these queries and identify special frequent cases that can be evaluated with specialized algorithms. Understanding complex aggregate queries leads to better execution plans and, consequently, performance. 相似文献

8.

A bi-labeling based XPath processing system

Yi Chen Susan B. Davidson Yifeng Zheng 《Information Systems》2010

We present BLAS, a Bi-LAbeling based XPath processing System. BLAS uses two labeling schemes to speed up query processing: P-labeling for processing consecutive child (or parent) axis traversals, and D-labeling for processing descendant (or ancestor) axis traversals. XML data are stored in labeled form and indexed. Algorithms are presented for translating XPath queries to SQL expressions. BLAS reduces the number of joins in the SQL query translated from a given XPath query and reduces the number of disk accesses required to execute the SQL query compared with the traditional XPath processing using D-labeling alone. We also propose an approximate P-labeling scheme and the corresponding query translation algorithm to handle XML data trees that contain a large number of distinct tag names, and/or are very deep. This extension captures a spectrum of XPath-to-SQL query translation schemes, ranging from existing schemes that do not use P-labels to the one that uses exact P-labels. Experimental results demonstrate the efficiency of the BLAS system. 相似文献

9.

A methodology for coupling fragments of XPath with structural indexes for XML documents

George H.L. Fletcher Dirk Van Gucht Yuqing Wu Marc Gyssens Sofía Brenes Jan Paredaens 《Information Systems》2009

We introduce a new methodology for coupling language-induced partitions and index -induced partitions on XML documents that is aimed for the benefit of efficient evaluation of XPath queries. In particular, we identify XPath fragments which are ideally coupled with the newly introduced P(k)

P (k)

-partition which has its definition grounded in the well-known A(k)

A (k)

structural index and its associated partition. We then utilize these couplings to investigate fundamental questions about the use of structural indexes in XPath query evaluation. 相似文献

10.

Worst-case optimal algorithm for XPath evaluation over XML streams

Prakash Ramanan 《Journal of Computer and System Sciences》2009,75(8):465-485

We consider the XPath evaluation problem: Evaluate an XPath query Q on a streaming XML document D; i.e., determine the set Q(D) of document elements selected by Q. We mainly consider Conjunctive XPath queries that involve only the child and descendant axes. Previously known in-memory algorithms for this problem use O(|D|) space and O(|Q||D|) time. Several previously known algorithms for the streaming version use Ω(dn) space and Ω(dn|D|) time in the worst case; d denotes the depth of D, and n denotes the number of location steps in Q. Their exponential space requirement could well exceed the O(|D|) space used by the in-memory algorithms. We present an efficient algorithm that uses O(d|Q|+nc) space and O((|Q|+dn)|D|) time in the worst case; c denotes the maximum number of elements of D that can be candidates for output, at any one instant. For some worst case Q and D, the memory space used by our algorithm matches our lower bound proved in a different paper; so, our algorithm uses optimal memory space in the worst case. 相似文献

11.

Semantic optimization techniques for preference queries

Jan Chomicki 《Information Systems》2007

Preference queries are relational algebra or SQL queries that contain occurrences of the winnow operator (find the most preferred tuples in a given relation). Such queries are parameterized by specific preference relations. Semantic optimization techniques make use of integrity constraints holding in the database. In the context of semantic optimization of preference queries, we identify two fundamental properties: containment of preference relations relative to integrity constraints and satisfaction of order axioms relative to integrity constraints. We show numerous applications of those notions to preference query evaluation and optimization. As integrity constraints, we consider constraint-generating dependencies, a class generalizing functional dependencies. We demonstrate that the problems of containment and satisfaction of order axioms can be captured as specific instances of constraint-generating dependency entailment. This makes it possible to formulate necessary and sufficient conditions for the applicability of our techniques as constraint validity problems. We characterize the computational complexity of such problems. 相似文献

12.

Automatically generating related queries in Japanese

Rosie Jones Kevin Bartz Pero Subasic Benjamin Rey 《Language Resources and Evaluation》2006,40(3-4):219-232

相似文献

13.

Determinacy and query rewriting for conjunctive queries and views

Foto N. Afrati 《Theoretical computer science》2011,412(11):1005-1021

Answering queries using views is the problem which examines how to derive the answers to a query when we only have the answers to a set of views. Constructing rewritings is a widely studied technique to derive those answers. In this paper we consider the problem of the existence of rewritings in the case where the answers to the views uniquely determine the answers to the query. Specifically, we say that a view set Vdetermines a query Q if for any two databases D₁,D₂ it holds: V(D₁)=V(D₂) implies Q(D₁)=Q(D₂). We consider the case where query and views are defined by conjunctive queries and investigate the question: If a view set V determines a query Q, is there an equivalent rewriting of Q using V? We present here interesting cases where there are such rewritings in the language of conjunctive queries. Interestingly, we identify a class of conjunctive queries, CQ_path, for which a view set can produce equivalent rewritings for “almost all” queries which are determined by this view set. We introduce a problem which relates determinacy to query equivalence. We show that there are cases where restricted results can carry over to broader classes of queries. 相似文献

14.

Optimizing XML queries: Bitmapped materialized views vs. indexes

Xiaoying Wu Dimitri Theodoratos Wendy Hui Wang Timos Sellis 《Information Systems》2013

Optimizing queries using materialized views has not been addressed adequately in the context of XML due to the many limitations associated with the definition and usability of materialized views in traditional XML query evaluation models. 相似文献

15.

Memory lower bounds for XPath evaluation over XML streams

Prakash Ramanan 《Journal of Computer and System Sciences》2011,77(6):1120-1140

We consider the XPath evaluation problem: Evaluate an XPath query Q on a streaming XML document D. We consider two versions of the problem: 1) Filtering Problem: Determine if there is a match for Q in D. 2) Node Selection Problem: Determine the set Q(D) of document nodes selected by Q. We consider Conjunctive XPath (CXPath) queries that involve only the child and descendant axes. Let d denote the depth of D, and n denote the number of location steps in Q. Bar-Yossef et al. (2007, 2005) [6] and [7] presented lower bounds on the memory space required by any algorithm to solve these two problems. Their lower bounds apply to each query in a large subset of XPath, and are obtained (mostly) using nonrecursive(Q,D). In this paper, we present larger lower bounds for a different class of queries (namely, CXPath queries with independent predicates), on recursive(Q,D). One of our results is an Ω(n⋅maxcands(Q,D)) lower bound for the node selection problem, for a worst-case Q; maxcands(Q,D) is the maximum number of nodes of D that can be candidates for output, at any one instant. So, there is no algorithm for the node selection problem that uses O(f(d,|Q|)+maxcands(Q,D)) space, for any function f. This shows that some previously published algorithms are incorrect. 相似文献

16.

Correcting queries for XML

Sara Cohen Tali Brodianskiy 《Information Systems》2009,34(8):690-710

It has been observed that queries over XML data sources are often unsatisfiable. Unsatisfiability may stem from several different sources, e.g., the user may be insufficiently familiar with the labels appearing the documents, or may not be intimately aware of the hierarchical structure of the documents. To deal with query and document mismatches, previous research has considered returning answers that maximally satisfy (in some sense) the query, instead of only returning strictly satisfying answers. However, this breaks the golden database rule that only strictly satisfying answers are returned when querying. Indeed, the relationship between the query and answers is no longer clear, when unsatisfying answers are returned. To reinstate the golden database rule, this article proposes a framework for automatically correcting queries over XML. This framework generates similar satisfiable queries, when the user query is unsatisfiable. The user can then choose a satisfiable query of interest, and receive exactly satisfying answers to this query. 相似文献

17.

Genetic algorithms for approximate similarity queries 总被引：1，自引：0，他引：1

Renato Agma J.M. Caetano 《Data & Knowledge Engineering》2007,62(3):459-482

Algorithms to query large sets of simple data (composed of numbers and small character strings) are constructed to retrieve the exact answer, retrieving every relevant element, so the answer said to be exact. Similarity searching over complex data is much more expensive than searching over simple data. Moreover, comparison operations over complex data usually consider features extracted from each element, instead of the elements themselves. Thus, even if an algorithm retrieves an exact answer, it is ‘exact’ regarding the extracted features, not regarding the original elements themselves. Therefore, trading exact answering with query time response can be worthwhile. In this work we developed two search strategies based on genetic algorithms to allow retrieving approximate data indexed by Metric Access Methods (MAM) within a limited, user-defined, amount of time. These strategies allow implementing algorithms to answer both range and k-nearest neighbor queries, and allow also to estimate the precision obtained for the approximate answer. Experimental evaluation shows that very good results (corresponding to what the user would expect) can be obtained in a fraction of the time required to obtain the exact answer. 相似文献

18.

Efficient recursive XML query processing using relational database systems

Sandeep Sourav S. Sanjay 《Data & Knowledge Engineering》2006,58(3):207-242

Recursive queries are quite important in the context of XML databases. In addition, several recent papers have investigated a relational approach to store XML data and there is growing evidence that schema-conscious approaches are a better option than schema-oblivious techniques as far as query performance is concerned. However, the issue of recursive XML queries for such approaches has not been dealt with satisfactorily. In this paper we argue that it is possible to design a schema-oblivious approach that outperforms schema-conscious approaches for certain types of recursive queries. To that end, we propose a novel schema-oblivious approach, called Sucxent++ (Schema Unconcious XML Enabled System), that outperforms existing schema-oblivious approaches such as XParent by up to 15 times and schema-conscious approaches (Shared-Inlining) by up to eight times for recursive query execution. Our approach has up to two times smaller storage requirements compared to existing schema-oblivious approaches and 10% less than schema-conscious techniques. In addition Sucxent++ performs marginally better than Shared-Inlining and is 5.7–47 times faster than XParent as far as insertion time is concerned. 相似文献

19.

Exploiting maximal redundancy to optimize SQL queries

Bin Cao Antonio Badia 《Knowledge and Information Systems》2009,20(2):187-220

Detecting and dealing with redundancy is an ubiquitous problem in query optimization, which manifests itself in many areas of research such as materialized views, multi-query optimization, and query-containment algorithms. In this paper, we focus on the issue of intra-query redundancy, redundancy present within a query. We present a method to detect the maximal redundancy present between a main (outer) query block and a subquery block. We then use the method for query optimization, introducing query plans and a new operator that take full advantage of the redundancy discovered. Our approach can deal with redundancy in a wider spectrum of queries than existing techniques. We show experimental evidence that our approach works under certain conditions, and compares favorably to existing optimization techniques when applicable.

Antonio BadiaEmail:

相似文献

20.

Paged similarity queries

Enzo Seraphim Thatyana F. Piola Seraphim Fábio C.M. Ricotta 《Information Sciences》2011,181(13):2600-2607

An important feature of a database management systems (DBMS) is its client/server architecture, where managing shared memory among the clients and the server is always an tough issue. However, similarity queries are specially sensitive to this kind of architecture, since the answer sizes vary widely. Usually, the answers of similarity query are fully processed to be sent in full to the user, who often is interested in just parts of the answer, e.g. just few elements closer or farther to the query reference. Compelling the DBMS to retrieve the full answer, further ignoring its majority is at least a waste of server processing power. Paging the answer is a technique that splits the answer onto several pages, following client requests. Despite the success of paging on traditional queries, little work has been done to support it in similarity queries. In this work, we present a technique that not only provides paging in similarity range or k-nearest neighbor queries, but also supports them in two variations: the forward similarity query and the backward similarity query. They return elements either increasingly farther of increasingly closer to the query reference. The reported experiments show that, depending on the proportion of the interesting part over the full answer, both techniques allow answering queries much faster than it is obtained in the non-paged way. 相似文献