首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 10 毫秒
1.
目前大部分XML查询语言都使用树模式来匹配待查询的XML文档树以得到所需要的、与模式树相吻合的查询结果,此效率在很大程度上取决于XML模式树的大小,那么尽可能快速地查找并删除查询模式树中的冗余节点就变得十分重要。重点讨论DTD约束下树模式的最小化问题,将DTD兄弟约束SC拓展成扩展兄弟约束ESC,使其能够表达DTD约束中的祖先-后代关系;并指出只包含{ESC,/,//,[],*}的查询树模式的最小化问题的复杂度是指数级的,且当模式树是分支受限的时候,其最小化问题的复杂度是多项式时间的;最后给出了一个多项式时间的受限分支的模式树最小化算法。  相似文献   

2.
One of the most important reasoning tasks on queries is checking containment, i.e., verifying whether one query yields necessarily a subset of the result of another one. Query containment is crucial in several contexts, such as query optimization, query reformulation, knowledge-base verification, information integration, integrity checking, and cooperative answering. Containment is undecidable in general for Datalog, the fundamental language for expressing recursive queries. On the other hand, it is known that containment between monadic Datalog queries and between Datalog queries and unions of conjunctive queries are decidable. It is also known that containment between unions of conjunctive two-way regular path queries, which are queries used in the context of semistructured data models containing a limited form of recursion in the form of transitive closure, is decidable. In this paper, we combine the automata-theoretic techniques at the base of these two decidability results to show that containment of Datalog in union of conjunctive two-way regular path queries is decidable in 2EXPTIME. By sharpening a known lower bound result for containment of Datalog in union of conjunctive queries we show also a matching lower bound.  相似文献   

3.
It has been observed that queries over XML data sources are often unsatisfiable. Unsatisfiability may stem from several different sources, e.g., the user may be insufficiently familiar with the labels appearing the documents, or may not be intimately aware of the hierarchical structure of the documents. To deal with query and document mismatches, previous research has considered returning answers that maximally satisfy (in some sense) the query, instead of only returning strictly satisfying answers. However, this breaks the golden database rule that only strictly satisfying answers are returned when querying. Indeed, the relationship between the query and answers is no longer clear, when unsatisfying answers are returned. To reinstate the golden database rule, this article proposes a framework for automatically correcting queries over XML. This framework generates similar satisfiable queries, when the user query is unsatisfiable. The user can then choose a satisfiable query of interest, and receive exactly satisfying answers to this query.  相似文献   

4.
李锐  吴开贵 《计算机应用》2009,29(3):854-857
查询重写是数据集成的一个关键问题,它是将用户的查询请求自动重写为直接面向数据源的查询请求。最近Michigan大学和IBM的AImaden研究中心提出了一种新的基于约束的XML查询重写算法,但是该算法没有考虑复杂模式匹配重写问题,使得该算法应用受到限制。在原来的算法重写思想基础上,提出了一种改进的XML查询重写算法,扩大原算法的应用范围,并分析了改进算法的正确性和时间复杂度。  相似文献   

5.
Searching XML data using keyword queries has attracted much attention because it enables Web users to easily access XML data without having to learn a structured query language or study possibly complex data schemas. Most of the current approaches identify the meaningful results of a given keyword query based on the semantics of lowest common ancestor (LCA) and its variants. However, given the fact that LCA candidates are usually numerous and of low relevance to the users?? information need, how to effectively and efficiently identify the most relevant results from a large number of LCA candidates is still a challenging and unresolved issue. In this article, we introduce a novel semantics of relevant results based on mutual information between the query keywords. Then, we introduce a novel approach for identifying the relevant answers of a given query by adopting skyline semantics. We also recommend three different ranking criteria for selecting the top-k relevant results of the query. Efficient algorithms are proposed which rely on some provable properties of the dominance relationship between result candidates to rapidly identify the top-k dominant results. Extensive experiments were conducted to evaluate our approach and the results show that the proposed approach has a good performance compared with other existing approaches in different data sets and evaluation metrics  相似文献   

6.
在XML文档上进行全文本检索已经成为很多研究课题的基础问题,例如Web信息检索,信息抽取等。有效的XML索引结构对于加速检索速度是至关重要的,在文献[1]的基础上全面地构建和实现了一个可以有效的支持XML全文本检索的索引结构。实验表明提出的索引结构在索引构建时间、空间等性能指标上均有很好的表现。  相似文献   

7.
We describe the Enosys XML integration platform, focusing on the query language, algebra, and architecture of its query processor. The platform enables the development of eBusiness applications in customer relationship management, e-commerce, supply chain management, and decision support. These applications often require that data be integrated dynamically from multiple information sources. The Enosys platform allows one to build (virtual and/or materialized) integrated XML views of multiple sources, using XML queries as view definitions. During run-time, the application issues XML queries against the views. Queries and views are translated into the XCQL algebra and are combined into a single algebra expression/plan. Query plan composition and query plan decomposition challenges are faced in this process. Finally, the query processor lazily evaluates the result, using an appropriate adaptation of relational database iterator models to XML. The paper describes the platform architecture and components, the supported XML query language and the query processor architecture. It focuses on the underlying XML query algebra, which differs from the algebras that have been considered by W3C in that it is particularly tuned to semistructured data and to optimization and efficient evaluation in a system that follows the conventional architecture of database systems.  相似文献   

8.
The paper proposes a preprocessing scheme for efficient processing of XML queries in XML-based information retrieval systems. For the preprocessing, we use a signature-based approach. In the conventional (flat document-based) information retrieval systems, user queries consist of keywords and boolean operators, and thus signatures are structured in a flat manner. However, in XML-based information retrieval systems, the user queries have the form of path queries. Therefore, the flat signature cannot be effective for XML documents. In the paper, we propose two structured signature methods for XML documents. Through experiments, we evaluate the performance of the proposed methods.  相似文献   

9.
In this paper, we focus on efficient construction of restricted subtree (RSubtree) results for XML keyword queries on amulticore system. We firstly show that the performance bottlenecks for existing methods lie in 1) computing the set of relevant keyword nodes (RKNs) for each subtree root node, 2) constructing the corresponding RSubtree, and 3) parallel execution. We then propose a two-step generic top-down subtree construction algorithm, which computes SLCA/ELCA nodes in the first step, and parallelly gets RKNs and generates RSubtree results in the second step, where genericmeans that 1) our method can be used to compute different kinds of subtree results, 2) our method is independent of the query semantics; top-down means that our method constructs each RSubtree by visiting nodes of the subtree constructed based on an RKN set level-by-level from left to right, such that to avoid visiting as many useless nodes as possible. The experimental results show that our method is much more efficient than existing ones according to various metrics.  相似文献   

10.
《Artificial Intelligence》2006,170(8-9):779-801
In this paper we propose a form of reasoning on specifications of combinatorial problems, with the goal of reformulating them so that they are more efficiently solvable. The reformulation technique highlights constraints that can be safely “delayed”, and solved afterwards. Our main contribution is the characterization (with soundness proof) of safe-delay constraints with respect to a criterion on the specification, thus obtaining a mechanism for the automated reformulation of specifications applicable to a great variety of problems, e.g., graph coloring, bin-packing, and job-shop scheduling. This is an advancement with respect to the forms of reasoning done by state-of-the-art-systems, which typically just detect linearity of specifications. Another contribution is an experimentation on the effectiveness of the proposed technique using six different solvers, which reveals promising time savings.  相似文献   

11.
We present a technique for refining the design of relational storage for XML data. The technique is based on XML key propagation: given a set of keys on XML data and a mapping (transformation) from the XML data to relations, what functional dependencies must hold on the relations produced by the mapping? With the functional dependencies one can then convert the relational design into, e.g. 3NF, BCNF, and thus develop efficient relational storage for XML data. We provide several algorithms for computing XML key propagation. One algorithm is to check whether a functional dependency is propagated from a set of XML keys via a predefined mapping; this allows one to determine whether or not the relational design is in a normal form. The others are to compute a minimum cover for all functional dependencies that are propagated from a set of XML keys and hold on a universal relation; these provide guidance for how to design a relational schema for storing XML data. These algorithms show that XML key propagation and its associated minimum cover can be computed in polynomial time. Our experimental results verify that these algorithms are efficient in practice. We also investigate the complexity of propagating other XML constraints to relations. The ability to compute XML key propagation is a first step toward establishing a connection between XML data and its relational representation at the semantic level.  相似文献   

12.
We introduce in this paper a class of constraints for describing how an XML document can evolve, namely XML update constraints. For these constraints, we study the implication problem, giving algorithms and complexity results for constraints of varying expressive power. Besides classical constraint implication, we also consider an instance-based approach in which we take into account data. More precisely, we study implication with respect to a current tree instance, resulting from a series of unknown updates. The main motivation of our work is reasoning about data integrity under update restrictions in contexts where owners may lose control over their data, such as in publishing or exchange.  相似文献   

13.
Boundaries occur naturally in everyday life. This paper introduces numerical constraints into the framework of XML to take advantage of the benefits that result from the explicit specification of such boundaries. Roughly speaking, numerical constraints restrict the number of elements in an XML data fragment based on the data values of selected subelements. Efficient reasoning about numerical constraints provides effective means for predicting the number of answers to XQuery and XPath queries, the number of updates when using the XQuery update facility, and the number of encryptions or decryptions when using XML encryption. Moreover, numerical constraints can help to optimise XQuery and XPath queries, to exclude certain choices of indices from the index selection problem, and to generate views for efficient processing of common queries and updates.We investigate decision problems associated with numerical constraints in order to capitalise on the range of applications in XML data processing. To begin with we demonstrate that the implication problem is strongly coNP-hard for several classes of numerical constraints. These sources of potential intractability direct our attention towards the class of numerical keys that permit the specification of positive upper bounds. Numerical keys are of interest as they are reminiscent of cardinality constraints that are widely used in conceptual data modelling. At the same time, they form a natural generalisation of XML keys that are popular in XML theory and practice. We show that numerical keys are finitely satisfiable and establish a finite axiomatisation for their implication problem. Finally, we propose an algorithm that decides numerical key implication in quadratic time using shortest path methods.  相似文献   

14.
The streaming evaluation is a popular way of evaluating queries on XML documents. Besides its many advantages, it is also the only option for a number of important XML applications. Unfortunately, existing algorithms focus almost exclusively on tree-pattern queries (TPQs). Requirements for flexible querying of XML data have motivated recently the introduction of query languages that are more general and flexible than TPQs. These languages are not supported by existing algorithms. In this paper, we consider a partial tree-pattern query (PTPQ) language which generalizes and strictly contains TPQs. PTPQs can express a fragment of XPath which comprises reverse axes and the node identity equality (is) operator, in addition to forward axes, wildcards and predicates. They constitute an important subclass of XPath, which is very useful in practice. Unfortunately, previous streaming algorithms for TPQs cannot be applied to PTPQs. PTPQs can be represented as dags enhanced with constraints. We explore this representation to design an original polynomial time streaming algorithm for PTPQs. Our algorithm aggressively filters incoming data that is irrelevant to the query and wisely avoids processing redundant query matches (i.e., matches of the query dag that do not contribute to new solutions). Our algorithm is the first one to support the streaming evaluation of such a broad fragment of XPath. We provide an analysis of it, and conduct an extensive experimental evaluation of its performance and scalability. Compared to the only known streaming algorithm that supports TPQs extended with reverse axes, our algorithm performs better by orders of magnitude while consuming a much smaller fraction of memory space. Current streaming applications have stringent requirements on query response time and memory consumption because of the large (possibly unbounded) size of data they handle. In order to keep memory usage and CPU consumption low for the PTPQ streaming evaluation, we design another streaming algorithm called Eager PSX for PTPQs. Its key feature is that it applies an eager evaluation strategy to quickly determine when node matches should be returned as solutions to the user and also to proactively detect redundant matches. We theoretically analyze Eager PSX, and experimentally test its time and space performance and scalability. We compare it with PSX. Our results show that Eager PSX not only achieves better space performance without compromising time performance, but it also greatly improves query response time for both simple and complex queries, in many cases, by orders of magnitude.  相似文献   

15.
16.
Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i +, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly.  相似文献   

17.
18.
The objective is to achieve the constrained containment of discrete-time multiagent systems with external disturbances in directed networks. In contrast to the existing constrained containment results that ignore disturbances, this work incorporates the disturbances, the position, and velocity constraints of followers simultaneously. A projection-based control protocol is developed to pull all followers into a target area formed by some leaders, and meanwhile, a certain level of disturbance attenuation performance is satisfied. By utilizing the multiple model transformations, convexity analysis, and the Lyapunov approach, a sufficient condition is derived to realize the constrained containment with H $$ {H}_{\infty } $$ performance. Some simulation results are provided to demonstrate the disturbance-rejection performance of the proposed control method.  相似文献   

19.
每一个复杂的Twig查询都由线性Twig查询构成,有效地处理线性Twig查询显得非常重要。DM XML系统以国产DM5.6关系数据库为平台,融合结构映射和模型映射,实现独特的路径分区编码方案来存储XML数据。在系统中,线性Twig查询解析后,形成线性Twig查询的路径集,而该集合中的每一个路径可被唯一变换为关系数据库中整型主键的范围查询。实验结果显示,路径分区编码方案能加速线性Twig查询,它将为高效实现复杂Twig查询奠定基础。  相似文献   

20.
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
Vincent T. Y. NgEmail:
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号