首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
基于关系数据库有效地实现RPE查询   总被引:5,自引:1,他引:5  
各种XML查询语言的共同特点就是利用正则路径表达式(RPE)来导航XML文档的查询。本文结合我们提出的一种新的XML数据的关系存储模式,对有效地实现RPE查询的相关研究工作进行了总结,并提出了两个有效地实现包含连接的索引改进归并连接算法。算法采用索引定位技术、短路技术和预侦技术来减少连接代价。因此,不仅能够在当前上下文计算环境下有效地实现包含连接的计算,而且能够大量地避免包含连接中不必要的扫描和搜索。  相似文献   

2.
有效支持XML结构化连接的索引——CATI   总被引:1,自引:0,他引:1  
结构化连接的效率直接影响着XML查询的性能,目前对XML的结构化连接大多都是基于编码的方法.介绍了一种全新的有效支持XML结构化连接的树索引CATI(compact ancestor tree index)CATI的基本思想是,对于给定的一个祖先后代查询(A-D查询)或Twig查询,遍历XML文档,找出所有的祖先A的实例,用以建立CATI的主干;对于每个A实例,找出它的直接后代D的实例链接在它的后面.因为经典的结构连接算法Stack-Tree算法效率较高且使用较广,因此应用基于CATI的结构连接算法和基于Stack-Tree的结构连接算法就A-D查询和Twig查询做了大量实验.实验结果表明,基于CATI的结构化连接在一般查询情况下性能明显优于基于Stack-Tree的结构化连接.  相似文献   

3.
Searching for occurrences of a twig pattern query (TPQ) in an XML document is a core task of all XML database query languages. The generalized twig pattern (GTP) extends the TPQ model to include semantics related to output nodes, optional nodes, and boolean expressions which are part of the XQuery language. Preorder filtering holistic algorithms such as TwigStack represent a significant class of TPQ processing approaches with a linear worst-case I/O complexity with respect to the sum of the input and output sizes for some query classes. Another important class of holistic approaches is represented by postorder filtering holistic algorithms such as $\text{ Twig}^2$ Stack which introduced a linear output enumeration time with respect to the result size. In this article, we introduce a holistic algorithm called GTPStack which is the first approach capable of processing a GTP with a linear worst-case I/O complexity with respect to the GTP result size. This is achieved by using a combination of the preorder and postorder filterings before storing nodes in an intermediate storage. Additionally, another contribution of this article is an introduction of a new perspective of holistic algorithm optimality. We show that the optimality depends not only on a query class but also on XML document characteristics. This new view on the optimality extends the general knowledge about the type of queries for which the holistic algorithms are optimal. Moreover, it allows us to determine that GTPStack is optimal for any GTP when a specific XML document is considered. We present a comprehensive experimental study of the state-of-the-art holistic algorithms showing under which conditions GTPStack outperforms the other holistic approaches.  相似文献   

4.
Branch query processing is a core operation of XML query processing. In recent years, a number of stack based twig join algorithms have been proposed to process twig queries based on tag stream index. However, in tag stream index, each element is labeled separately without considering the similarity among elements. Besides, algorithms based on tag stream index perform inefficiently on large document. This paper proposes a novel index, named Clustered Chain Path Index, based on a novel labeling scheme. This index provides efficient support for processing branch queries. It also has the same cardinality as 1-index against tree structured XML document. Based on CCPI, efficient algorithms, KMP-Match-Path and Related-Path-Segment-Join, are proposed to process queries efficiently. Analysis and experimental results show that proposed query processing algorithms based on CCPI outperform other algorithms and have good scalability. This paper is partially supported by Natural Science Foundation of Heilongjiang Province, Grant No. zjg03-05 and National Natural Science Foundation of China, Grant No. 60473075 and Key Program of the National Natural Science Foundation of China, Grant No. 60533110.  相似文献   

5.
A number of indexing techniques have been proposed in recent times for optimizing the queries on XML and other semi-structured data models. Most of the semi-structured models use tree-like structures and query languages (XPath, XQuery, etc.) which make use of regular path expressions to optimize the query processing. In this paper, we propose two algorithms called Entry-point algorithm (EPA) and Two-point Entry algorithms that exploit different types of indices to efficiently process XPath queries. We discuss and compare two approaches namely, Root-first and Bottom-first in implementing the EPA. We present the experimental results of the algorithms using XML benchmark queries and data and compare the results with that of traditional methods of query processing with and without the use of indexes, and ToXin indexing approach. Our algorithms show improved performance results than the traditional methods and Toxin indexing approach.  相似文献   

6.
Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path patterns or tree patterns. Current applications of XML require querying of data whose structure is complex or is not fully known to the user, or integrating XML data sources with different structures. These applications have motivated recently the introduction of query languages that allow a partial specification of path patterns in a query. In this paper, we consider partial path queries, a generalization of path pattern queries, and we focus on their efficient evaluation under the indexed streaming evaluation model. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries. The first one exploits a structural summary of data to generate a set of path patterns that together are equivalent to a partial path query. To evaluate these path patterns, we extend a previous algorithm for path-pattern queries so that it can work on path patterns with repeated labels. The second one extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third one exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. We analyze our algorithms and perform extensive experimental evaluations. Our experimental results show that the holistic algorithm outperforms the other ones. Our approaches are the first ones to efficiently evaluate this class of queries in the indexed streaming model.  相似文献   

7.
Indexing and querying XML using extended Dewey labeling scheme   总被引:1,自引:0,他引:1  
Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which effectively extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and to avoid the scan of labels for internal query nodes to accelerate query processing (in I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL based on the tag + level data partition scheme to further reduce I/O costs by level pruning. Finally, we report our comprehensive experimental results to show that our set of XML tree pattern matching algorithms are superior to existing approaches in terms of the number of elements scanned, the size of intermediate results and query performance.  相似文献   

8.
路径分区编码优化小枝查询   总被引:1,自引:1,他引:0  
徐小双  冯玉才  王锋  周英飚  张俊 《计算机科学》2010,37(3):182-187204
有效地存储查询XML文档已经成为当今数据库领域的研究热点。从XML文档的路径统计出发,提出了路径分区存储编码方案,并依此消除了小枝查询的后裔边和通配符。针对这类不含//和*的小枝查询,利用路径分区编码的特性,给出了基于结构约束节点的Twig查询算法,极大地减少了结构连接次数。实验表明,该算法能有效滤除无关元素,提高小枝查询效率。  相似文献   

9.
一种高效的XML路径查询索引   总被引:1,自引:0,他引:1       下载免费PDF全文
XML文档的查询索引是当前研究的热点。提出一种高效的XML路径查询索引KDXI,首先对XML文档进行编码,然后建立结构索引并对结构索引进行编码。研究了基于KDXI索引结构的半结构连接算法和路径查询处理过程。通过KDXI索引机制,可以有效执行一般的路径查询语句,并避免冗余的结构连接操作。实验证明了KDXI索引机制的优越性。  相似文献   

10.
11.
We investigate the limitations of existing XML search methods and propose a new semantics, related relationship, to effectively capture meaningful relationships of data elements from XML data in the absence of structural constraints. Then we make an extension to XPath by introducing a new axis, related axis, to specify the related relationship between query nodes so as to enhance the flexibility of XPath. We propose to reduce the cost of computing the related relationship by a new schema summary that summarizes the related relationship from the original schema without any loss. Based on this schema summary, we introduce two indices to improve the performance of query processing. Our algorithm shows that the evaluation of most queries can be equivalently transformed into just a few selection and value join operations, thus avoids the costly structural join operations. The experimental results show that our method is effective and efficient in terms of comparing the effectiveness of the related relationship with existing keyword search semantics and comparing the efficiency of our evaluation methods with existing query engines.  相似文献   

12.
XPath是XML的基本查询语言,XPath查询最小化对于提高XML数据库的查询性能具有重要意义.但是,由于XPath查询最小化是一个coNP完备问题,大部分已有的算法局限于处理简单的XPath片段.本文从一个新的角度入手,综合考虑完备性和高效性,提出了一个新的查询最小化框架,与已有算法"面向结点",即逐个删除冗余结点的解决思路不同,本文提出"面向树模式"的方式,即通过计算树模式的自同态映射,寻找目标结点集最小的自同态映射,进而求解最小等价查询树的方法.该方法具有较高的效率,而且在--Z..情况下是完备的,尤其是可以进一步扩展到更复杂的XPath片段.本文以此框架为基础,给出一个可以计算复杂查询模式的算法.  相似文献   

13.
When data sources are virtually integrated, there is no common and centralized method to maintain global consistency, so inconsistencies with regard to global integrity constraints are very likely to occur. In this paper, we consider the problem of defining and computing consistent query answers when queries are posed to virtual XML data integration systems, which are specified following the local-as-view approach. We propose a powerful XML constraint model to define global constraints, which can express keys and functional dependencies, and which also extends the newly introduced conditional functional dependencies to XML. We provide an approach to defining XML views, which supports not only edge-path mappings but also data-value bindings to express the join operator. We give formal definitions of repair and consistent query answers with the XML data integration settings. Given a query on the global system, we present a two-step method to compute consistent query answers. First, the given query is transformed using the global constraints, such that to run the transformed query on the original global system will generate exactly the consistent query answers. Because the global instance is not materialized, the query on the global instance is then rewritten in the form of queries on the underlying data sources by reversing rules in view definitions. We illustrate that the XPath query transformations can be implemented in XQuery. Finally, we implement prototypes of our method and evaluate our algorithms in the experiments.  相似文献   

14.
Massive XML data are increasingly generated for the representation, storage and exchange of web information. Twig query processing over massive XML data has become a research focus. However, most traditional algorithms cannot be directly implemented in a distributed manner. Some of the existing distributed algorithms generate a lot of useless intermediate results and execute many join operations of partial results in most cases; others require the priori knowledge of query pattern before XML partition, storage and query processing, which is impractical in the cases of large-scale data or frequent incoming new queries. To improve efficiency and scalability, in this paper, we propose a 3-phase distributed algorithm DisT3 based on node distribution mechanism to avoid unnecessary intermediate results. Furthermore, we propose a lightweight local index ReP with an enhanced XML partitioning approach using arbitrary partitioning strategy, and based on ReP we propose an improved 2-phase distributed algorithm DisT2ReP to further reduce the communication cost. After the performance guarantees are analyzed, extensive experiments are conducted to verify the efficiency and scalability of our proposed algorithms in distributed twig query applications.  相似文献   

15.
We present BLAS, a Bi-LAbeling based XPath processing System. BLAS uses two labeling schemes to speed up query processing: P-labeling for processing consecutive child (or parent) axis traversals, and D-labeling for processing descendant (or ancestor) axis traversals. XML data are stored in labeled form and indexed. Algorithms are presented for translating XPath queries to SQL expressions. BLAS reduces the number of joins in the SQL query translated from a given XPath query and reduces the number of disk accesses required to execute the SQL query compared with the traditional XPath processing using D-labeling alone. We also propose an approximate P-labeling scheme and the corresponding query translation algorithm to handle XML data trees that contain a large number of distinct tag names, and/or are very deep. This extension captures a spectrum of XPath-to-SQL query translation schemes, ranging from existing schemes that do not use P-labels to the one that uses exact P-labels. Experimental results demonstrate the efficiency of the BLAS system.  相似文献   

16.
随着半结构化的数据在信息交换中越来越重要,近年来,在XML数据库中,研究工作者提出了很多匹配小枝查询的算法.这些算法对仅含祖先后裔边的查询是很有效的,但是当查询中同时含祖先后裔和父子边时,以前算法仍可能产生大量中间结果,尤其是输入和输出的规模很大时.为避免中间结果的产生,提出了一种新的算法OPTwig,它是基于有序对的,通过查询树和文档树中结点有序对的匹配来进行查询,且不需要进行归并操作.结果表明,该算法优于以前算法.  相似文献   

17.
一种高效非归并的XML 小枝模式匹配算法   总被引:2,自引:0,他引:2  
陶世群  富丽贞 《软件学报》2009,20(4):795-803
在XML 数据库中,小枝模式查询是XML 查询处理的核心操作.近几年,研究人员已提出许多种算法,如 Holistic Twig 和TJFast 算法等.然而它们都是基于归并的,会有很高的计算代价.已提出的Twig2Stack 和TwigList 算法虽然可以克服这一点,但算法非常复杂.针对这一问题,尤其是考虑了通常查询表达式中只有少数几个结点是最终的输出结点这一特点,提出了TiwgNM 算法及其扩展算法TiwgNME 算法.算法不需要归并,且只用了少数栈来实现.实验结果表明,这些算法优于以前算法,尤其是对查询中只有祖先-后裔关系的表达式更有效.  相似文献   

18.
The important challenge of evaluating XPath queries over XML streams has sparked much interest in the past few years. A number of algorithms have been proposed, supporting wider fragments of the query language, and exhibiting better performance and memory utilization. Nevertheless, all the algorithms known to date use a prohibitively large amount of memory for certain types of queries. A natural question then is whether this memory bottleneck is inherent or just an artifact of the proposed algorithms.In this paper we initiate the first systematic and theoretical study of lower bounds on the amount of memory required to evaluate XPath queries over XML streams. We present a general lower bound technique, which given a query, specifies the minimum amount of memory that any algorithm evaluating the query on a stream would need to incur. The lower bounds are stated in terms of new graph-theoretic properties of queries. The proofs are based on tools from communication complexity.We then exploit insights learned from the lower bounds to obtain a new algorithm for XPath evaluation on streams. The algorithm uses space close to the optimum. Our algorithm deviates from the standard paradigm of using automata or transducers, thereby avoiding the need to store large transition tables.  相似文献   

19.
The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. Matching twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic value are useless to the users, and usually users only want to get the answers with the k largest probabilistic values. To this end, existing algorithms for ordinary XML documents cannot be directly applicable due to the need for handling probability distributional nodes and efficient calculation of top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. We propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms.  相似文献   

20.
在使用"不完全结构的约束查询(PSTP查询)"从XML文档中获取信息时,用户可以根据自身对XML文档结构的熟悉程度,在查询表达式中灵活地嵌入结构约束条件,从而满足完全不了解、完全了解及了解部分结构信息的各种用户的查询需求。提出一种基于扩展Dewey编码的查询处理算法,可以在仅扫描一遍元素的情况下,处理任意形式的PSTP查询。不同数据集上的实验结果表明,EDPS算法在处理twig查询、不包含"*"结点的PSTP查询及包含"*"结点的PSTP查询时,综合性能明显优于已有方法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号