Found 20 similar documents (search time: 15 ms)
XML data broadcast is an efficient way to disseminate XML data to a large number of mobile clients in mobile wireless networks. Recently, several indexing methods have been proposed to improve the performance of XML query processing in terms of access time and tuning time over XML streams. However, existing indexing methods cannot process twig pattern XML queries. In this paper, we propose a novel structure for streaming XML data called PS+Pre/Post by integrating the path summary technique and the pre/post labeling scheme. Our proposed XML stream structure exploits the benefits of the path summary technique and the pre/post labeling scheme to efficiently process different types of XML queries over the broadcast stream. Experimental results show that our proposed XML stream structure improves the performance of access time and tuning time in processing different types of XML queries.
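The pre/post labeling scheme mentioned in this abstract can be illustrated with a minimal sketch. It is not the PS+Pre/Post structure itself, only the generic labeling idea: each element gets its preorder and postorder rank, and an element is an ancestor of another exactly when it comes earlier in preorder and later in postorder. The function names `prepost_labels` and `is_ancestor` and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def prepost_labels(root):
    """Return {element: (pre, post)} computed in one depth-first traversal."""
    labels = {}
    pre_counter = 0
    post_counter = 0

    def visit(node):
        nonlocal pre_counter, post_counter
        pre = pre_counter          # rank when the element is first entered
        pre_counter += 1
        for child in node:
            visit(child)
        labels[node] = (pre, post_counter)  # rank when the element is left
        post_counter += 1

    visit(root)
    return labels


def is_ancestor(labels, a, d):
    """a is a proper ancestor of d iff a opens before d and closes after d."""
    return labels[a][0] < labels[d][0] and labels[a][1] > labels[d][1]


# Hypothetical sample document: a has children b and d; b has child c.
doc = ET.fromstring("<a><b><c/></b><d/></a>")
labels = prepost_labels(doc)
b, c, d = doc.find("b"), doc.find("b/c"), doc.find("d")
```

With labels of this kind, a structural relationship between two elements can be decided from the two labels alone, without walking the tree, which is why such schemes are attractive for twig queries over a stream.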

A common problem of XML query algorithms is that execution time and input size grow rapidly as the size of the XML document increases. In this paper, we propose a version-labeling scheme and the TwigVersion algorithm to address this problem. The version-labeling scheme is used to identify all repetitive structures in XML documents, and the Version Tree is constructed to hold such version information. To process a query, TwigVersion generates a filter from the Version Tree, and the final answer to the query can then be retrieved from the database through the filtering process. Both the theoretical proof and the experimental results reported in this paper demonstrate that the concise structure of the Version Tree and the reduced input size make TwigVersion outperform the existing approaches.

We introduce the notion of XML Stream Attribute Grammars (XSAGs). XSAGs are the first scalable query language for XML streams (running strictly in linear time with bounded memory consumption independent of the size of the stream) that allows for actual data transformations rather than just document filtering. XSAGs are also relatively easy to use for humans. Moreover, the XSAG formalism provides a strong intuition for which queries can or cannot be processed scalably on streams. We introduce XSAGs together with the necessary language-theoretic machinery, study their theoretical properties such as expressiveness and complexity, and discuss their implementation.

We consider the XPath evaluation problem: Evaluate an XPath query Q on a streaming XML document D; i.e., determine the set Q(D) of document elements selected by Q. We mainly consider Conjunctive XPath queries that involve only the child and descendant axes. Previously known in-memory algorithms for this problem use O(|D|) space and O(|Q||D|) time. Several previously known algorithms for the streaming version use Ω(d^n) space and Ω(d^n |D|) time in the worst case; d denotes the depth of D, and n denotes the number of location steps in Q. Their exponential space requirement could well exceed the O(|D|) space used by the in-memory algorithms. We present an efficient algorithm that uses O(d|Q|+nc) space and O((|Q|+dn)|D|) time in the worst case; c denotes the maximum number of elements of D that can be candidates for output, at any one instant. For some worst-case Q and D, the memory space used by our algorithm matches our lower bound proved in a different paper; so, our algorithm uses optimal memory space in the worst case.
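To make the streaming setting concrete, here is a minimal sketch of evaluating a path over a stream of start/end events. It is not the algorithm of the paper above: it handles only predicate-free paths with child and descendant axes, so no element ever has to be buffered as a candidate. The function name `stream_select`, the `(axis, tag)` step encoding, and the sample document are assumptions made for illustration. The bookkeeping stack holds one frame per open element, each with at most n+1 step counters.

```python
import io
import xml.etree.ElementTree as ET


def stream_select(xml_bytes, steps):
    """Yield the 1-based document-order position of each selected element.

    steps is a list of (axis, tag) pairs, axis being "child" or "descendant";
    the tag "*" matches any element name.
    """
    n = len(steps)
    # Frame = (exact, reach): 'exact' holds k if the first k steps match a
    # path ending at this element; 'reach' also includes its ancestors.
    stack = [({0}, {0})]  # frame for the virtual document root
    position = 0
    parser = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in parser:
        if event == "start":
            position += 1
            exact_parent, reach_parent = stack[-1]
            exact = set()
            for k in reach_parent:
                if k == n:
                    continue  # query already fully matched higher up
                axis, tag = steps[k]
                if tag not in ("*", elem.tag):
                    continue
                # A child step must extend a match ending at the parent;
                # a descendant step may extend a match at any ancestor.
                if axis == "descendant" or k in exact_parent:
                    exact.add(k + 1)
            stack.append((exact, reach_parent | exact))
            if n in exact:
                yield position
        else:
            stack.pop()
            elem.clear()  # release the content of the closed element


# Hypothetical sample: element positions are a=1, b=2, c=3, c=4, b=5, c=6.
sample = b"<a><b><c/></b><c><b><c/></b></c></a>"
```

For example, `list(stream_select(sample, [("descendant", "b"), ("child", "c")]))` evaluates `//b/c`. Predicates are what make the general problem hard, because an element may have to be held as a candidate until later events decide whether its predicates are satisfied.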

RRSi: indexing XML data for proximity twig queries   Cited by: 2 (self-citations: 2, other citations: 0)
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore, some “user-interesting” documents whose structures are similar but not identical to a user query are often missed. In this paper, we propose RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents, supporting proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that RRSi- and oRRSi-based query processing significantly outperforms previously proposed techniques in XML repositories with structural heterogeneity.
Vincent T. Y. Ng

This paper describes a syntactic method for representing the primitive parts of a pattern as nodes of a type of directed graph. A linear representation of the digraph can then be presented to a regular unordered tree automaton for classification. Regular unordered tree automata can be simulated by deterministic pushdown automata, so this procedure can be implemented easily. Regular u-tree automata and the corresponding generative systems, regular u-tree grammars, are formally defined. Several results are shown which are applicable to all syntactic pattern recognition schemes involving the use of primitives.

We consider the XPath evaluation problem: Evaluate an XPath query Q on a streaming XML document D. We consider two versions of the problem: 1) Filtering Problem: Determine if there is a match for Q in D. 2) Node Selection Problem: Determine the set Q(D) of document nodes selected by Q. We consider Conjunctive XPath (CXPath) queries that involve only the child and descendant axes. Let d denote the depth of D, and n denote the number of location steps in Q. Bar-Yossef et al. (2007, 2005) [6] and [7] presented lower bounds on the memory space required by any algorithm to solve these two problems. Their lower bounds apply to each query in a large subset of XPath, and are obtained (mostly) using nonrecursive (Q,D). In this paper, we present larger lower bounds for a different class of queries (namely, CXPath queries with independent predicates), on recursive (Q,D). One of our results is an Ω(n · maxcands(Q,D)) lower bound for the node selection problem, for a worst-case Q; maxcands(Q,D) is the maximum number of nodes of D that can be candidates for output, at any one instant. So, there is no algorithm for the node selection problem that uses O(f(d,|Q|)+maxcands(Q,D)) space, for any function f. This shows that some previously published algorithms are incorrect.

Matching twigs in fuzzy XML   Cited by: 2 (self-citations: 0, other citations: 2)
A considerable number of twig pattern matching algorithms have been proposed to holistically process a twig query. Those algorithms mainly focus on twig pattern queries with AND-logic. However, there is often a need to process a twig query with OR-predicates. Furthermore, the existing algorithms fall short in their ability to support twig queries with OR-logic in fuzzy XML. To overcome this limitation, in this paper, we first introduce a novel encoding scheme to represent node information in fuzzy XML. Based on the encoding scheme, we then propose an effective algorithm for matching a twig pattern query with AND/OR-logic in fuzzy XML. Our approach adopts a compact stack technique to process complicated twig queries consisting of both AND-logic and OR-logic. More importantly, our method avoids re-scanning unnecessary portions of XML documents and producing redundant intermediate results. Finally, the experimental results demonstrate the performance advantages of our approach.

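To address the cumbersome structural join operations and the difficulty of probability-threshold filtering in current twig pattern query processing methods for uncertain XML, a novel sequence-based method for processing twig pattern queries over uncertain XML is proposed. The method comprises the construction of a sequence index for uncertain XML and a query algorithm based on sequence matching. Compared with existing query processing methods for uncertain XML, it requires no cumbersome structural joins and can flexibly apply probability-threshold filtering three times. Theoretical analysis and experiments show that the method facilitates probability-threshold filtering and achieves high query efficiency.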

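Effective indexing is an important factor in accelerating XML queries. Many record-based and structure-based indexing techniques already exist, but none of them performs well on queries that involve both twig patterns and data content. The proposed RD-IL indexing technique handles such queries effectively, and also handles queries involving any of the following: twig patterns, data content, and ancestor-descendant relationships.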

As the volume of information published as XML-compliant documents keeps growing rapidly, efficient and effective query processing and optimization for XML have become more important than ever. This article reports our recent advances in XML structured-document query optimization and elaborates on a novel approach and the techniques developed for it. Our approach performs heuristic-based algebraic transformations on XPath queries, represented as PAT algebraic expressions, to achieve query optimization. This article first presents a comprehensive set of general equivalences with regard to XML documents and XML queries. Based on these equivalences, we developed a large set of deterministic algebraic transformation rules for XML query optimization. Our approach is unique in that it performs exclusively deterministic transformations on queries for fast optimization. The deterministic nature of the proposed approach yields high optimization efficiency and a simple implementation. Our approach is a logical-level one, which is independent of any particular storage model. Therefore, optimizers based on our approach can be easily adapted to a broad range of XML data/information servers to achieve fast query optimization. An experimental study confirms the validity and effectiveness of the proposed approach.

The security of published XML data receives exceptional attention due to its sensitive nature in many applications. This paper proposes an XML view publishing method called XFlat. Compared with other methods, XFlat focuses on query performance over the published XML view while simultaneously protecting the sensitive data via encryption techniques. XFlat decomposes an XML tree into a set of sub-trees, in each of which multiple users have the same accessibility to all nodes, and may encrypt and store each sub-tree in a flat, sequential manner. This storage strategy can avoid the nested encryption cost in view construction and the nested decryption cost in query evaluation. In addition, we discuss how to generate a user-specific schema and how to minimize the total space cost of the published XML view when considering the overhead of the relationships among the sub-trees. We also propose an XML schema index to enhance query performance over the final XML view. The experimental results demonstrate the effectiveness and efficiency of the proposed XFlat method.

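Cache management for query results over XML stream data   Cited by: 2 (self-citations: 0, other citations: 2)
杨卫东, 王清明, 施伯乐. Journal of Software (《软件学报》), 2008, 19(8): 2080-2088
A method is proposed for systematically handling the result sets returned by queries over XML data streams. In this method, users express their interest in the data with XQuery, and the method can handle recursive documents and process multiple queries simultaneously. Using a binary prefix encoding driven by a run-time stack, it determines the relationships among the nodes of the result set at run time. This avoids a large number of join operations between result sets, effectively reduces memory consumption, and improves processing performance.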

Indexing and querying XML using extended Dewey labeling scheme   Cited by: 1 (self-citations: 0, other citations: 1)
Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which effectively extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and to avoid the scan of labels for internal query nodes to accelerate query processing (in I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL based on the tag + level data partition scheme to further reduce I/O costs by level pruning. Finally, we report our comprehensive experimental results to show that our set of XML tree pattern matching algorithms is superior to existing approaches in terms of the number of elements scanned, the size of intermediate results and query performance.
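As a rough illustration of the plain Dewey labeling scheme that this abstract builds on, the sketch below labels each element with the sequence of child positions on its path from the root, so that ancestor and parent tests reduce to prefix comparisons. It does not implement extended Dewey, which additionally encodes element types in the label. The names `dewey_labels`, `is_ancestor`, `is_parent`, the choice of the empty label for the root, and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Map each element to a tuple of 1-based child positions from the root.

    The root gets the empty label here; some presentations start at 1.
    """
    labels = {root: ()}
    stack = [root]
    while stack:
        node = stack.pop()
        for position, child in enumerate(node, start=1):
            labels[child] = labels[node] + (position,)
            stack.append(child)
    return labels


def is_ancestor(a_label, d_label):
    """a is a proper ancestor of d iff a's label is a proper prefix of d's."""
    return len(a_label) < len(d_label) and d_label[:len(a_label)] == a_label


def is_parent(p_label, c_label):
    """p is the parent of c iff dropping c's last component yields p's label."""
    return len(c_label) == len(p_label) + 1 and c_label[:-1] == p_label


# Hypothetical sample document with two book elements.
doc = ET.fromstring(
    "<bib><book><title/><author/></book><book><title/></book></bib>"
)
labels = dewey_labels(doc)
first_author = doc[0][1]  # second child of the first book
```

Because a Dewey label spells out the whole root-to-node path by position, the labels of a node's ancestors can be read off the node's own label, which is the property that path-recording labeling schemes rely on.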

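Many methods for querying XML data have been proposed, but these traditional methods cannot take full advantage of multiprocessors and multicore processors. This paper proposes a parallel algorithm for XML queries that substantially improves the efficiency of querying XML data on shared-memory multiprocessor and multicore systems.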

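张胜, 舒坚, 包晓玲. Journal of Computer Applications (《计算机应用》), 2008, 28(10): 2537-2540
XML has become the de facto standard for information exchange and representation on the Internet. However, XML documents contain a large amount of redundant information, such as repeated tags and structures, which makes query processing and data exchange more costly, especially on bandwidth- and resource-constrained devices. Compression is an important way to address this problem. This paper collects the XML compression methods proposed in recent years and compares six representative XML compression techniques in terms of compression ratio, compression and decompression time, memory consumption, and query performance. It then briefly summarizes the strengths and weaknesses of each and discusses directions for future work.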

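XML provides considerable flexibility for publishing and exchanging heterogeneous data on the Web. However, because the language is inherently verbose, XML documents are larger than other types of documents holding the same data content. As the use of XML on the Web expands, the data size naturally grows as well, which substantially increases the amount of data to be stored, processed, and exchanged. The size problem therefore hinders the adoption of XML, especially in applications with limited bandwidth and memory, such as mobile communication. In this article, we give an overview of several recently proposed XML-specific compression algorithms and analyze their techniques and effectiveness in addressing the XML document size problem.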

Comparative Analysis of XML Compression Technologies   Cited by: 1 (self-citations: 0, other citations: 1)
XML provides flexibility in publishing and exchanging heterogeneous data on the Web. However, the language is by nature verbose and thus XML documents are usually larger in size than other specifications containing the same data content. It is natural to expect that the data size will continue to grow as XML data proliferates on the Web. The size problem of XML documents hinders the applications of XML, since it substantially increases the costs of storing, processing and exchanging the data. The hindrance is more apparent in bandwidth- and memory-limited settings such as those applications related to mobile communication. In this paper, we survey a range of recently proposed XML-specific compression technologies and study their efforts and capabilities to overcome the size problem. First, by categorizing XML compression technologies into queriable and unqueriable compressors, we explain the efforts in the representative technologies that aim at utilizing the exposed structure information from the input XML documents. Second, we discuss the importance of queriable XML compressors and assess whether the compressed XML documents generated from these technologies are able to support direct querying on XML data. Finally, we present a comparative analysis of the state-of-the-art XML-conscious compression technologies in terms of compression ratio, compression and decompression times, memory consumption, and query performance.
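The verbosity argument and the compression ratio metric can be illustrated with a small sketch. It uses the general-purpose `zlib` compressor from the Python standard library, not any of the XML-conscious compressors surveyed, and it assumes the ratio is defined as compressed size divided by original size (definitions vary between papers). The helper name `compression_ratio` and the synthetic records are assumptions made for illustration.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; smaller means better."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical data set: 1000 (id, name) records.
records = [(i, f"name{i}") for i in range(1000)]

# The same records serialized as XML and as comma-separated lines.
as_xml = (
    "<people>"
    + "".join(
        f"<person><id>{i}</id><name>{name}</name></person>"
        for i, name in records
    )
    + "</people>"
).encode("utf-8")
as_csv = "".join(f"{i},{name}\n" for i, name in records).encode("utf-8")

xml_ratio = compression_ratio(as_xml)
csv_ratio = compression_ratio(as_csv)
```

Comparing `len(as_xml)` with `len(as_csv)` shows the overhead of the repeated tags, and `xml_ratio` shows how much of that redundancy a generic compressor removes. A generic compressor produces output that must be fully decompressed before it can be queried, which is the gap that queriable XML compressors aim to close.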
