1.

Indexing XML documents for XPath query processing in external memory

《Data & Knowledge Engineering》2007,60(3):681-699

Existing encoding schemes and index structures proposed for XML query processing primarily target the containment relationship, specifically the parent–child and ancestor–descendant relationships. The presence of preceding-sibling and following-sibling location steps in the XPath specification, which is the de facto query language for XML, makes horizontal navigation, in addition to vertical navigation, among the nodes of XML documents a necessity for efficient evaluation of XML queries. Our work enhances the existing range-based and prefix-based encoding schemes such that all structural relationships between XML nodes can be determined from their codes alone. Furthermore, an external-memory index structure based on the traditional B+-tree, the XL+-tree (XML Location+-tree), is introduced to index element sets such that all defined location steps in the XPath language, vertical and horizontal, top-down and bottom-up, can be processed efficiently. The XL+-trees under the range and prefix encoding schemes share the same structure, but the search operations on them may differ slightly as a result of the richer information provided by the prefix encoding scheme. Finally, experiments are conducted to validate the efficiency of the XL+-tree approach. We compare the query performance of the XL+-tree with that of the R-tree, which is capable of handling comprehensive XPath location steps and has been empirically shown to outperform other indexing approaches.
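The idea of deciding structural relationships "from their codes alone" can be illustrated with a minimal Python sketch of a range-based encoding. The abstract does not spell out the enhanced scheme, so everything here is an assumption made for illustration: the `Code` fields (in particular storing the parent's start position to support sibling tests), the `encode` helper, and the predicate names are hypothetical and are not the paper's actual encoding.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Code:
    tag: str
    start: int              # counter value when the node is entered
    end: int                # counter value when the node is left
    level: int              # depth, root = 0
    parent: Optional[int]   # start position of the parent (assumed extension)

def encode(tree: Tuple[str, list]) -> List[Code]:
    """Label a (tag, [children]) tree by depth-first traversal; document order."""
    codes: List[Optional[Code]] = []
    counter = 0

    def visit(node, level, parent):
        nonlocal counter
        tag, children = node
        counter += 1
        start = counter
        slot = len(codes)
        codes.append(None)  # reserve the slot so output stays in document order
        for child in children:
            visit(child, level + 1, start)
        counter += 1
        codes[slot] = Code(tag, start, counter, level, parent)

    visit(tree, 0, None)
    return codes

def is_ancestor(a: Code, d: Code) -> bool:
    # a's range strictly contains d's range
    return a.start < d.start and d.end < a.end

def is_parent(p: Code, c: Code) -> bool:
    return c.parent == p.start

def is_following_sibling(a: Code, b: Code) -> bool:
    # b shares a's parent and starts after a ends
    return a.parent is not None and a.parent == b.parent and b.start > a.end

def is_preceding_sibling(a: Code, b: Code) -> bool:
    # b shares a's parent and ends before a starts
    return a.parent is not None and a.parent == b.parent and b.end < a.start

DOC = ("book", [("title", []), ("chapter", [("section", [])]), ("appendix", [])])
CODES = {c.tag: c for c in encode(DOC)}

if __name__ == "__main__":
    for code in CODES.values():
        print(code)
```

With labels of this shape, both vertical steps (ancestor, parent) and horizontal steps (preceding-sibling, following-sibling) reduce to integer comparisons, which is what makes them amenable to a B+-tree-style index.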

2.

Indexing XML documents for XPath query processing in external memory

Qun Andrew Kian Win Jiqing 《Data & Knowledge Engineering》2006,59(3):681-699

Existing encoding schemes and index structures proposed for XML query processing primarily target the containment relationship, specifically the parent–child and ancestor–descendant relationships. The presence of preceding-sibling and following-sibling location steps in the XPath specification, which is the de facto query language for XML, makes horizontal navigation, in addition to vertical navigation, among the nodes of XML documents a necessity for efficient evaluation of XML queries. Our work enhances the existing range-based and prefix-based encoding schemes such that all structural relationships between XML nodes can be determined from their codes alone. Furthermore, an external-memory index structure based on the traditional B+-tree, the XL+-tree (XML Location+-tree), is introduced to index element sets such that all defined location steps in the XPath language, vertical and horizontal, top-down and bottom-up, can be processed efficiently. The XL+-trees under the range and prefix encoding schemes share the same structure, but the search operations on them may differ slightly as a result of the richer information provided by the prefix encoding scheme. Finally, experiments are conducted to validate the efficiency of the XL+-tree approach. We compare the query performance of the XL+-tree with that of the R-tree, which is capable of handling comprehensive XPath location steps and has been empirically shown to outperform other indexing approaches.

3.

An XML encoding scheme for efficient querying (cited by: 1; self-citations: 0; other citations: 1)

Wen Huanan (文华南) Liu Xianfeng (刘先锋) Li Wenfeng (李文锋) Li Lingyong (李玲勇) 《Journal of Computer Applications (计算机应用)》2010,30(3):831-834

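In XML data querying, structural join operations take up a large share of the time. To address this problem, an encoding scheme for efficient querying, LSEQ encoding, is proposed. It decomposes node path information so that repeated path information is not recorded, which reduces the code length; it also supports representing the ancestor-descendant, parent-child, and sibling relationships between nodes. By recording the paths of non-leaf nodes, LSEQ encoding avoids structural join operations during node queries and improves query efficiency. Experiments show that LSEQ encoding improves space utilization and performs well in query speed.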

4.

Extending path summary and region encoding for efficient structural query processing in native XML databases

Su-Cheng Haw Chien-Sing Lee 《Journal of Systems and Software》2009,82(6):1025-1035

Optimizing query processing is always a challenging task in the XML database community. Current state-of-the-art approaches focus mainly on simple queries. Yet, as the usage of XML shifts towards the data-oriented paradigm, more and more complex query processing needs to be supported. In this paper, we present TwigX-Guide, a hybrid system that takes advantage of the attractive features of the path summary in DataGuide and the region encoding in TwigStack to improve complex query processing. Experimental results indicate that TwigX-Guide processes complex queries on average 38% better than the TwigStack algorithm, 31% better than TwigINLAB, 11% better than TwigStackList and about 9% better than TwigStackXB in terms of execution time.

5.

Indexing and matching multiple-attribute strings for efficient multimedia query processing

Chia-Han Lin Chen A.L.P. 《Multimedia, IEEE Transactions on》2006,8(2):408-411

Multimedia data can be represented as a multiple-attribute string of feature values corresponding to multiple features of the data. Therefore, the retrieval problem can be transformed into the q-attribute string matching problem if q features are considered in a query. A general solution is proposed in this paper. It includes an index structure and the matching methodologies, which can be applied to different values of q. The experimental results show the efficiency of the proposed approach.

6.

Indexing graph-structured XML data for efficient structural join operation

《Data & Knowledge Engineering》2006,58(2):159-179

Structural join has been established as a primitive technique for matching the binary containment pattern, specifically the parent–child and ancestor–descendant relationships, on tree-structured XML data. While current indexing approaches and evaluation algorithms proposed for the structural join operation assume the tree-structured data model, the presence of reference links in XML documents may render the underlying model a graph instead. In the more general category of semi-structured data, of which XML is an example, the data model is also usually supposed to be of graph structure. In this paper, we present an indexing approach and corresponding evaluation algorithms for efficiently performing the structural join operation on graph-structured data. Our approach encodes the structural containment relationship of a graph on multiple nested tree-structured layers, probably with the exception of the last one. With each tree-structured layer indexed with the inverted technique, the structural join operation on a graph can therefore be accomplished through recursively performing structural joins on nested layer trees. Our extensive experiments on both benchmark and synthetic XML data indicate that our proposed approach has good potential to perform significantly better than existing ones in terms of both I/O and CPU cost.

7.

RPE query processing and optimization techniques for XML databases (cited by: 5; self-citations: 1; other citations: 4)

Guo-Ren Wang Bing Sun Jian-Hua Lv Ge Yu 《Journal of Computer Science and Technology (计算机科学技术学报)》2004,19(2):0-0

This paper presents an extent join for computing path expressions that contain parent-child and ancestor-descendant operations, together with two path expression optimization rules, path-shortening and path-complementing. Path-shortening reduces the number of joins by shortening the path, while path-complementing optimizes path execution by using an equivalent complementary path expression to compute the original one. Experimental results show that the proposed algorithms are more efficient than traditional algorithms.

8.

Attribute grammars for scalable query processing on XML streams

Christoph Koch Stefanie Scherzinger 《The VLDB Journal The International Journal on Very Large Data Bases》2007,16(3):317-342

We introduce the notion of XML Stream Attribute Grammars (XSAGs). XSAGs are the first scalable query language for XML streams (running strictly in linear time with bounded memory consumption independent of the size of the stream) that allows for actual data transformations rather than just document filtering. XSAGs are also relatively easy to use for humans. Moreover, the XSAG formalism provides a strong intuition for which queries can or cannot be processed scalably on streams. We introduce XSAGs together with the necessary language-theoretic machinery, study their theoretical properties such as expressiveness and complexity, and discuss their implementation.

9.

Efficiently supporting order in XML query processing

Maged El-Sayed Katica Dimitrova Elke A. Rundensteiner 《Data & Knowledge Engineering》2005,54(3):355-390

XML is an ordered data model and XQuery expressions return results that have a well-defined order. However, little work on how order is supported in XML query processing has been done to date. In this paper we study the issues related to handling order in the XML context, namely challenges imposed by the XML data model, the variety of order requirements of the XQuery language, and the need to maintain order in the presence of updates to the XML data. We propose an efficient solution that addresses all these issues. Our solution is based on a key encoding for XML nodes that serves as node identity and at the same time encodes order. We design rules for encoding order of processed XML nodes based on the XML algebraic query execution model and the node key encoding. These rules do not require any actual sorting for intermediate results during execution. Our approach enables efficient order-sensitive incremental view maintenance as it makes most XML algebra operators distributive with respect to bag union. We prove the correctness of our order encoding approach. Our approach is implemented and integrated with Rainbow, an XML data management system developed at WPI. We have tested the efficiency of our approach using queries that have different order requirements. We have also measured the relative cost of different components related to our order solution in different types of queries. In general the overhead of maintaining order in our approach is very small relative to the query processing time.

10.

Efficient access control labeling scheme for secure XML query processing

Dongchan An Seog Park 《Computer Standards & Interfaces》2011,33(5):439-447

We propose an efficient access control labeling scheme for secure query processing under dynamic Extensible Markup Language (XML) data streams. In recent years, XML has become an active research area. In particular, the need for an efficient and secure query processing method for dynamic XML data in a ubiquitous data stream environment has become very important. The proposed access control labeling scheme supports efficient and secure query processing over dynamic XML data while eliminating the need for re-labeling. Our proposal has the advantage of providing an access control scheme that can be adapted to an existing XML labeling method.

11.

Dynamic interval-based labeling scheme for efficient XML query and update processing

Jung-Hee Yun Chin-Wan Chung 《Journal of Systems and Software》2008,81(1):56-70

XML data can be represented by a tree or graph structure, and XML query processing requires information about the structural relationships among nodes. The basic structural relationships are parent-child and ancestor-descendant, and finding all occurrences of these basic structural relationships in XML data is clearly a core operation in XML query processing. Several node labeling schemes have been suggested to support the determination of ancestor-descendant or parent-child structural relationships simply by comparing the labels of nodes. However, the previous node labeling schemes have some disadvantages, such as a large number of nodes that need to be relabeled when XML data is inserted, huge space requirements for node labels, and inefficient processing of structural joins. In this paper, we propose the nested tree structure, which eliminates these disadvantages while retaining the advantages of the previous node labeling schemes. The nested tree structure makes it possible to use the dynamic interval-based labeling scheme, which supports XML data updates with almost no node relabeling as well as efficient structural join processing. Experimental results show that our approach is efficient in handling updates with the interval-based labeling scheme and also significantly improves the performance of structural join processing compared with recent methods.

12.

Efficient mining of frequent XML query patterns with repeating-siblings

《Information and Software Technology》2008,50(5):375-389

A recent approach to improving the performance of XML query evaluation is to cache the query results of frequent query patterns. Unfortunately, discovering these frequent query patterns is an expensive operation. In this paper, we develop a two-pass mining algorithm, 2PXMiner, that guarantees the discovery of frequent query patterns by scanning the database at most twice. By exploiting a transaction summary data structure and an enumeration tree, we are able to determine the upper bounds of the frequencies of the candidate patterns, and to quickly prune away the infrequent patterns. We also design an index to trace the repeating candidate subtrees generated by sibling repetition, thus avoiding redundant computations. Experimental results indicate that 2PXMiner is both efficient and scalable.

13.

Efficient recursive XML query processing using relational database systems

Sandeep Sourav S. Sanjay 《Data & Knowledge Engineering》2006,58(3):207-242

Recursive queries are quite important in the context of XML databases. In addition, several recent papers have investigated a relational approach to storing XML data, and there is growing evidence that schema-conscious approaches are a better option than schema-oblivious techniques as far as query performance is concerned. However, the issue of recursive XML queries for such approaches has not been dealt with satisfactorily. In this paper we argue that it is possible to design a schema-oblivious approach that outperforms schema-conscious approaches for certain types of recursive queries. To that end, we propose a novel schema-oblivious approach, called Sucxent++ (Schema Unconscious XML Enabled System), that outperforms existing schema-oblivious approaches such as XParent by up to 15 times and schema-conscious approaches (Shared-Inlining) by up to eight times for recursive query execution. Our approach has up to two times smaller storage requirements compared to existing schema-oblivious approaches and 10% less than schema-conscious techniques. In addition, Sucxent++ performs marginally better than Shared-Inlining and is 5.7–47 times faster than XParent as far as insertion time is concerned.

14.

Efficient query processing for XML keyword queries based on the IDList index

Junfeng Zhou Zhifeng Bao Wei Wang Jinjia Zhao Xiaofeng Meng 《The VLDB Journal The International Journal on Very Large Data Bases》2014,23(1):25-50

Keyword search over XML data has attracted a great deal of research effort in the last decade, where one of the fundamental research problems is how to efficiently answer a given keyword query w.r.t. a certain query semantics. We found that the key factor behind the inefficiency of existing methods is that they all suffer heavily from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, namely the IDList; the IDList for keyword $k$ consists of ordered nodes that directly or indirectly contain $k$. We then show that finding keyword query results based on the smallest lowest common ancestor and exclusive lowest common ancestor semantics can be reduced to the ordered set intersection problem, which has been heavily optimized due to its application in areas such as information retrieval and database systems. We propose several algorithms that exploit set intersection in different directions and with or without using additional indexes. We further propose several algorithms that are based on hash search to simplify the operation of finding common nodes from all involved IDLists. We have conducted an extensive set of experiments using many state-of-the-art algorithms and several large-scale datasets. The results demonstrate that our proposed methods outperform existing methods by up to two orders of magnitude in many cases.
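The ordered set intersection primitive that the abstract reduces to can be sketched in Python as follows. This is only the generic merge-style intersection of sorted ID lists, not the paper's algorithms: the node IDs, the keyword lists, and the function names `intersect_sorted` and `common_nodes` are made up for illustration, and the further filtering needed to obtain results under the smallest or exclusive lowest common ancestor semantics is not shown.

```python
from typing import List, Sequence

def intersect_sorted(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Two-pointer intersection of two ascending ID lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def common_nodes(id_lists: Sequence[Sequence[int]]) -> List[int]:
    """IDs present in every list; shortest lists are intersected first."""
    if not id_lists:
        return []
    ordered = sorted(id_lists, key=len)
    result = list(ordered[0])
    for lst in ordered[1:]:
        if not result:
            break  # nothing left to match
        result = intersect_sorted(result, lst)
    return result

# Hypothetical lists of node IDs that directly or indirectly contain each keyword.
LIST_XML = [1, 2, 5, 7, 9]
LIST_INDEX = [1, 2, 3, 7]
LIST_QUERY = [1, 2, 7, 8]

if __name__ == "__main__":
    print(common_nodes([LIST_XML, LIST_INDEX, LIST_QUERY]))  # [1, 2, 7]
```

Starting from the shortest list keeps intermediate results small, which is a common design choice for multi-list intersection.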

15.

An efficient XML encoding and labeling method for query processing and updating on dynamic XML data

Jun-Ki Min 《Journal of Systems and Software》2009,82(3):503-515

In this paper, we propose an efficient encoding and labeling scheme for XML, called EXEL, which is a variant of the region labeling scheme using ordinal and insert-friendly bit strings. We devise a binary encoding method to generate the ordinal bit strings, and an algorithm to insert a new bit string between existing bit strings without any influence on the order of the preexisting bit strings. This binary encoding method and bit string insertion algorithm form the basis of efficient query processing and the complete avoidance of re-labeling for updates. We present query processing and update processing methods based on EXEL. In addition, the Stack-Tree-Desc algorithm is used for an efficient structural join, and String B-tree indexing is utilized to improve the join performance. Finally, the experimental results show that EXEL enables complete avoidance of re-labeling for updates while providing fairly reasonable query processing performance.
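To make the notion of an insert-friendly bit string concrete, here is a minimal Python sketch of one generic way to create a label that sorts strictly between two existing labels under plain lexicographic comparison. It is an assumption for illustration and may differ from EXEL's actual encoding and insertion algorithm: the function name `between` and the invariant that every label ends in "1" are choices made for this sketch.

```python
from typing import Optional

def between(left: Optional[str], right: Optional[str]) -> str:
    """Return a bit string ending in '1' that sorts strictly between
    left and right (lexicographic order). None means "no neighbour".
    Assumes every existing label ends in '1' and left < right."""
    if left is None and right is None:
        return "1"                   # first label ever created
    if right is None:
        return left + "1"            # append after the last label
    if left is None or len(left) < len(right):
        return right[:-1] + "01"     # step just below right
    return left + "1"                # step just above left

if __name__ == "__main__":
    labels = ["1"]
    labels.insert(0, between(None, labels[0]))       # "01"
    labels.insert(1, between(labels[0], labels[1]))  # "011"
    print(labels)  # ['01', '011', '1']
```

Because a new label is always derived from its two neighbours alone, existing labels never change, which is the property that lets updates proceed without re-labeling. The cost is that labels grow longer under repeated insertion at the same position.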

16.

A DTD-based XML indexing method (cited by: 9; self-citations: 0; other citations: 9)

Lu Yan (路燕) Zhang Liang (张亮) Duan Qiyang (段起阳) Shi Bole (施伯乐) 《Journal of Computer Research and Development (计算机研究与发展)》2005,42(1):30-37

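Path queries are a major feature of XML querying, and many XML indexing methods have been proposed. The structural information in a DTD is important for building XML indexes and improving query efficiency, yet most existing indexing methods do not exploit this resource. This paper proposes a DTD-based XML indexing method, DBXI (DTD-based XML indexing), which adopts a new encoding method and gives path queries the following properties: for a path expression composed of N elements/attributes with one predicate constraint, DBXI needs only 0 or 1 structural join operations on element/attribute node sets per XML document; for path queries that have no matching structure in the XML document, DBXI can determine that there is no result in less time than existing XML indexing methods. Experiments show that, compared with indexing methods such as Lore, SphinX, and XISS, DBXI shortens the response time of path queries.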

17.

A query language for XML

《Computer Networks》1999,31(11-16):1155-1169

An important application of XML is the interchange of electronic data (EDI) between multiple data sources on the Web. As XML data proliferates on the Web, applications will need to integrate and aggregate data from multiple sources, and to clean and transform data to facilitate exchange. Data extraction, conversion, transformation, and integration are all well-understood database problems, and their solutions rely on a query language. We present a query language for XML, called XML-QL, which we argue is suitable for performing the above tasks. XML-QL is a declarative, 'relational complete' query language and is simple enough that it can be optimized. XML-QL can extract data from existing XML documents and construct new XML documents.

18.

Secure query processing against encrypted XML data using Query-Aware Decryption

Jae-Gil Lee Kyu-Young Whang 《Information Sciences》2006,176(13):1928-1947

Dissemination of XML data on the internet could breach the privacy of data providers unless access to the disseminated XML data is carefully controlled. Recently, methods using encryption have been proposed for such access control. However, in these methods, the performance of processing queries has not been addressed. A query processor cannot identify the contents of encrypted XML data unless the data are decrypted. This limitation incurs the overhead of decrypting parts of the XML data that would not contribute to the query result. In this paper, we propose the notion of Query-Aware Decryption for efficient processing of queries against encrypted XML data. Query-Aware Decryption allows us to decrypt only those parts that would contribute to the query result. For this purpose, we disseminate an encrypted XML index along with the encrypted XML data. This index, when decrypted, informs us where the query results are located in the encrypted XML data, thus preventing unnecessary decryption of other parts of the data. Since the size of this index is much smaller than that of the encrypted XML data, the cost of decrypting this index is negligible compared with that of unnecessary decryption of the data itself. The experimental results show that our method improves the performance of query processing by up to six times compared with those of existing methods. Finally, we formally prove that dissemination of the encrypted XML index does not compromise security.

19.

Scaling XML query processing: distribution, localization and pruning

Patrick Kling M. Tamer Özsu Khuzaima Daudjee 《Distributed and Parallel Databases》2011,29(5-6):445-490

Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of the XML data and query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically. Based on this, we propose solutions to two of the problems encountered in distributed query processing and optimization on XML data, namely localization and pruning. Localization takes a fragmentation-unaware query plan and converts it to a distributed query plan that can be executed at the sites that hold XML data fragments in a distributed system. We then show how the resulting distributed query plan can be pruned so that only those sites are accessed that can contribute to the query result. We demonstrate that our techniques can be integrated into a real-life XML database system and that they significantly improve the performance of distributed query execution.

20.

Temporal XML: modeling, indexing, and query processing (cited by: 1; self-citations: 0; other citations: 1)

Flavio Rizzolo Alejandro A. Vaisman 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(5):1179-1212

In this paper we address the problem of modeling and implementing temporal data in XML. We propose a data model for tracking historical information in an XML document and for recovering the state of the document as of any given time. We study the temporal constraints imposed by the data model, and present algorithms for validating a temporal XML document against these constraints, along with methods for fixing inconsistent documents. In addition, we discuss different ways of mapping the abstract representation into a temporal XML document, and introduce TXPath, a temporal XML query language that extends XPath 2.0. In the second part of the paper, we present our approach for summarizing and indexing temporal XML documents. In particular we show that by indexing continuous paths, i.e., paths that are valid continuously during a certain interval in a temporal XML graph, we can dramatically increase query performance. To achieve this, we introduce a new class of summaries, denoted TSummary, that adds the time dimension to the well-known path summarization schemes. Within this framework, we present two new summaries: LCP and Interval summaries. The indexing scheme, denoted TempIndex, integrates these summaries with additional data structures. We give a query processing strategy based on TempIndex and a type of ancestor-descendant encoding, denoted temporal interval encoding. We present a persistent implementation of TempIndex, and a comparison against a system based on a non-temporal path index, and one based on DOM. Finally, we sketch a language for updates, and show that the cost of updating the index is compatible with real-world requirements.
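Recovering the state of a document "as of any given time" can be pictured with a small Python sketch in which every node carries a validity interval. The abstract does not describe the data model in this much detail, so the `TNode` class, the half-open interval convention, and the `snapshot` function are all assumptions made for illustration and are not the paper's model, TXPath, or TempIndex.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

FOREVER = float("inf")  # open-ended validity

@dataclass
class TNode:
    tag: str
    valid_from: int
    valid_to: float = FOREVER  # exclusive upper bound (assumed convention)
    children: List["TNode"] = field(default_factory=list)

def snapshot(node: TNode, t: int) -> Optional[Tuple[str, list]]:
    """Return the (tag, [children]) tree valid at time t, or None if the
    node itself is not valid at t. A child is kept only under a valid parent."""
    if not (node.valid_from <= t < node.valid_to):
        return None
    kept = []
    for child in node.children:
        sub = snapshot(child, t)
        if sub is not None:
            kept.append(sub)
    return (node.tag, kept)

# Hypothetical history: player "A" leaves the team at time 5, "B" joins at time 3.
TEAM = TNode("team", 0, children=[
    TNode("A", 0, 5),
    TNode("B", 3),
])

if __name__ == "__main__":
    for t in (1, 4, 6):
        print(t, snapshot(TEAM, t))
```

A full traversal like this touches every node for every query time, which gives some intuition for why summaries and indexes over continuously valid paths can pay off.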