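An Update Method for Extended Preorder Numbering of XML Data   Total citations: 15, self-citations: 0, citations by others: 15
Luo Daofeng, Meng Xiaofeng, Jiang Yu. 《软件学报》 (Journal of Software), 2005, 16(5): 810-818
Most XML query techniques rely on some numbering scheme for the XML tree. Numbering an XML tree means assigning each node a unique code according to some rule, so that whether two nodes stand in an ancestor-descendant relationship can be decided directly from their codes. The most widely used scheme is the region-based numbering scheme. XML data, however, is also subject to updates such as insertions and deletions. Once the data is updated, the region codes must be adjusted accordingly to keep the indexes and query algorithms built on them correct. Little research has addressed numbering updates so far. This paper studies the update problem for region numbering. Using the approach of reserving code space, it proposes a complete set of reservation algorithms and code-update algorithms for XML data and application environments with different characteristics, and evaluates the effectiveness of these algorithms through extensive experiments.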
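
The reserved-space idea can be illustrated with a minimal sketch. It is not the paper's reservation algorithm: the uniform `gap`, the `Node` class, and the helper names `label`, `is_ancestor`, and `insert_leaf` are all assumptions made for illustration. Each node gets a `(start, end)` region, ancestry is decided by region containment, and a new leaf is placed in the unused codes between its neighbours when enough of them remain.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)
    start: int = 0
    end: int = 0


def label(root: Node, gap: int = 10) -> None:
    """Assign (start, end) regions in one depth-first pass.

    Consecutive codes are `gap` apart, so gap - 1 codes stay reserved
    between them for later insertions (hypothetical uniform policy).
    """
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        counter += gap
        node.start = counter
        for child in node.children:
            visit(child)
        counter += gap
        node.end = counter

    visit(root)


def is_ancestor(a: Node, d: Node) -> bool:
    """a is an ancestor of d iff a's region strictly contains d's region."""
    return a.start < d.start and d.end < a.end


def insert_leaf(parent: Node, index: int, tag: str) -> Optional[Node]:
    """Insert a leaf at child position `index` using reserved codes only.

    Returns None when the reserved space is exhausted, i.e. when some
    existing codes would have to be adjusted first.
    """
    low = parent.children[index - 1].end if index > 0 else parent.start
    high = parent.children[index].start if index < len(parent.children) else parent.end
    width = high - low
    if width < 3:  # need two distinct codes strictly between low and high
        return None
    leaf = Node(tag, start=low + width // 3, end=low + 2 * width // 3)
    parent.children.insert(index, leaf)
    return leaf


# Example tree: book(title, chapter(section))
title = Node("title")
section = Node("section")
chapter = Node("chapter", [section])
book = Node("book", [title, chapter])
label(book)
author = insert_leaf(book, 1, "author")  # fits between title and chapter
```

In this sketch no existing node is renumbered as long as `insert_leaf` succeeds; how much space to reserve, and where, is exactly the design question the abstract refers to.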

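A Loading Mechanism for Online XML Documents Based on Extended Encoding   Total citations: 1, self-citations: 0, citations by others: 1
Web services applications involve heavy demand for processing online XML documents. Processing them with existing XML data-processing methods is a feasible approach, which raises the problem of loading online documents. Current storage and querying of XML data are based on some encoding of the XML document tree, and extended encodings are used to improve update performance; however, there is still little research on how to load online documents on the basis of an extended encoding. This paper proposes a new extended encoding method and, on top of it, a loading method suited to online XML documents: by collecting statistics on the document characteristics and update characteristics of XML documents that share a schema, it completes extended encoding and loading in a single parsing pass. Experimental results show good loading efficiency and update performance.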

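Ren Jiadong, Yin Xiaopeng. 《计算机工程》 (Computer Engineering), 2006, 32(18): 79-80, 8
To improve query efficiency, many encoding schemes for XML documents have been proposed. Most current schemes do not support document updates well. Based on an analysis and comparison of existing schemes, this paper proposes a new dynamic numbering scheme (DNS). The scheme represents node codes in the XML document tree as real numbers, can use the interval between consecutive values to encode newly inserted nodes or subtrees, and can dynamically adjust the codes of some nodes as the document is updated.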
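
A short sketch of why a real-number scheme still needs the occasional code adjustment. This is an illustration under stated assumptions, not the DNS algorithm itself: the midpoint rule and the helper names `midpoint` and `insertions_until_collision` are hypothetical. With machine floats, repeated insertion at the same position eventually leaves no distinct value between two neighbouring codes; exact rationals never collide, but their representation keeps growing.

```python
from fractions import Fraction


def midpoint(low, high):
    """Code for a node inserted between two neighbouring codes."""
    return (low + high) / 2


def insertions_until_collision(low, high, limit=200):
    """Insert repeatedly just to the right of `low`.

    Returns how many insertions produced a fresh code strictly between
    the current neighbours, stopping at `limit`.
    """
    count = 0
    while count < limit:
        mid = midpoint(low, high)
        if not (low < mid < high):
            break  # no distinct value left: neighbouring codes must be adjusted
        high = mid
        count += 1
    return count


# Floating-point codes run out of precision well before the limit.
float_count = insertions_until_collision(1.0, 2.0)

# Exact rationals always have room, at the price of ever larger codes.
exact_count = insertions_until_collision(Fraction(1), Fraction(2))
```

The float case stops early while the `Fraction` case reaches the limit, which is one way to read the abstract's remark that some node codes are adjusted dynamically as updates accumulate.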

In order to facilitate XML query processing, several labeling schemes have been proposed to directly determine the structural relationships between two arbitrary XML nodes without accessing the original XML documents. However, the existing XML labeling schemes have to re-label the pre-existing nodes or re-calculate the label values when a new node is inserted into the XML document during an update process. In this paper, we devise a novel encoding scheme based on fractional numbers to encode the labels of the XML nodes. Moreover, we propose a mapping method to convert our proposed fractional-number-based encoding scheme to a bit-string-based encoding scheme, with the intention of minimizing the label size and saving storage space. By applying our proposed bit string encoding scheme to the range-based labeling scheme and the prefix labeling scheme, the process of re-labeling the pre-existing nodes can be avoided when nodes are inserted as leaf nodes and sibling nodes without affecting the order of XML nodes. In addition, we propose an algorithm to control the increment of label size when new nodes are inserted frequently at a fixed place of an XML tree. Experimental results show that our proposed bit string encoding scheme provides efficient support to the process of XML updating without sacrificing the query performance when it is applied to the range-based labeling schemes.

Indexing and querying XML using extended Dewey labeling scheme   Total citations: 1, self-citations: 0, citations by others: 1
Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which effectively extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and to avoid the scan of labels for internal query nodes to accelerate query processing (in I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL based on the tag + level data partition scheme to further reduce I/O costs by level pruning. Finally, we report our comprehensive experimental results to show that our set of XML tree pattern matching algorithms is superior to existing approaches in terms of the number of elements scanned, the size of intermediate results and query performance.
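
For orientation, here is a minimal sketch of the plain Dewey scheme that the abstract takes as its starting point. It deliberately does not implement extended Dewey (the part that folds element types into the label is not shown), and the type alias `Dewey` and all function names are assumptions for illustration. A label is the path of child ordinals from the root, so structural relationships reduce to prefix tests.

```python
from typing import Tuple

Dewey = Tuple[int, ...]  # e.g. (1, 2, 1) = first child of second child of the root


def child_label(parent: Dewey, ordinal: int) -> Dewey:
    """Label of the `ordinal`-th child of the node labelled `parent`."""
    return parent + (ordinal,)


def dewey_is_ancestor(a: Dewey, d: Dewey) -> bool:
    """a is an ancestor of d iff a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a


def dewey_is_parent(p: Dewey, c: Dewey) -> bool:
    """p is the parent of c iff c is p extended by exactly one component."""
    return len(c) == len(p) + 1 and c[:-1] == p


def dewey_precedes(x: Dewey, y: Dewey) -> bool:
    """Document (preorder) order coincides with lexicographic tuple order."""
    return x < y


root: Dewey = (1,)
chapter = child_label(root, 2)      # (1, 2)
section = child_label(chapter, 1)   # (1, 2, 1)
```

Plain Dewey labels carry positions only; recovering element names along the path from the label alone is what the extended scheme adds.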

In view of the efficiency requirements for query and update processing in XML databases, implementation of a robust node labeling (numbering) scheme becomes an increasingly important research issue. In order to process XML queries efficiently, it is necessary to detect the ancestor-descendant relationship between the nodes and restore the sequence order of nodes in the document. To solve this problem, the technique of labeling the document nodes is used. As a result, the so-called numbering scheme is created. The nodes of the documents are labeled with certain unique identifiers. By comparing these identifiers, one can restore the sequence order of the nodes and establish the hierarchical relationships. In this paper, we give a survey of the most efficient numbering schemes and introduce a numbering scheme proposed by the authors and employed in the Sedna DBMS [1].

Information Systems, 2005, 30(6): 467-487
Due to its flexibility, XML is becoming the de facto standard for exchanging and querying documents over the Web. Many XML query languages such as XQuery and XPath use label paths to traverse the irregularly structured XML data. Without a structural summary and efficient indexes, query processing can be quite inefficient due to an exhaustive traversal on XML data. To overcome the inefficiency, several path indexes have been proposed in the research community. Traditional indexes generally record all label paths from the root element in XML data and are constructed with the use of data only. Such path indexes may result in performance degradation due to large sizes and exhaustive navigations for partial matching path queries which start with the descendant-or-self axis (“//”). To improve the query performance, we propose an adaptive path index for XML data (termed APEX). APEX does not keep all paths starting from the root and utilizes frequently used paths on query workloads. APEX also has a nice property that it can be updated incrementally according to the changes of query workloads. Experimental results with synthetic and real-life data sets clearly confirm that APEX improves the query processing cost, typically by a factor of 2–69 compared with the traditional indexes, with the performance gap increasing with the irregularity of XML data.

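An Indexing and Query Processing Approach for XML Documents   Total citations: 3, self-citations: 0, citations by others: 3
This paper first reviews traditional XML path-pattern indexing. On that basis, it proposes an element-oriented indexing approach for XML documents with its associated algorithms, together with a scheme that identifies element nodes by extended postorder traversal numbers. It then gives methods for processing regular path expression queries and tree pattern queries under this indexing approach and node identification scheme, and finally shows that the approach is more efficient than processing such queries under traditional indexing.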

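XML Structural Join Based on Region Partitioning   Total citations: 22, self-citations: 7, citations by others: 22
Wang Jing, Meng Xiaofeng, Wang Shan. 《软件学报》 (Journal of Software), 2004, 15(5): 720-729
Structural join is the core operation of XML query processing and has attracted attention from the research community. Efficient algorithms are the key to efficient query processing. Many structural join algorithms have been proposed, and most of them rest on one of the following preconditions: the input element sets are indexed, or they are sorted. When neither condition holds, the performance of these algorithms degrades sharply because of the cost of sorting or indexing the input on the fly. Based on this observation, the paper proposes a structural join algorithm based on region partitioning. The algorithm follows the idea of task decomposition and exploits the properties of region encoding to partition the input sets. A detailed algorithm design is given and its I/O complexity is analyzed. Extensive experiments show that the algorithm performs well, outperforms existing sort-merge algorithms when the input is unsorted or unindexed, and gives the query planner more options.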

XML query processing based on labeling schemes has been thoroughly studied in the past several years. Recently, efficient processing of updates in dynamic XML data has gained more attention. However, all the existing techniques have high update costs: they cannot completely avoid re-labeling in XML updates, and they increase the label size, which affects query performance. Thus, in this paper we propose a novel Compact Dynamic Binary String (CDBS) encoding to efficiently process updates. CDBS has two important properties which form the foundations of this paper: (1) a CDBS code can be inserted between any two consecutive CDBS codes with order preserved and without re-encoding the existing codes; (2) CDBS is orthogonal to specific labeling schemes; thus it can be applied broadly to different labeling schemes or other applications to efficiently process updates. Moreover, because CDBS will encounter the overflow problem, we improve CDBS to Compact Dynamic Quaternary String (CDQS) encoding, which can completely avoid re-labeling in XML leaf node updates no matter what the labeling schemes are. Meanwhile, we also discuss how to efficiently process internal node updates. We report the experimental results to show that our CDBS and CDQS are superior to previous approaches in processing both leaf node and internal node updates.
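
Property (1) can be made concrete with a small sketch. It shows one simple rule for inserting between lexicographically ordered bit strings that end in `1`; the abstract does not spell out the CDBS procedure, so this should be read as an illustration of the idea rather than as the paper's algorithm, and the function name `between` and the use of an empty string for "no neighbour" are assumptions.

```python
def between(left: str, right: str) -> str:
    """Return a bit string ending in '1' that sorts strictly between left and right.

    Codes are compared lexicographically (a proper prefix sorts first).
    Both neighbours, when present, must end in '1' and satisfy left < right.
    An empty string stands for "no neighbour on that side".
    """
    if right == "":
        return left + "1"          # append after the last code
    if len(left) >= len(right):
        return left + "1"          # extends left, still below right
    return right[:-1] + "01"       # drops just below right, still above left


codes = ["01", "1"]                       # two existing, ordered codes
middle = between(codes[0], codes[1])      # new code between them
codes.insert(1, middle)                   # existing codes are left untouched
first = between("", codes[0])             # new code before the first one
last = between(codes[-1], "")             # new code after the last one
```

The new code is only ever derived from its two neighbours, which is why the scheme is independent of whether the codes end up inside region labels, prefix labels, or something else. Repeated insertion at one spot lengthens the codes, which corresponds to the label-size growth and overflow concern mentioned in the abstract.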

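The native XML storage scheme directly affects query processing and data updates. Most current native XML storage schemes focus on query processing and pay little attention to supporting updates. Unlike updates to relational tables, XML updates must take the document order of nodes into account. This paper proposes a new update mechanism for native XML storage that preserves the document order of nodes while confining each update operation to a single page, which keeps updates efficient. By introducing forward-link records and relocation records, the mechanism keeps record storage addresses unchanged when a page splits, avoiding the I/O cost of index updates. An example shows that the update mechanism is effective.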

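Efficient query processing algorithms that depend on a particular encoding scheme are essential for retrieving information effectively. Because ancestor names can be derived from a label, the extended Dewey encoding can significantly reduce the number of elements scanned when processing structural queries and thus speed up query processing. To address the extended Dewey encoding's lack of update support and its dependence on the DTD, this paper proposes a dynamic extended Dewey encoding (DED) that supports insertion and avoids re-encoding existing nodes when an insertion is performed, and a dynamic finite state transducer (DFST) that supports DTD updates and avoids the encoding invalidation caused by changes to the derived DTD. Experiments verify the effectiveness of the encoding.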

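The paper first gives formal definitions of the XML document tree, element extents, name paths, and related notions. It then combines the ideas of encoding schemes, path indexes, and name extents to propose an improved index structure for XML data (a type index set, a name index set, and an extent index), which addresses the performance shortcomings of XML query methods built on traditional indexing. The structure effectively supports structural joins for quickly deciding the ancestor-descendant relationship between arbitrary nodes, supports path join algorithms based on name extents for quickly deciding the parent-child relationship between arbitrary nodes, and also supports twig queries that contain ownership relationships. The paper further gives an extent join algorithm based on this index structure, and compares and analyzes how it processes more complex XPath query paths that involve parent-child and ownership relationships. For an absolute XPath path query of length n, at most n/z-1 extent joins are needed, and the extent index, together with parent structure information, lets the algorithm skip as many nodes that need not take part in the join as possible. Experimental results show that the new index structure effectively improves query processing performance.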

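Path expressions are widely used in XML query languages to express nesting and reference relationships between objects, and evaluating path expressions is a key problem in query processing. This paper proposes a path join method based on a path index and an encoding scheme: the path index yields the target set of an object's descendants or ancestors in time proportional to the path length, and the encoding scheme determines the ancestor-descendant relationship between objects in constant time. Experimental results show that the method is efficient, and that it clearly outperforms top-down or bottom-up traversal when joining large numbers of objects and when the path length or the in-degree or out-degree of nodes on the path is large.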

Efficiently Querying Large XML Data Repositories: A Survey   Total citations: 1, self-citations: 0, citations by others: 1
Extensible markup language (XML) is emerging as a de facto standard for information exchange among various applications on the World Wide Web. There has been a growing need for developing high-performance techniques to query large XML data repositories efficiently. One important problem in XML query processing is twig pattern matching, that is, finding in an XML data tree D all matches that satisfy a specified twig (or path) query pattern Q. In this survey, we review, classify, and compare major techniques for twig pattern matching. Specifically, we consider two classes of major XML query processing techniques: the relational approach and the native approach. The relational approach directly utilizes existing relational database systems to store and query XML data, which enables the use of all important techniques that have been developed for relational databases, whereas in the native approach, specialized storage and query processing systems tailored for XML data are developed from scratch to further improve XML query performance. As implied by existing work, XML data querying and management are developing in the direction of integrating the relational approach with the native approach, which could result in higher query processing performance and also significantly reduce system reengineering costs.

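To make effective use of the path information in the document type definition (DTD) and reduce the number of structural joins, this paper encodes the elements and attributes of the DTD with a binary prefix code and incorporates the DTD code into the XML node code. On this basis, a path expression query is decomposed into several query fragments, the result of each fragment is computed efficiently with bit operations on the binary prefix codes, and the fragment results are finally combined by structural joins. Experimental results show that the method is correct and efficient.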

In this paper, we propose an efficient encoding and labeling scheme for XML, called EXEL, which is a variant of the region labeling scheme using ordinal and insert-friendly bit strings. We devise a binary encoding method to generate the ordinal bit strings, and an algorithm to insert a new bit string between existing bit strings without affecting the order of the preexisting bit strings. This binary encoding method and bit string insertion algorithm form the basis of the efficient query processing and the complete avoidance of re-labeling for updates. We present query processing and update processing methods based on EXEL. In addition, the Stack-Tree-Desc algorithm is used for an efficient structural join, and String B-tree indexing is utilized to improve the join performance. Finally, the experimental results show that EXEL enables complete avoidance of re-labeling for updates while providing fairly reasonable query processing performance.

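The encoding scheme for XML document data is the foundation of XML query processing, and a good scheme helps improve query efficiency. To address the low query efficiency of XML data and the need to support dynamic updates, this paper builds on a binary-tree traversal encoding and introduces the three-pointer linked list storage structure of binary trees to encode XML document nodes. The encoding uses natural numbers as sequence numbers, so codes are short; it gives each node a parent pointer, which simplifies deciding structural relationships between nodes; and nodes are stored in a three-pointer linked structure, which simplifies document updates.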

XML data can be represented by a tree or graph structure and XML query processing requires the information of structural relationships among nodes. The basic structural relationships are parent-child and ancestor-descendant, and finding all occurrences of these basic structural relationships in XML data is clearly a core operation in XML query processing. Several node labeling schemes have been suggested to support the determination of ancestor-descendant or parent-child structural relationships simply by comparing the labels of nodes. However, the previous node labeling schemes have some disadvantages, such as a large number of nodes that need to be relabeled in the case of an insertion of XML data, huge space requirements for node labels, and inefficient processing of structural joins. In this paper, we propose the nested tree structure that eliminates the disadvantages and takes advantage of the previous node labeling schemes. The nested tree structure makes it possible to use the dynamic interval-based labeling scheme, which supports XML data updates with almost no node relabeling as well as efficient structural join processing. Experimental results show that our approach is efficient in handling updates with the interval-based labeling scheme and also significantly improves the performance of the structural join processing compared with recent methods.

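Encoding an ordered XML document makes it possible to process XML data efficiently without accessing the original XML data file. However, the encoding schemes proposed so far that support insertion updates either sacrifice query performance or require a relatively large code space. This paper proposes FOP (Float-Order based-on Prime), a new prime-number-based encoding scheme. Without degrading query performance, FOP supports insertion updates to XML documents while keeping the code space under control. Experimental results show that FOP outperforms encoding schemes of the same type.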
