Similar literature
 Found 19 similar documents (search time: 203 ms)
1.
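The paper first gives formal definitions of the XML document tree, element extents, and name paths. It then combines the ideas of encoding schemes, path indexes, and name extents to propose an improved index structure for XML data (a type index set, a name index set, and an extent index), which addresses the performance shortcomings of XML query methods based on traditional indexing techniques. The structure efficiently supports structural join computation, so that the ancestor-descendant relationship between any two nodes can be determined quickly; it efficiently supports a path join algorithm based on name extents, so that the parent-child relationship between any two nodes can be determined quickly; and it also supports twig queries that contain ownership relationships. An extent join algorithm based on this index structure is then presented, and its handling of more complex XPath query paths containing parent-child and ownership relationships is compared and analyzed. For an absolute XPath path query of length n, at most n/z-1 extent joins are needed, and the extent index can use parent structure information to skip nodes that do not need to take part in the join wherever possible. Experimental results show that the proposed index structure effectively improves query processing performance.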

2.
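To address relative path queries and reference path queries over XML, a path block index for XML data, KI, is proposed. The construction method of the KI index, the index node splitting algorithm, and the related query processing algorithms are discussed and implemented in VC++. XML query tests on the Shakespeare and Xorder data sets show that the proposed KI index effectively improves XML query efficiency.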

3.
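Research on indexing techniques for DOM-based XML databases   Total citations: 11, self-citations: 1, citations by others: 11
As an international standard for data exchange, XML now runs through every area of Internet applications, and database technology that stores and queries XML data quickly and accurately is an important research topic. XML indexing plays a vital role in query processing for XML databases. This paper proposes indexing techniques for DOM-based XML databases (a path join index, a value index, and a reference index), which address the performance shortcomings of traditional XML query methods based on tree traversal. It compares and analyzes different ways of processing more complex query paths that contain predicates and reference relationships, and reports benchmark results on index space utilization, query performance, and index maintenance cost, showing that the new indexing techniques effectively improve query processing efficiency.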

4.
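Structural indexes and inverted lists each have shortcomings when processing queries over XML documents. This paper proposes a query algorithm for joining path expressions that combines the strategies of structural indexes and inverted lists, which effectively reduces the actual execution cost and improves query speed.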

5.
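The importance of the suffix tree is confirmed by the new findings the research community has kept making about it over the years. Its structure is simple, yet it can solve many complex problems in linear time, and it is widely used in pattern matching over strings and trees. For the XML standard, many indexing techniques and query schemes based on relational and object databases have been proposed. We attempt to give a query mechanism that uses a suffix tree for path navigation: a suffix tree is used to build an XML path dictionary that speeds up the evaluation of path queries. We propose that the suffix tree of a trie can be built online, discuss the suffix tree construction algorithm for the XML path dictionary, describe the overall indexing scheme and query mechanism, and examine the query operations it supports, including RPE.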

6.
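Research on paged indexing techniques for XML data   Total citations: 2, self-citations: 0, citations by others: 2
This paper studies index-based query techniques for massive XML documents and proposes a method for implementing paged index queries over XML data. The method uses the number of element tags per page as the basis for paging the data, builds a paged index over the XML data, and implements XPath queries on that index. Experimental results show that the method can apply different index query strategies to different index pages and effectively improves query efficiency.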

7.
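Research on index-based XML query techniques   Total citations: 2, self-citations: 0, citations by others: 2
This paper reviews the current state of research on XML data query techniques and examines the main XML index-based query techniques in some depth, including query methods based on path indexes, such as DataGuide, 1-index, and the A(k) index, and query methods based on encoding, such as Anc_Desc_B^+ and the XR-tree with the XR-Stack algorithm. The strengths and weaknesses of these methods are analyzed.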
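
As a rough illustration of what a path index stores, the following minimal sketch maps every distinct root-to-element label path to the elements it reaches. It is a simplified path summary in the spirit of DataGuide for tree-shaped data, not the structure from any of the surveyed papers; the function name `build_path_index` and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(xml_text):
    """Map each distinct root-to-element label path to the elements it reaches."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)

    def visit(elem, prefix):
        path = prefix + "/" + elem.tag
        index[path].append(elem)      # elements arrive in document order
        for child in elem:
            visit(child, path)

    visit(root, "")
    return dict(index)


# Hypothetical sample document.
DOC = (
    "<lib>"
    "<book><title>A</title></book>"
    "<book><title>B</title><year>2005</year></book>"
    "</lib>"
)
index = build_path_index(DOC)
# An absolute path query becomes a single dictionary lookup.
titles = [e.text for e in index["/lib/book/title"]]
```

With such a summary, an absolute path query is answered without traversing the document, at the cost of keeping one entry per distinct label path.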

8.
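An extended inverted index technique for XML documents using an RDBMS   Total citations: 1, self-citations: 0, citations by others: 1
Hu Guang. Computer Engineering (《计算机工程》), 2005, 31(3): 99-101
The inverted index is a technique widely used in retrieval, but it needs improvement to support containment queries over XML documents. This paper proposes an extended inverted index technique for processing containment queries, and experiments comparing it with earlier methods demonstrate its effectiveness. The method requires no changes to the RDBMS, and containment query processing implemented in an RDBMS with it achieves results consistent with an IR implementation.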

9.
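Ding Zheng, Zhou Hong. Computer Science (《计算机科学》), 2011, 38(4): 233-235
This paper mainly discusses how to overcome the limitation of version restoration and run complex structural queries directly on any version of an XML file. It first introduces the common approaches to version management of XML documents, then proposes and implements a new index mechanism based on the encoding scheme for multi-version XML documents. It further brings structural-join query methods into the field of XML version management and improves three classic structural join algorithms, all of which can run structural queries on any version directly without restoring that version. Experiments analyze and compare their query performance and show that the index-based algorithm avoids redundancy in queries to the greatest extent.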

10.
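A high-performance implementation of the XQuery language needs both the query optimization methods offered by XML query algebras and efficient holistic tree pattern matching algorithms. To combine these two XML query processing techniques effectively in an XQuery processing system, a general system framework is proposed to support high-performance XQuery implementations. Within this framework, open XML data source connections are provided, and FXQL, a functional query plan description language used as an intermediate language, supports the representation of various query algebra operators and tree query patterns, so that both various XML query algebras and various tree pattern query algorithms can be adopted. Program transformation at this intermediate layer then enables query rewriting based on various query algebras and separates independent tree pattern query computations from the query plan, so that the two query processing techniques are unified appropriately within one system framework, effectively supporting XQuery implementations in a variety of environments.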

11.
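An efficient index for XML path queries   Total citations: 1, self-citations: 0, citations by others: 1
Query indexing for XML documents is an active research topic. This paper proposes KDXI, an efficient index for XML path queries: the XML document is encoded first, then a structural index is built and itself encoded. A semi-structural join algorithm and the path query processing procedure based on the KDXI index structure are studied. With the KDXI index mechanism, general path query statements can be executed efficiently and redundant structural join operations are avoided. Experiments demonstrate the advantages of the KDXI index mechanism.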

12.
Providing efficient querying of XML data for ebXML applications in e-commerce is crucial, as XML has become the most important technique for exchanging data over the Internet. ebXML is a set of specifications for companies to exchange their data in e-commerce. Following the ebXML specifications, companies have a standard method to exchange business messages and to communicate data and business rules in e-commerce. Due to its tree-structure paradigm, XML is well suited to storing and querying complex data for ebXML applications. Therefore, discovering frequent XML query patterns has become an interesting topic for XML data management in ebXML applications. In this paper, we present an efficient mining algorithm, namely ebXMiner, to discover the frequent XML query patterns for ebXML applications. Unlike the existing algorithms, ebXMiner collects the equivalent XML queries and then enumerates the candidates from infrequent XML queries. Furthermore, our simulation results show that ebXMiner outperforms other algorithms in execution time.

13.
A number of indexing techniques have been proposed in recent times for optimizing queries on XML and other semi-structured data models. Most of the semi-structured models use tree-like structures and query languages (XPath, XQuery, etc.) which make use of regular path expressions to optimize query processing. In this paper, we propose two algorithms, called the Entry-point algorithm (EPA) and the Two-point Entry algorithm, that exploit different types of indices to efficiently process XPath queries. We discuss and compare two approaches, namely Root-first and Bottom-first, to implementing the EPA. We present the experimental results of the algorithms using XML benchmark queries and data, and compare them with those of traditional query processing methods with and without the use of indexes, and with the ToXin indexing approach. Our algorithms perform better than the traditional methods and the ToXin indexing approach.

14.
XML data broadcast is an efficient way to disseminate XML data to a large number of mobile clients in mobile wireless networks. Recently, several indexing methods have been proposed to improve the performance of XML query processing in terms of access time and tuning time over XML streams. However, existing indexing methods cannot process twig pattern XML queries. In this paper, we propose a novel structure for streaming XML data called PS+Pre/Post by integrating the path summary technique and the pre/post labeling scheme. Our proposed XML stream structure exploits the benefits of the path summary technique and the pre/post labeling scheme to efficiently process different types of XML queries over the broadcast stream. Experimental results show that our proposed XML stream structure improves access time and tuning time in processing different types of XML queries.
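
The pre/post labeling scheme mentioned above can be sketched in a few lines. This is a generic illustration of pre/post numbering, not the PS+Pre/Post structure itself; the function names `label_pre_post` and `is_ancestor` and the sample document are assumptions made for this example. Each element receives its preorder and postorder rank, and one element is a proper ancestor of another exactly when it has a smaller preorder rank and a larger postorder rank.

```python
import xml.etree.ElementTree as ET


def label_pre_post(xml_text):
    """Return (tag, pre, post) triples in document order."""
    root = ET.fromstring(xml_text)
    labels = []
    counters = {"pre": 0, "post": 0}

    def visit(elem):
        pre = counters["pre"]
        counters["pre"] += 1
        slot = len(labels)
        labels.append(None)               # reserve the document-order position
        for child in elem:
            visit(child)
        # The postorder rank is assigned only after all children are done.
        labels[slot] = (elem.tag, pre, counters["post"])
        counters["post"] += 1

    visit(root)
    return labels


def is_ancestor(a, d):
    """a is a proper ancestor of d iff a starts before d and finishes after it."""
    return a[1] < d[1] and a[2] > d[2]


# Hypothetical sample document.
labels = label_pre_post("<a><b><c/></b><d/></a>")
```

Because the test needs only the two labels, a client reading a stream can decide ancestor-descendant relationships without having seen the nodes in between.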

15.
Previous research has presented convincing arguments that a frequent pattern mining algorithm should mine not all frequent patterns but only the closed ones, because the latter leads not only to a more compact yet complete result set but also to better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most of the previous algorithms for finding frequent patterns used a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. An efficient sequence mining algorithm is used to discover frequent tree-structured patterns, with the aim of replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA* deeply prunes unrelated search space for frequent pattern enumeration by a parent-child relationship constraint. Through a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA* over the previously known alternative. SOLARIA* is also linearly scalable in terms of the size of the XML queries.

16.
XML structural joins, which evaluate the containment (ancestor-descendant) relationships between XML elements, are important operations of XML query processing. Estimating structural join size accurately and quickly is crucial to the success of XML query plan selection and query optimization. XML structural joins are essentially complex θ-joins, which render well-known estimation techniques for relational equijoins, such as discrete cosine transform, wavelet transform, and sketch, not applicable. In this paper, we model structural joins from a relational point of view and convert the complex θ-joins to equijoins so that those well-known estimation techniques become applicable to structural join size estimation. Theoretical analyses and extensive experiments have been performed on these estimation methods. It is shown that discrete cosine transform requires the least memory and yields the best estimates among the three techniques. Compared with the state-of-the-art method IM-DA-Est, discrete cosine transform is much faster, requires less memory, and yields comparable estimates.
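
To make the quantity being estimated concrete, here is a minimal sketch of an exact structural join over region codes. It assumes each element is encoded as a `(start, end)` interval, that both inputs are sorted by `start`, and that any two intervals are either nested or disjoint, as they are when taken from one document tree. It is a plain stack-based merge written for illustration, not the estimation method of the paper; the name `structural_join` is an assumption of this example.

```python
def structural_join(ancestors, descendants):
    """Return (a, d) pairs where region a strictly contains region d.

    Both inputs are lists of (start, end) tuples sorted by start.
    """
    result = []
    stack = []      # chain of nested ancestor candidates that are still open
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            while stack and stack[-1][1] < a[0]:
                stack.pop()             # that candidate closed before a opens
            stack.append(a)
            i += 1
        # Drop candidates that closed before d opens.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Everything left on the stack encloses d.
        result.extend((a, d) for a in stack)
    return result


# Hypothetical regions: two ancestor candidates and two descendant candidates.
pairs = structural_join([(2, 5), (6, 9)], [(3, 4), (7, 8)])
join_size = len(pairs)
```

The join size that an estimator tries to predict is simply `len(pairs)`; computing it exactly requires a pass over both lists, which is what a cheap estimate avoids during plan selection.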

17.
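Path expression queries are one of the core research problems in XML query processing, and researchers have done a great deal of work on them. That work, however, has focused mostly on matching path expressions against XML data and has neglected the "contains" predicate. This paper studies how to process the "contains" predicate in XML query processing, using two methods. The first uses skip lists and dynamically reads node data and matches keywords during XML twig pattern matching. The second builds an inverted index over the words in the XML document to perform keyword matching. The two methods are analyzed and compared in terms of twig pattern path length, number of query keywords, and the type of node on which the "contains" predicate is evaluated.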
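
The second method, an inverted index over the words of the document, might look roughly like the sketch below. It is a simplified illustration and not the paper's implementation: the helper names `build_inverted_index` and `contains`, the use of an element's position in document order as its id, and the sample document are all assumptions of this example, and only an element's own leading text is indexed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_inverted_index(xml_text):
    """Map each lowercase word to the ids of elements whose own text has it."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())        # document order; list position is the id
    postings = defaultdict(set)
    for elem_id, elem in enumerate(elements):
        for word in (elem.text or "").lower().split():
            postings[word].add(elem_id)
    return elements, postings


def contains(elements, postings, tag, keyword):
    """Ids of elements named `tag` whose text contains `keyword`."""
    hits = postings.get(keyword.lower(), ())
    return sorted(i for i in hits if elements[i].tag == tag)


# Hypothetical sample document.
DOC = (
    "<plays>"
    "<line>To be or not to be</line>"
    "<line>All the world</line>"
    "<title>Be bold</title>"
    "</plays>"
)
elements, postings = build_inverted_index(DOC)
line_hits = contains(elements, postings, "line", "be")
```

The keyword lookup touches only the posting list for that word, so its cost does not depend on how many elements the structural part of the query would otherwise have to read.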

18.
Evaluating XML queries such as XQuery or XPath expressions is a challenging task due to its complexity. Many algorithms have been introduced to cope with this problem. Some of them, called binary joins, evaluate separate parts of a query and subsequently merge intermediate results, while the others, called holistic twig joins, evaluate a query as a whole. Moreover, these algorithms also differ in what index data structure they use to handle XML data. There exist cost-based approaches utilizing binary joins and various index data structures; however, they share a limitation: they cannot perform a join between query nodes that do not have a direct XPath relationship. Such a join can be advantageous, especially if their joint selectivity is high. Since holistic joins work with all query nodes, they overcome this limitation. In this article, we introduce such a holistic twig join, called CostTwigJoin. To the best of our knowledge, CostTwigJoin is the first holistic join capable of combining various index data structures during the evaluation of an XML query. Using the holistic join has yet another advantage for cost-based approaches: an optimizer does not have to resolve the order of binary joins; therefore, the search space is reduced. In this article, we perform thorough experiments on hundreds of queries to evaluate our approach and demonstrate its advantages.

19.
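As XML has become the standard for data representation and exchange on the Internet, querying XML data efficiently has become increasingly important, and many XML query languages have emerged. Varied as they are, these languages share one feature: they query XML data using regular path expressions under the XPath data model. Research shows that current relational database technology is usually inefficient at processing queries expressed as regular path expressions. After introducing the traditional tree-traversal-based methods, this paper focuses on query processing algorithms based on path decomposition and proposes an improvement to the join order selection algorithm based on dynamic programming.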
