首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
Jian Liu  Z. M. Ma  Li Yan 《World Wide Web》2013,16(3):325-353
As the next generation language of the Internet, XML has been the de-facto standard of information exchange over the web. A core operation for XML query processing is to find all the occurrences of a twig pattern in an XML database. In addition, the study of probabilistic data has become an emerging topic for various applications on the Web. Therefore, researching the combination of XML twig pattern and probabilistic data is quite significant. In prior work of probabilistic XML, the answers of a given twig query are always complete. However, complete answers with low probabilities may be deemed irrelevant while incomplete answers with high probabilities are of great significance because incomplete answers may be the potential answers that interest the users. Different from complete evaluation, evaluating incomplete twigs in probabilistic XML introduces some new challenges. On one hand, incomplete queries do not only obtain complete matches, but also return answers that contain considerable incomplete matches. On the other hand, the processing of incomplete evaluation is more complicated. It is obvious that a ranking approach should be adopted along with evaluating incomplete answers. In this paper, we propose an efficient algorithm to handle the problem of querying incomplete twigs over the probabilistic XML database. We also present a novel algorithm for ranking the incomplete answers. The experimental results show that our proposed algorithms can improve the performance of querying and ranking incomplete twigs significantly.  相似文献   

2.
Despite advances in machine learning technologies a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of ??possible mappings?? between the schemas may be derived from the matching result. In this paper, we study problems related to managing possible mappings between two heterogeneous XML schemas. First, we study how to efficiently generate possible mappings for a given schema matching task. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. Second, storing and querying a large set of possible mappings can incur large storage and evaluation overhead. For XML schemas, we observe that their possible mappings often exhibit a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner and can be efficiently generated. Moreover, it facilitates the evaluation of a probabilistic twig query (PTQ), which returns the non-zero probability that a fragment of an XML document matches a given query. For users who are interested only in answers with k-highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. An extensive evaluation on real-world data sets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings.  相似文献   

3.
Matching twigs in fuzzy XML   总被引:2,自引:0,他引:2  
A considerable amount of twig pattern matching algorithms have been proposed to holistically process a twig query. Those algorithms mainly focus on twig pattern query with the AND-logic. However, there is often a need to process a twig query with the OR-predicates. Furthermore, the existing algorithms fall short in their ability to support twig query with OR-logic in fuzzy XML. To overcome this limitation, in this paper, we first introduce a novel encoding scheme to represent node information in fuzzy XML. Based on the encoding scheme, we then propose an effective algorithm for matching a twig pattern query with the AND/OR-logic in fuzzy XML. Our approach adopts a compact stack technique to process the complicated twig query consisting of both AND-logic and OR-logic. More importantly, our method eliminates re-scanning unnecessary portions of XML documents and redundant intermediate results. Finally, the experimental results demonstrate the performance advantages of our approach.  相似文献   

4.
目前,不确定XML数据的top-k查询算法中都没有处理连续不确定数据,本文提出SPCProTJFast算法,该算法改进了传统的归并算法,并结合连续不确定数据的过滤方法,实现了连续不确定XML的Top-k查询。为了避免概率下限值过小对过滤效果的影响,又提出HPCProTJFast算法,该算法推迟了对连续节点的处理,只有在获得满足概率条件的整枝路径时才对连续节点进行访问。实验表明,在执行时间以及过滤效率上,同直接处理连续不确定数据的ProTJFast算法相比,这两种算法都要更高效,并且HPCProTJFast算法的效率更高。  相似文献   

5.
Secure broadcasting of web documents is becoming a crucial requirement for many web-based applications. Under the broadcast document dissemination strategy, a web document source periodically broadcasts (portions of) its documents to a potentially large community of users, without the need for explicit requests. By secure broadcasting, we mean that the delivery of information to users must obey the access control policies of the document source. Traditional access control mechanisms that have been adapted for XML documents, however, do not address the performance issues inherent in access control. In this paper, a labeling scheme is proposed to support rapid reconstruction of XML documents in the context of a well-known method, called XML pool encryption. The proposed labeling scheme supports the speedy inference of structure information in all portions of the document. The binary representation of the proposed labeling scheme is also investigated. In the experimental results, the proposed labeling scheme is efficient in searching for the location of decrypted information.  相似文献   

6.
路径分区编码优化小枝查询   总被引:1,自引:1,他引:0  
徐小双  冯玉才  王锋  周英飚  张俊 《计算机科学》2010,37(3):182-187204
有效地存储查询XML文档已经成为当今数据库领域的研究热点。从XML文档的路径统计出发,提出了路径分区存储编码方案,并依此消除了小枝查询的后裔边和通配符。针对这类不含//和*的小枝查询,利用路径分区编码的特性,给出了基于结构约束节点的Twig查询算法,极大地减少了结构连接次数。实验表明,该算法能有效滤除无关元素,提高小枝查询效率。  相似文献   

7.
As probabilistic data management is becoming one of the main research focuses and keyword search is turning into a more popular query means, it is natural to think how to support keyword queries on probabilistic XML data. With regards to keyword query on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant fragments rooted at the ELCAs to appear as results and is more popular compared with other keyword query result semantics (such as SLCAs). In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible world semantics, we propose an approach to compute ELCA probabilities without generating possible worlds. Then we develop an efficient stack-based algorithm that can find all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in aspects of result probability, time and space efficiency, and scalability.  相似文献   

8.
概率XMI、是描述不确定数据的有效方式,Dcwcy编码是一种重要的XMI、文档关键字索引编码技术。在概率XML大文档关键字索引检索过程中,频繁地比较关键字索引Dewey编码非常耗时。针对上述问题,对概率XML文档进行分区,并设计了适合概率XML文档特点的关键字索引的Dewey编码策略,提出了一种概率XML文档Top-k关键字并行检索算法PTKS(Parallcl Top-k Keyword Scarch Algorithm)。实验证明,P"I'KS提高了概率XM工文档关键字检索的时间效率,尤其在文档结构复杂度高的情况下检索效率提高更加显著。  相似文献   

9.
Measuring the structural similarity among XML documents is the task of finding their semantic correspondence and is fundamental to many web-based applications. While there exist several methods to address the problem, the data mining approach seems to be a novel, interesting and promising one. It explores the idea of extracting paths from XML documents, encoding them as sequences and finding the maximal frequent sequences using the sequential pattern mining algorithms. In view of the deficiencies encountered by ignoring the hierarchical information in encoding the paths for mining, a new sequential pattern mining scheme for XML document similarity computation is proposed in this paper. It makes use of a preorder tree representation (PTR) to encode the XML trees paths so that both the semantics of the elements and the hierarchical structure of the document can be taken into account when computing the structural similarity among documents. In addition, it proposes a postprocessing step to reuse the mined patterns to estimate the similarity of unmatched elements so that another metric to qualify the similarity between XML documents can be introduced. Encouraging experimental results were obtained and reported.  相似文献   

10.
概率XML文件是概率数据的网络数据交换和表示标准,元素取值及其概率的查询与计算是概率XML文件的重要研究内容.概率XML文件树是一种有效的概率XML文件的数据模型,定义了概率XML文件树的基本路径和扩展路径,提出了根据可能世界原理将概率XML文件树分解为普通子XML树的集合的算法,根据路径分析原理将概率XML文件树分解为子概率XML树的集合的算法和相应的查询与计算结点及结点集合概率的算法,并通过实验进行了比较分析.实验结果表明:这两种方法是有效的;与前一种方法比较,后一种方法适合较大的概率XML文件树、结点及结点集合的概率的查询,计算过程较简单.  相似文献   

11.
Indexing and querying XML using extended Dewey labeling scheme   总被引:1,自引:0,他引:1  
Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which effectively extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and to avoid the scan of labels for internal query nodes to accelerate query processing (in I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL based on the tag + level data partition scheme to further reduce I/O costs by level pruning. Finally, we report our comprehensive experimental results to show that our set of XML tree pattern matching algorithms are superior to existing approaches in terms of the number of elements scanned, the size of intermediate results and query performance.  相似文献   

12.
13.
将约束引入到可能性XML时,因分布节点值的不确定性,XML文档无法检验对涉及分布节点的约束的合法性。在一些场景中,要返回给用户符合约束的查询结果和概率,通常没有考虑去除不可能域分布来做条件计算,得出的并不是准确概率。为此,提出一种可能性XML的概念,给出可能性XML中可能域和约束的有效表达,通过提出的方案解决条件计算和准确概率计算的问题。实验结果表明,该算法的效率较高,条件计算后的结果更能被用户接受。  相似文献   

14.
一种基于结构索引的XML模式匹配方法   总被引:2,自引:0,他引:2  
XML文档采用了树型的数据模型,对其查询通常是用带有选择谓词的模式树在XML数据中进行匹配.因此,找出XML文档中所有符合模式树结构的元素集,是XML查询处理的核心操作.本文提出了结构索引JoinGuide,并在此基础上提出了一种新的XML模式匹配方法.它使用JoinGuide来对模式树进行预匹配,这样在XML文档上查询时可以利用索引上的匹配结果来忽略部分连接谓词和不必要的候选XML元素序列.本文还提出了三种具体算法来利用索引匹配结果进行进一步的查询.实验结果表明本文中的模式树匹配方法优于以往的匹配方法,并且索引所需的空间很小.  相似文献   

15.
XML数据库的查询优化技术是当前数据库领域中的一个研究热点,而小枝模式匹配又是其中的一个研究重点.在总结分析各种小枝模式匹配算法的基础上,提出了一种新的基于Extended Dewey编码的小枝模式匹配方法.该方法首先使用TJFast算法在XML文档的JoinGuide索引上进行预匹配,然后再扫描预匹配结果中的叶子结点序列就可以找出所有的匹配结果.最后,用实验的方法同其它算法作了比较,并对实验结果进行了分析.  相似文献   

16.
通过比较基于可能世界模型的概率数据在关系数据模型和XML数据模型中的表示方法,根据概率属性与普通属性的关系把概率关系模式分为1NF和3NF,根据分布节点与普通节点的关系把概率XML模式也分为1NF和3NF,以扩展的概率DTD文件为例设计了概率关系模式和概率XML模式之间的转换算法。实例分析结果表明该算法是有效的,也为现存的概率关系数据与概率XML数据之间提供了一种有效的模式转换方法。  相似文献   

17.
当前针对小枝模式的XML查询是XML文档查询的研究热点。文章在分析XML数据小枝查询处理常用算法的基础上,提出了一种高灵活性的、易确定结点对之间结构关系的EDiezt-P编码,并基于EDiezt-P编码和层次栈结构提出了一种自底向上的小枝查询算法。实验表明,该算法在一定程度上减少了查询处理时间,提高了查询效率。  相似文献   

18.
The innate verbosity of the extensible markup language (XML) remains one of its main weaknesses, especially when large documents are concerned. This problem can be solved with the aid of dedicated XML compression algorithms. In this work, we describe XML word‐replacing transform (XML‐WRT), a fast and fully reversible XML transform, which, when combined with generally used LZ77‐style compression algorithms, allows to attain high compression ratios, comparable to those achieved by the current state‐of‐the‐art XML compressors. The resulting compression scheme is asymmetric in the sense that its decoder is much faster than the coder. This is a desirable practical property, as in many XML applications data are read much more often than written. The key features of the transform are dictionary‐based encoding of both document structure and content, separation of different content types into multiple streams, and dedicated encoding of specific patterns, including numbers and dates. The test results show that the proposed transform improves the XML compression efficiency of general‐purpose compressors on average by 35% in case of gzip, and 17% in case of LZMA. Compared with the current state‐of‐the‐art SCMPPM algorithm, XML‐WRT with LZMA attains over 2% better compression ratio, while being 55% faster. Copyright © 2007 John Wiley & Sons, Ltd.  相似文献   

19.
缪丰羽  王宏志 《计算机科学》2016,43(11):284-290
模糊XML文档是指包含不确定信息的XML文档。在模糊XML文档查询方面,现有的研究成果较少,并且都是基于树型结构的XML文档进行的。针对图结构下模糊XML文档的特征,设计了一组高效的图结构模糊XML文档上的模式匹配算法。该算法基于一种适合于图结构文档的索引方式,采用自底向上的结点匹配顺序,大大减少了结点的重复判断操作,也不需要进行局部匹配结果的归并以及针对PC关系设计额外的过滤函数。理论分析以及实验结果证明,提出的模式匹配算法不仅在小枝查询性能上优于现有的相关算法,而且能够较好地实现DAG模式匹配查询。  相似文献   

20.
XML data broadcast is an efficient way to disseminate XML data to a large number of mobile clients in mobile wireless networks. Recently, several indexing methods have been proposed to improve the performance of XML query processing in terms of access time and tuning time over XML streams. However, existing indexing methods cannot process twig pattern XML queries. In this paper, we propose a novel structure for streaming XML data called PS+Pre/Post by integrating the path summary technique and the pre/post labeling scheme. Our proposed XML stream structure exploits the benefits of the path summary technique and the pre/post labeling scheme to efficiently process different types of XML queries over the broadcast stream. Experimental results show that our proposed XML stream structure improves the performance of access time and tuning time in processing different types of XML queries.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号