期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient processing of top-k twig queries over probabilistic XML data

Bo Ning Chengfei Liu Jeffrey Xu Yu 《World Wide Web》2013,16(3):299-323

The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. Matching twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic value are useless to the users, and usually users only want to get the answers with the k largest probabilistic values. To this end, existing algorithms for ordinary XML documents cannot be directly applicable due to the need for handling probability distributional nodes and efficient calculation of top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. We propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms. 相似文献

2.

Efficiently Querying Large XML Data Repositories: A Survey 总被引：1，自引：0，他引：1

Gang Gou Chirkova R. 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(10):1381-1403

Extensible markup language (XML) is emerging as a de facto standard for information exchange among various applications on the World Wide Web. There has been a growing need for developing high-performance techniques to query large XML data repositories efficiently. One important problem in XML query processing is twig pattern matching, that is, finding in an XML data tree D all matches that satisfy a specified twig (or path) query pattern Q. In this survey, we review, classify, and compare major techniques for twig pattern matching. Specifically, we consider two classes of major XML query processing techniques: the relational approach and the native approach. The relational approach directly utilizes existing relational database systems to store and query XML data, which enables the use of all important techniques that have been developed for relational databases, whereas in the native approach, specialized storage and query processing systems tailored for XML data are developed from scratch to further improve XML query performance. As implied by existing work, XML data querying and management are developing in the direction of integrating the relational approach with the native approach, which could result in higher query processing performance and also significantly reduce system reengineering costs. 相似文献

3.

改进的XML小枝模式匹配方法

下载免费PDF全文

魏东平朱新向吴玉雁《计算机工程与应用》2013,49(8):125-128

近年来, XML数据查询成为一个重要的研究课题。处理小枝查询是XML查询实现的核心操作,针对小枝模式查询,提出了一种改进的小枝模式匹配算法。该算法通过剪去无用的数据流以减少待处理结点的数目,从而节省处理时间,提高查询的准确率。实验结果表明,该算法能够有效提高查询效率。相似文献

4.

Efficient management of uncertainty in XML schema matching

Jian Gong Reynold Cheng David W. Cheung 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(3):385-409

Despite advances in machine learning technologies a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of ??possible mappings?? between the schemas may be derived from the matching result. In this paper, we study problems related to managing possible mappings between two heterogeneous XML schemas. First, we study how to efficiently generate possible mappings for a given schema matching task. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. Second, storing and querying a large set of possible mappings can incur large storage and evaluation overhead. For XML schemas, we observe that their possible mappings often exhibit a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner and can be efficiently generated. Moreover, it facilitates the evaluation of a probabilistic twig query (PTQ), which returns the non-zero probability that a fragment of an XML document matches a given query. For users who are interested only in answers with k-highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. An extensive evaluation on real-world data sets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings. 相似文献

5.

一种XML数据流小枝模式查询算法

下载免费PDF全文

白山陶世群《计算机工程》2012,38(6):69-70

针对传统XML文档小枝模式查询算法系统开销大的问题,提出一种XML数据流小枝模式查询算法。该算法结合SAX数据流解析技术,将层次关系队列结构应用于XML文档查询中,采用动态生成区间编码的方式,免除建立编码索引文件的步骤。实验结果表明,在对相关数据集进行查询时,该算法可减少I/O操作,缩短查询响应时间,提高查询效率。相似文献

6.

路径分区编码优化小枝查询 总被引：1，自引：1，他引：0

徐小双冯玉才王锋周英飚张俊《计算机科学》2010,37(3):182-187204

有效地存储查询XML文档已经成为当今数据库领域的研究热点。从XML文档的路径统计出发,提出了路径分区存储编码方案,并依此消除了小枝查询的后裔边和通配符。针对这类不含//和*的小枝查询,利用路径分区编码的特性,给出了基于结构约束节点的Twig查询算法,极大地减少了结构连接次数。实验表明,该算法能有效滤除无关元素,提高小枝查询效率。相似文献

7.

Visual exploration and retrieval of XML document collections with the generic system X2

Holger Meuss Klaus U. Schulz Felix Weigel Simone Leonardi François Bry 《International Journal on Digital Libraries》2005,5(1):3-17

This article reports on the XML retrieval system x² that has been developed at the University of Munich over the last 5 years. In a typical session with x², the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. x² largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of x² that distinguishes it from other visual query systems for XML is that it supports various degrees of detailedness in the presentation of answers, as well as techniques for dynamically reordering, grouping and ranking retrieved elements once the complete answer set has been computed. 相似文献

8.

A syntactic approach to twig-query matching on XML streams

Chien-Ping Chou Author VitaeKuen-Fang JeaAuthor Vitae Heng-Hsun Liao Author Vitae 《Journal of Systems and Software》2011,84(6):993-1007

Query matching on XML streams is challenging work for querying efficiency when the amount of queried stream data is huge and the data can be streamed in continuously. In this paper, the method Syntactic Twig-Query Matching (STQM) is proposed to process queries on an XML stream and return the query results continuously and immediately. STQM matches twig queries on the XML stream in a syntactic manner by using a lexical analyzer and a parser, both of which are built from our lexical-rules and grammar-rules generators according to the user's queries and document schema, respectively. For query matching, the lexical analyzer scans the incoming XML stream and the parser recognizes XML structures for retrieving every twig-query result from the XML stream. Moreover, STQM obtains query results without a post-phase for excluding false positives, which are common in many streaming query methods. Through the experimental results, we found that STQM matches the twig query efficiently and also has good scalability both in the queried data size and the branch degree of the twig query. The proposed method takes less execution time than that of a sequence-based approach, which is widely accepted as a proper solution to the XML stream query. 相似文献

9.

连续概率XML数据查询处理技术

下载免费PDF全文

张晓琳郑珍珍刘立新李玉峰《计算机工程与科学》2012,34(12):134-139

目前查询连续概率XML数据多采用离散化方法,需要处理大量直方图分段,查询效率较低。本文提出了一种基于p-文档模型的连续概率XML数据查询处理技术,首先利用cont节点扩展p-文档模型支持任意的连续分布,在cont节点中编码概率密度函数以及他们的参数;其次采用twig模式匹配找到符合用户要求的路径;然后根据要查询的连续分布类型确定概率查询应该使用符号表示法、积分法或直方图近似法:标准连续分布通过符号表示法中的参数或复杂的累积分布函数计算查询结果,满足积分条件的非标准连续分布采用积分法,其它情况采用直方图近似法。实验结果表明,该方法在概率查询的精确度以及响应时间上比现有方法更高效。相似文献

10.

基于Extended Dewey编码的小枝模式匹配方法

傅立功陶世群《计算机工程与设计》2008,29(4):803-806

XML数据库的查询优化技术是当前数据库领域中的一个研究热点,而小枝模式匹配又是其中的一个研究重点.在总结分析各种小枝模式匹配算法的基础上,提出了一种新的基于Extended Dewey编码的小枝模式匹配方法.该方法首先使用TJFast算法在XML文档的JoinGuide索引上进行预匹配,然后再扫描预匹配结果中的叶子结点序列就可以找出所有的匹配结果.最后,用实验的方法同其它算法作了比较,并对实验结果进行了分析. 相似文献

11.

不确定数据库中概率top-k和排序查询算法

周帆李树全肖春静吴跃《计算机应用》2010,30(10):2605-2609

传感器网络等技术的广泛应用产生了大量不确定数据。近年来,对于不确定数据的处理和查询成为数据库和数据挖掘领域研究的热点。其中,传统关系数据库中的top-k查询和排序查询怎样拓展到不确定数据是其中的焦点之一。研究近年来提出的不确定数据库上top-k查询和排序查询算法,归纳和比较目前各种不同查询算法所适应的语义世界和应用场景,并详细分析各种算法的执行效率和算法复杂度。另外,对于不确定数据top-k查询和排序查询所面临的挑战和可能的研究方向进行了总结。相似文献

12.

Evaluation Techniques for Generalized Path Pattern Queries on XML Data

Xiaoying Wu Dimitri Theodoratos Stefanos Souldatos Theodore Dalamagas Timos Sellis 《World Wide Web》2010,13(4):441-474

Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path patterns or tree patterns. Current applications of XML require querying of data whose structure is complex or is not fully known to the user, or integrating XML data sources with different structures. These applications have motivated recently the introduction of query languages that allow a partial specification of path patterns in a query. In this paper, we consider partial path queries, a generalization of path pattern queries, and we focus on their efficient evaluation under the indexed streaming evaluation model. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries. The first one exploits a structural summary of data to generate a set of path patterns that together are equivalent to a partial path query. To evaluate these path patterns, we extend a previous algorithm for path-pattern queries so that it can work on path patterns with repeated labels. The second one extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third one exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. We analyze our algorithms and perform extensive experimental evaluations. Our experimental results show that the holistic algorithm outperforms the other ones. Our approaches are the first ones to efficiently evaluate this class of queries in the indexed streaming model. 相似文献

13.

改进的基于小枝模式的匹配算法——cTwigStack

姚全珠郭祯房美君《计算机应用》2011,31(10):2782-2785

给定一个小枝模式查询,如何快速地在XML数据集中找到所有感兴趣的信息,已成为当前研究的热点。针对TwigStack算法在处理含有父子节点的情况下会产生大量的中间结果等问题,通过栈来对非叶子节点缓存和对叶子节点延迟输出的思想,提出了一种改进的小枝模式匹配算法--cTwigStack。采用Treebank数据集进行测验,结果表明该算法不仅仅在处理祖孙/后继节点时能使输出结果的准确性达到最优,而且在处理父子节点时,相对目前提出的算法,也是非常高效的。相似文献

14.

TwigEN：一种新的小枝模式匹配算法

苗红涛陶世群《电脑开发与应用》2009,22(2):38-40

随着互联网的迅速发展,XML已经成为网上通用的数据表示与交换的标准。因此,如何有效地查询XML数据成为一个重要的研究课题。近年来,小枝模式匹配问题已被广泛地研究,提出了不少小枝模式匹配算法。在汲取各种小枝模式匹配算法优点的基础上,提出了一种新的小枝模式匹配算法TwigEN。根据XML文档结构它可以跳过那些在结构连接中无用的元素结点,这样不仅减少了待处理结点的数目,缩短了处理时间,而且也节省了内存空间。相似文献

15.

支持XML查询代数和树模式查询的XQuery系统框架

下载免费PDF全文

张晓博廖湖声《计算机科学与探索》2010,4(11):996-1004

XQuery语言的高性能实现需要利用XML查询代数提供的查询优化方法,也需要采取高效的树模式整体匹配算法。为了将这两种XML查询处理技术有效地结合在XQuery语言处理系统中,提出了一种通用系统框架来支持XQuery语言的高性能实现。在这个框架内,提供开放式XML数据源连接,并且通过作为中间语言的一种函数式查询计划描述语言FXQL来支持各种查询代数算子和树查询模式的表示,既允许采用各种XML查询代数,又允许采用各种树模式查询算法;进而,通过这种中间层的程序变换可以实现基于各种查询代数的查询重写,并从查询计划中分离出独立的树模式查询计算,使两种查询处理技术适当地统一在同一系统框架中,有效地支持了多种环境下XQuery语言的实现。相似文献

16.

Index structures for matching XML twigs using relational query processors

《Data & Knowledge Engineering》2007,60(2):283-302

Various index structures have been proposed to speed up the evaluation of XML path expressions. However, existing XML path indices suffer from at least one of three limitations: they focus only on indexing the structure (relying on a separate index for node content), they are useful only for simple path expressions such as root-to-leaf paths, or they cannot be tightly integrated with a relational query processor. Moreover, there is no unified framework to compare these index structures. In this paper, we present a framework defining a family of index structures that includes most existing XML path indices. We also propose two novel index structures in this family, with different space–time tradeoffs, that are effective for the evaluation of XML branching path expressions (i.e., twigs) with value conditions. We also show how this family of index structures can be implemented using the access methods of the underlying relational database system. Finally, we present an experimental evaluation that shows the performance tradeoff between index space and matching time. The experimental results show that our novel indices achieve orders of magnitude improvement in performance for evaluating twig queries, albeit at a higher space cost, over the use of previously proposed XML path indices that can be tightly integrated with a relational query processor. 相似文献

17.

Matching twigs in fuzzy XML 总被引：2，自引：0，他引：2

Z.M. Ma Jian Liu 《Information Sciences》2011,181(1):184-200

A considerable amount of twig pattern matching algorithms have been proposed to holistically process a twig query. Those algorithms mainly focus on twig pattern query with the AND-logic. However, there is often a need to process a twig query with the OR-predicates. Furthermore, the existing algorithms fall short in their ability to support twig query with OR-logic in fuzzy XML. To overcome this limitation, in this paper, we first introduce a novel encoding scheme to represent node information in fuzzy XML. Based on the encoding scheme, we then propose an effective algorithm for matching a twig pattern query with the AND/OR-logic in fuzzy XML. Our approach adopts a compact stack technique to process the complicated twig query consisting of both AND-logic and OR-logic. More importantly, our method eliminates re-scanning unnecessary portions of XML documents and redundant intermediate results. Finally, the experimental results demonstrate the performance advantages of our approach. 相似文献

18.

Probabilistic query answering over inconsistent databases

Sergio Greco Cristian Molinaro 《Annals of Mathematics and Artificial Intelligence》2012,64(2-3):185-207

This paper presents a framework for querying inconsistent databases in the presence of functional dependencies. Most of the works dealing with the problem of extracting reliable information from inconsistent databases are based on the notion of repair, a minimal set of tuple insertions and deletions which leads the database to a consistent state (called repaired database), and the notion of consistent query answer, a query answer that can be obtained from every repaired database. In this work, both the notion of repair and query answer differ from the original ones. In the presence of functional dependencies, tuple deletions are the only operations that are performed in order to restore the consistency of an inconsistent database. However, deleting a tuple to remove an integrity violation potentially eliminates useful information in that tuple. In order to cope with this problem, we adopt a notion of repair, based on tuple updates, which allows us to better preserve information in the source database. A drawback of the notion of consistent query answer is that it does not allow us to discriminate among non-consistent answers, namely answers which can be obtained from a non-empty proper subset of the repaired databases. To obtain more informative query answers, we propose the notion of probabilistic query answer, that is query answers are tuples associated with probabilities. This new semantics of query answering over inconsistent databases allows us to give a measure of uncertainty to query answers. We show that the problem of computing probabilistic query answers is FP ^#P-complete. We also propose a technique for computing probabilistic answers to arbitrary relational algebra queries. 相似文献

19.

有效支持XML结构化连接的索引——CATI 总被引：1，自引：0，他引：1

于亚新王国仁张海宁李建新《计算机研究与发展》2007,44(1):111-118

结构化连接的效率直接影响着XML查询的性能,目前对XML的结构化连接大多都是基于编码的方法.介绍了一种全新的有效支持XML结构化连接的树索引CATI(compact ancestor tree index)CATI的基本思想是,对于给定的一个祖先后代查询(A-D查询)或Twig查询,遍历XML文档,找出所有的祖先A的实例,用以建立CATI的主干;对于每个A实例,找出它的直接后代D的实例链接在它的后面.因为经典的结构连接算法Stack-Tree算法效率较高且使用较广,因此应用基于CATI的结构连接算法和基于Stack-Tree的结构连接算法就A-D查询和Twig查询做了大量实验.实验结果表明,基于CATI的结构化连接在一般查询情况下性能明显优于基于Stack-Tree的结构化连接. 相似文献

20.

基于约束的可能性XML条件计算

下载免费PDF全文

朱永俊谈子敬《计算机工程》2011,37(16):72-74

将约束引入到可能性XML时,因分布节点值的不确定性,XML文档无法检验对涉及分布节点的约束的合法性。在一些场景中,要返回给用户符合约束的查询结果和概率,通常没有考虑去除不可能域分布来做条件计算,得出的并不是准确概率。为此,提出一种可能性XML的概念,给出可能性XML中可能域和约束的有效表达,通过提出的方案解决条件计算和准确概率计算的问题。实验结果表明,该算法的效率较高,条件计算后的结果更能被用户接受。相似文献