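Graph-structured XML data is increasingly used in practice, and this raises new problems. Traditional algorithms for the XML tree model and techniques for graph-structured models cannot efficiently process DAG queries over graph-structured XML data. This paper proposes GraphJoin, an algorithm based on reachability interval encoding that uses a data structure called the interval bucket to match DAG queries over graph-structured XML data directly and globally. Theoretical analysis and experimental results show that the algorithm outperforms existing techniques in query time and execution efficiency.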

Efficient Query Processing in Uncertain Graph Databases   Cited by: 9 (self-citations: 3, by others: 6)
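In recent years, large amounts of data produced in many fields can naturally be modeled as graphs, for example protein interaction networks and social networks. Because measurement is imprecise and because of the nature of the data itself, uncertainty is pervasive in many graph datasets. This paper studies efficient query processing in uncertain graph databases. It first presents a data model that represents the uncertainty of graphs. Since a query graph submitted by a user usually produces a large number of matches, efficiently obtaining the k matches with the highest probability is often of greater practical value, so the paper formally defines the probabilistic top-k subgraph matching query. To answer this query, it designs an effective index structure based on neighborhood subgraphs annotated with probability information, and proposes an efficient index-based query processing method. The core of this method is a search-tree-based matching algorithm that uses a probabilistic pruning technique to improve performance. Experimental results show that the proposed method is efficient and scalable.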
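
The top-k idea with probability-based pruning can be illustrated with a minimal sketch. It assumes edges exist independently, so a match's probability is the product of its edge probabilities, and that candidate matches have already been enumerated; `top_k_matches` and its parameters are hypothetical names, and the simple upper-bound check stands in for, rather than reproduces, the paper's index-based pruning.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

def top_k_matches(matches: List[List[Edge]], prob: Dict[Edge, float], k: int) -> List[Tuple[float, int]]:
    """Return (probability, match index) pairs for the k most probable matches, best first."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # min-heap: heap[0] is the current k-th best
    for idx, edges in enumerate(matches):
        p = 1.0
        pruned = False
        for e in edges:
            p *= prob[e]  # assumes independent edge existence
            # Every factor is <= 1, so the partial product bounds the final value
            # from above; once it cannot beat the k-th best, skip this match.
            if len(heap) == k and p <= heap[0][0]:
                pruned = True
                break
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (p, idx))
        elif p > heap[0][0]:
            heapq.heapreplace(heap, (p, idx))
    return sorted(heap, reverse=True)
```

Because probabilities only shrink as more edges are multiplied in, a match can be abandoned as soon as its partial product falls to the current k-th best.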

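Graph models have strong expressive power and are widely used for data modeling in many application domains. Efficiently processing subgraph containment queries in large graph databases is one of the current research challenges. Because subgraph isomorphism is NP-complete, existing subgraph containment algorithms widely use indexes based on graph features to improve query performance, but these index structures are costly to maintain. For directed acyclic graphs, this paper proposes a subgraph containment query algorithm based on topological sequences, ...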

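Graphs are a powerful tool: in many application domains, such as chemical compounds, bioinformatics, XML documents, image processing, and social networks, they can represent objects and the relationships among them, and they play an increasingly important role in modeling complex structured data. One of the most basic operations on graphs is query processing. The classic graph query problem is: given a graph database and a query graph, find the graphs in the database that contain the query graph as a subgraph. This paper proposes an effective indexing strategy for a given query graph. It selects discriminative trees in the graph database as feature trees and encodes them, so that comparisons between structures become comparisons between code sequences, and it builds an index on these feature trees. It also proposes two pruning strategies that filter out database graphs that cannot exactly match the query graph. Experiments verify the usefulness and effectiveness of the proposed query processing algorithm.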
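
Turning structural comparison into sequence comparison can be sketched with a canonical string code for trees, followed by a feature-based filter. This is a minimal sketch assuming feature trees are rooted and node-labeled; the paper's actual encoding and its two pruning strategies are not described in the abstract, and `canonical_code` and `candidate_graphs` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set

def canonical_code(children: Dict[str, List[str]], labels: Dict[str, str], root: str) -> str:
    # Child codes are sorted, so sibling order does not matter:
    # two rooted labeled trees are isomorphic iff their codes are equal.
    kids = sorted(canonical_code(children, labels, c) for c in children.get(root, []))
    return labels[root] + "(" + "".join(kids) + ")"

def candidate_graphs(index: Dict[str, Set[int]], query_features: Iterable[str], all_ids: Iterable[int]) -> Set[int]:
    # index maps a feature-tree code to the ids of database graphs containing it.
    # A graph can contain the query only if it contains every indexed query feature;
    # features missing from the index cannot be used to prune, so they are skipped.
    candidates = set(all_ids)
    for code in query_features:
        if code in index:
            candidates &= index[code]
    return candidates
```

Surviving candidates would still need an exact subgraph isomorphism check; the codes only make the filtering step a cheap string comparison.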

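Dense subgraph search is an important problem in graph analysis, with wide applications such as analyzing correlations among social network users and analyzing communities on the Web. Current work on dense subgraph search mainly targets static graphs. In real applications, however, temporal information strongly affects dense subgraph search: the graph topology keeps changing over time and the amount of information it holds keeps growing, so existing search methods designed for static graphs no longer apply to temporal graphs. Efficiently finding dense subgraphs in temporal graphs therefore remains a challenge. To address it, this paper first formally defines the problem of dense subgraph search on temporal graphs. Then, based on the graph topology and the similarity between time-labeled edges, it proposes a threshold-based approximate search algorithm, DTS-base. To speed up convergence, it proposes an optimized algorithm, DTS-opt, based on fast computation of the time slice with the maximum similarity. Finally, experiments on real datasets demonstrate the efficiency and scalability of the proposed algorithms.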

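XML data can be represented with a graph-structured model. In practice, such graph-structured data is updated dynamically. At present there is no formal definition of update operations on graph-structured XML data, and there is little research on query processing over dynamic XML. This paper discusses update operations on graph-structured XML data in detail and, based on ICDBS, a reachability encoding that supports updates, proposes the IGraph algorithm for processing queries over dynamic graph-structured XML data. Experiments show that the algorithm is efficient.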

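In uncertain data processing, the uncertain graph has received wide attention as a typical data model; research topics include subgraph matching, nearest-neighbor queries, and join queries on uncertain graphs. This paper studies distance-threshold reachability queries on uncertain graphs: given an uncertain graph, any two vertices s and t, and a distance threshold d, return the probability that t is reachable from s within distance d (d-reachable). It proposes a reachability query processing algorithm based on random sampling. It defines a classification tree model for the possible graph instances of an uncertain graph and, to obtain the instance classes more efficiently, proposes an optimized classification tree model based on bidirectional traversal. It then designs a reachability query processing algorithm that samples classes of graph instances, and verifies its performance through theoretical analysis and experiments.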
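
The basic random-sampling approach can be sketched as plain Monte Carlo over possible worlds: sample each edge independently, then run a depth-limited BFS. This minimal sketch assumes directed edges with independent existence probabilities and unweighted (hop-count) distance; it does not implement the paper's classification-tree or bidirectional-traversal optimizations, and all names are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (u, v, existence probability), directed

def within_d(adj: Dict[str, List[str]], s: str, t: str, d: int) -> bool:
    # Breadth-first search that never expands beyond depth d.
    if s == t:
        return True
    seen = {s}
    frontier = deque([(s, 0)])
    while frontier:
        u, depth = frontier.popleft()
        if depth == d:
            continue
        for v in adj.get(u, []):
            if v == t:
                return True
            if v not in seen:
                seen.add(v)
                frontier.append((v, depth + 1))
    return False

def estimate_d_reachability(edges: List[Edge], s: str, t: str, d: int,
                            samples: int = 10000, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        adj: Dict[str, List[str]] = {}
        for u, v, p in edges:
            if rng.random() < p:  # keep each edge independently with probability p
                adj.setdefault(u, []).append(v)
        if within_d(adj, s, t, d):
            hits += 1
    return hits / samples
```

The estimate's standard error shrinks roughly as 1/sqrt(samples), which is why smarter sampling of instance classes is attractive.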

Probabilistic Reachability Queries on Uncertain Graphs   Cited by: 1 (self-citations: 0, by others: 1)
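Graph reachability queries are widely used in biological networks, social networks, ontology networks, RDF databases, XML databases, and elsewhere. Noise and errors introduced when the data is manipulated make these graphs uncertain, and there is already extensive research on uncertain RDF and XML databases. This paper builds uncertain graphs with the possible-world semantic model and, based on this model, studies probabilistic reachability (PR) queries. Processing a PR query is #P-complete. The paper first gives a basic randomized algorithm that quickly estimates the reachability probability with high accuracy. It then introduces conditional distributions into the randomized algorithm (called the "conditional randomized algorithm"), using the graph's disjoint path sets and cut sets as the conditional probability distributions, so that the improved randomized algorithm can answer queries accurately and in polynomial time. Finally, extensive experiments on real uncertain graph data validate the design.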
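
Under possible-world semantics, the reachability probability is the total probability of the worlds in which t is reachable from s, where a world's probability is the product of p for present edges and 1 - p for absent ones. A minimal sketch that enumerates all 2^|E| worlds makes the semantics concrete and shows why exact evaluation blows up; it is only usable on tiny graphs (for example, to check an estimator) and is not the paper's algorithm. Names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

def reachable(edges: List[Tuple[str, str]], s: str, t: str) -> bool:
    # Iterative DFS over the directed edges present in one possible world.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    stack, seen = [s], {s}
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def exact_reach_probability(edges: List[Tuple[str, str, float]], s: str, t: str) -> float:
    total = 0.0
    # Enumerate every possible world: each edge is either present or absent.
    for mask in product([True, False], repeat=len(edges)):
        pr = 1.0
        present = []
        for (u, v, p), keep in zip(edges, mask):
            pr *= p if keep else 1.0 - p
            if keep:
                present.append((u, v))
        if reachable(present, s, t):
            total += pr
    return total
```

For a direct edge s→t and a two-hop path s→a→t, each edge with probability 0.5, this yields 0.5 + 0.5 × 0.25 = 0.625.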

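Most traditional subgraph query algorithms run the mining algorithm on the graph database only once: after a stable database index has been built, it is never updated. As query interests change or the database is frequently updated, the original database index can no longer provide useful information for reducing the number of candidate graphs during query processing. To address this, the paper proposes a dual-index subgraph query algorithm that mines frequent subgraphs and builds indexes on both the database and the query stream. Subgraph querying and construction of the query-stream index proceed at the same time, so even when query interests change, the query-stream index adaptively updates its information to optimize query efficiency. When the database is frequently updated, the query-stream index already provides effective real-time information, so the database index does not need to be rebuilt. Experimental results show that combining the two indexes effectively improves the efficiency of subgraph query processing.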

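As a standard for data exchange and data sharing on the network, XML is increasingly used to represent streaming data in application systems. However, because stream processing is constrained by properties such as limited space, how to query such streams efficiently is an open question. Unlike traditional approaches based on automata or hierarchical stacks, this paper proposes a graph-reduction-based XML query automaton (GRAT), which uses a graph structure to represent the relationships among the subquery tasks for different XML stream elements and evaluates XPath queries through reduction transformations of that graph. Experimental results show that the GRAT-based query algorithm efficiently handles complex XML queries and achieves high stream-processing throughput.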

In many applications, XML documents need to be modelled as graphs, and processing queries over graph-structured XML documents brings new challenges. In this paper, we design a labelling-based method for processing structural queries on graph-structured XML documents, in which each node is assigned labels under a reachability labelling scheme. By extending the interval-based reachability labelling scheme for DAGs by Rakesh et al., we design labelling schemes that support reachability judgements for general graphs. Based on these labelling schemes, we design graph structural join algorithms that efficiently answer structural queries containing only ancestor-descendant relationships. For subgraph queries, we design a subgraph join algorithm which, using an efficient data structure, processes subgraph queries of various structures efficiently. Experimental results show that our algorithms have good performance and scalability. Supported by the Key Program of the National Natural Science Foundation of China under Grant No. 60533110; the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000; and the National Natural Science Foundation of China under Grant Nos. 60773068 and 60773063.
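
The interval-based reachability labelling for DAGs that this work extends can be sketched as follows: number nodes in DFS postorder, give each node the interval spanning its DFS subtree, then let each node inherit (and merge) the intervals of its successors. Node u reaches v iff post(v) falls in one of u's intervals. This minimal sketch covers DAGs only, not the paper's extension to general graphs; the function names are hypothetical, and it uses recursion, so very deep graphs would need an iterative DFS.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]

def merge(intervals: List[Interval]) -> List[Interval]:
    # Union overlapping intervals; membership tests only need the union.
    out: List[Interval] = []
    for lo, hi in sorted(intervals):
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out

def build_labels(adj: Dict[str, List[str]], nodes: List[str]):
    post: Dict[str, int] = {}
    low: Dict[str, int] = {}  # smallest postorder number in the node's DFS subtree
    finish_order: List[str] = []
    visited = set()

    def dfs(u: str) -> None:
        visited.add(u)
        lo = None
        for w in adj.get(u, []):
            if w not in visited:  # tree edge of the DFS spanning forest
                dfs(w)
                lo = low[w] if lo is None else min(lo, low[w])
        post[u] = len(finish_order)
        low[u] = post[u] if lo is None else lo
        finish_order.append(u)

    for n in nodes:
        if n not in visited:
            dfs(n)

    labels: Dict[str, List[Interval]] = {}
    # In a DAG every successor finishes before its predecessor, so
    # processing in finishing order lets each node inherit successor labels.
    for u in finish_order:
        ivs = [(low[u], post[u])]
        for w in adj.get(u, []):
            ivs.extend(labels[w])
        labels[u] = merge(ivs)
    return post, labels

def reaches(post: Dict[str, int], labels: Dict[str, List[Interval]], u: str, v: str) -> bool:
    return any(lo <= post[v] <= hi for lo, hi in labels[u])
```

Tree edges are covered by the subtree interval alone; only non-tree edges add extra intervals, which is what keeps labels small on tree-like data.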

A number of indexing techniques have recently been proposed for optimizing queries on XML and other semi-structured data models. Most semi-structured models use tree-like structures and query languages (XPath, XQuery, etc.) that use regular path expressions to optimize query processing. In this paper, we propose two algorithms, the Entry-point algorithm (EPA) and the Two-point Entry algorithm, that exploit different types of indices to process XPath queries efficiently. We discuss and compare two approaches to implementing the EPA, namely Root-first and Bottom-first. We present experimental results for the algorithms using XML benchmark queries and data, and compare them with traditional query processing methods, with and without indexes, and with the ToXin indexing approach. Our algorithms outperform both the traditional methods and the ToXin indexing approach.

Recent research has investigated in depth the problem of querying semi-structured data and data that can be represented by graphs (e.g. object-oriented data, XML data, etc.). Queries on graph-like data, called path queries, are typically expressed by regular expressions denoting paths in the graph. The result of a path query is the set of nodes reachable via a path that matches a specified regular expression. In this paper we investigate the problem of extracting, from a graph representing some information, a subgraph that satisfies a given property. We propose a new form of query, called graph queries, whose answers are (marked) graphs with a particular structure extracted from the source graph. We show that a simple form of graph grammar can profitably be used to define graph queries. The result of a graph query using a grammar G over a database D is a marked subgraph of D ‘matching’ a graph derived from G. We consider different types of graph grammars that can be used to query graph-like data, and discuss their expressiveness and complexity.
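
A common way to evaluate the path queries mentioned above (not the graph-grammar queries this paper proposes) is to search the product of the data graph and an automaton for the regular expression. This minimal sketch assumes the expression has already been compiled by hand into an NFA without ε-transitions; the graph and NFA encodings and the name `path_query` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(edge label, target node)]
NFA = Dict[int, List[Tuple[str, int]]]    # state -> [(symbol, next state)]

def path_query(g: Graph, nfa: NFA, start_state: int, finals: Set[int], source: str) -> Set[str]:
    """Nodes reachable from `source` by a path whose label string the NFA accepts."""
    seen = {(source, start_state)}
    queue = deque(seen)
    answer: Set[str] = set()
    while queue:
        node, state = queue.popleft()
        if state in finals:
            answer.add(node)
        # Advance graph and automaton together: follow an edge only if
        # the automaton has a transition on that edge's label.
        for label, target in g.get(node, []):
            for sym, nxt in nfa.get(state, []):
                if sym == label and (target, nxt) not in seen:
                    seen.add((target, nxt))
                    queue.append((target, nxt))
    return answer
```

For example, the expression `person.knows*` becomes the NFA `{0: [("person", 1)], 1: [("knows", 1)]}` with final state 1. Each (node, state) pair is visited at most once, so cycles in the data graph are handled.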

Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path patterns or tree patterns. Current applications of XML require querying data whose structure is complex or not fully known to the user, or integrating XML data sources with different structures. These applications have recently motivated the introduction of query languages that allow a partial specification of path patterns in a query. In this paper, we consider partial path queries, a generalization of path pattern queries, and we focus on their efficient evaluation under the indexed streaming evaluation model. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries. The first exploits a structural summary of the data to generate a set of path patterns that together are equivalent to a partial path query. To evaluate these path patterns, we extend a previous algorithm for path-pattern queries so that it can work on path patterns with repeated labels. The second extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. We analyze our algorithms and perform extensive experimental evaluations. Our experimental results show that the holistic algorithm outperforms the other two. Our approaches are the first to efficiently evaluate this class of queries in the indexed streaming model.
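
The holistic algorithm relies on a topological ordering of the query dag. A standard way to compute one is Kahn's algorithm. This minimal sketch represents query nodes by hypothetical string ids (distinct ids let two query nodes carry the same label), and `topological_order` is an illustrative name, not the paper's code.

```python
from collections import deque
from typing import Dict, List

def topological_order(dag: Dict[str, List[str]]) -> List[str]:
    # Kahn's algorithm: repeatedly emit a node with no remaining incoming edges.
    indeg = {u: 0 for u in dag}
    for u in dag:
        for v in dag[u]:
            indeg[v] = indeg.get(v, 0) + 1
    ready = deque(u for u, d in indeg.items() if d == 0)
    order: List[str] = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in dag.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(indeg):
        raise ValueError("query graph has a cycle")
    return order
```

Every query node appears after all of its predecessors, so a stack-based evaluator can process query nodes in this order and know that each node's parents have already been handled.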

The widespread use of graph-based models for representing data collections (e.g. object-oriented data, XML data, etc.) has stimulated the database research community to investigate the problem of defining declarative languages for querying graph-like databases. In this paper, a new framework for querying graph-like data based on graph grammars is proposed. The new paradigm allows us to verify structural properties of graphs and to extract subgraphs. More specifically, a new form of query (namely graph query) is proposed, consisting of a particular graph grammar which defines a class of graphs to be matched on the graph representing the database. Thus, unlike path queries, the answer to a graph query is not just a set of nodes but a subgraph, extracted from the input graph, which satisfies the structural properties defined by the graph grammar. The expressiveness and complexity of different forms of graph queries are discussed, and some practical applications are shown.

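When users retrieve information from XML documents with "queries with partially specified structural constraints" (PSTP queries), they can flexibly embed structural constraints in query expressions according to how familiar they are with the structure of the XML document. This meets the needs of users who know none, all, or only part of the structural information. This paper proposes a query processing algorithm based on extended Dewey encoding (EDPS) that can process PSTP queries of any form while scanning the elements only once. Experimental results on different datasets show that EDPS clearly outperforms existing methods in overall performance when processing twig queries, PSTP queries without "*" nodes, and PSTP queries with "*" nodes.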
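
The core property that Dewey-style encodings exploit is that structural relationships reduce to prefix tests on labels. This minimal sketch uses plain Dewey labels stored as integer tuples; it is not the extended variant used by EDPS, whose additional encoding the abstract does not describe, and the function names are hypothetical.

```python
from typing import Tuple

Dewey = Tuple[int, ...]  # e.g. (1, 2, 3) = 3rd child of the 2nd child of the root

def is_ancestor(a: Dewey, d: Dewey) -> bool:
    # a is a proper ancestor of d iff a's label is a proper prefix of d's.
    return len(a) < len(d) and d[:len(a)] == a

def is_parent(a: Dewey, d: Dewey) -> bool:
    # Parent-child: prefix relationship with exactly one extra component.
    return len(d) == len(a) + 1 and d[:len(a)] == a

def lca(x: Dewey, y: Dewey) -> Dewey:
    # Lowest common ancestor = longest common prefix of the two labels.
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return x[:n]
```

Python compares tuples lexicographically, so sorting Dewey tuples also yields document order, which is what allows a single sequential scan over the elements.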

The important challenge of evaluating XPath queries over XML streams has sparked much interest in the past few years. A number of algorithms have been proposed that support wider fragments of the query language and exhibit better performance and memory utilization. Nevertheless, all algorithms known to date use a prohibitively large amount of memory for certain types of queries. A natural question, then, is whether this memory bottleneck is inherent or just an artifact of the proposed algorithms. In this paper we initiate the first systematic and theoretical study of lower bounds on the amount of memory required to evaluate XPath queries over XML streams. We present a general lower-bound technique which, given a query, specifies the minimum amount of memory that any algorithm evaluating the query on a stream would need to incur. The lower bounds are stated in terms of new graph-theoretic properties of queries, and the proofs are based on tools from communication complexity. We then exploit insights learned from the lower bounds to obtain a new algorithm for XPath evaluation on streams that uses space close to the optimum. Our algorithm deviates from the standard paradigm of using automata or transducers, thereby avoiding the need to store large transition tables.
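
For contrast with the memory-hungry cases the lower bounds address, some queries are cheap to stream: for a descendant-only query such as `//a//b`, a single counter of currently open `a` elements is enough state. This minimal sketch is not the paper's algorithm. It assumes SAX-style start/end events are already available as tuples and reports each match by the ordinal of the matching element; the names are hypothetical.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # ("start" | "end", tag name), as a SAX-style parser would emit

def stream_descendant_query(events: Iterable[Event], anc: str, desc: str) -> List[int]:
    """Evaluate //anc//desc in one pass; return the ordinal of each matching desc element."""
    open_anc = 0   # number of currently open `anc` elements
    seen_desc = 0  # how many `desc` start tags have been seen so far
    matches: List[int] = []
    for kind, tag in events:
        if kind == "start":
            # Test for a match before counting this element as an ancestor,
            # so that //a//a requires a strictly enclosing `a`.
            if tag == desc:
                if open_anc > 0:
                    matches.append(seen_desc)
                seen_desc += 1
            if tag == anc:
                open_anc += 1
        elif tag == anc:
            open_anc -= 1
    return matches
```

Queries with predicates or branching generally cannot be answered this way, because pending candidates must be buffered until their conditions are resolved.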

Branch query processing is a core operation of XML query processing. In recent years, a number of stack-based twig join algorithms have been proposed to process twig queries using tag stream indexes. However, a tag stream index labels each element separately without considering the similarity among elements, and algorithms based on tag stream indexes perform inefficiently on large documents. This paper proposes a novel index, the Clustered Chain Path Index (CCPI), based on a novel labeling scheme. The index efficiently supports branch query processing, and for tree-structured XML documents it has the same cardinality as the 1-index. Based on CCPI, efficient query processing algorithms, KMP-Match-Path and Related-Path-Segment-Join, are proposed. Analysis and experimental results show that the proposed CCPI-based algorithms outperform other algorithms and scale well. This paper is partially supported by the Natural Science Foundation of Heilongjiang Province, Grant No. zjg03-05; the National Natural Science Foundation of China, Grant No. 60473075; and the Key Program of the National Natural Science Foundation of China, Grant No. 60533110.
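
The stack-based joins that the tag-stream approach builds on can be illustrated with the classic ancestor-descendant structural join over region labels (start, end), in the style of Stack-Tree-Desc. This minimal sketch assumes both input lists are sorted by start position and that regions are properly nested, as they are in a tree-shaped document; it is background, not the CCPI algorithms, and the names are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end); a contains d iff a.start < d.start and d.end < a.end

def stack_tree_join(ancestors: List[Region], descendants: List[Region]) -> List[Tuple[Region, Region]]:
    """All (ancestor, descendant) pairs; both inputs sorted by start position."""
    out: List[Tuple[Region, Region]] = []
    stack: List[Region] = []  # chain of nested ancestors enclosing the current position
    i = j = 0
    while j < len(descendants):
        # Take the next element in document order from either stream.
        take_anc = i < len(ancestors) and ancestors[i][0] < descendants[j][0]
        nxt = ancestors[i] if take_anc else descendants[j]
        # Pop ancestors that ended before the next element starts.
        while stack and stack[-1][1] < nxt[0]:
            stack.pop()
        if take_anc:
            stack.append(nxt)
            i += 1
        else:
            # Every ancestor still on the stack encloses this descendant.
            out.extend((a, nxt) for a in stack)
            j += 1
    return out
```

Each input element is pushed and popped at most once, so apart from the output the join runs in linear time over the two streams.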

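Low-quality data in XML, such as incorrect, inconsistent, and imprecise data, makes effective query processing over XML data challenging. This paper focuses on processing twig queries over XML data whose tags are of low quality. It gives methods for finding the similar tags of each tag, including tags with similar spelling, relaxed tags, and synonymous tags, together with an efficient algorithm that finds, in an XML document, all query results similar to those of the original query. Experiments demonstrate the effectiveness and efficiency of the proposed methods.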
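
Finding tags with similar spelling can be sketched with edit (Levenshtein) distance and a threshold. Using edit distance as the spelling-similarity measure, the threshold `max_dist`, and the names below are assumptions for illustration; the abstract does not say which measure the paper uses.

```python
from typing import Iterable, List

def edit_distance(a: str, b: str) -> int:
    # Classic dynamic program over prefixes, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def spelling_similar(tag: str, vocabulary: Iterable[str], max_dist: int = 1) -> List[str]:
    # Hypothetical helper: other tags within max_dist edits of `tag`.
    return sorted(t for t in vocabulary if t != tag and edit_distance(tag, t) <= max_dist)
```

For instance, with the default threshold, `spelling_similar("author", ["auther", "authors", "title"])` keeps the two misspellings and drops `title`.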
