首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The increasing popularity of graph data in various domains has lead to a renewed interest in developing efficient graph matching techniques, especially for processing large graphs. In this paper, we study the problem of approximate graph matching in a large attributed graph. Given a large attributed graph and a query graph, we compute a subgraph of the large graph that best matches the query graph. We propose a novel structure-aware and attribute-aware index to process approximate graph matching in a large attributed graph. We first construct an index on the similarity of the attributed graph, by partitioning the large search space into smaller subgraphs based on structure similarity and attribute similarity. Then, we construct a connectivity-based index to give a concise representation of inter-partition connections. We use the index to find a set of best matching paths. From these best matching paths, we compute the best matching answer graph using a greedy algorithm. Experimental results on real datasets demonstrate the efficiency of both index construction and query processing. We also show that our approach attains high-quality query answers.  相似文献   

2.
Gao  Jiu-Ru  Chen  Wei  Xu  Jia-Jie  Liu  An  Li  Zhi-Xu  Yin  Hongzhi  Zhao  Lei 《计算机科学技术学报》2019,34(6):1185-1202

With the popularity of storing large data graph in cloud, the emergence of subgraph pattern matching on a remote cloud has been inspired. Typically, subgraph pattern matching is defined in terms of subgraph isomorphism, which is an NP-complete problem and sometimes too strict to find useful matches in certain applications. And how to protect the privacy of data graphs in subgraph pattern matching without undermining matching results is an important concern. Thus, we propose a novel framework to achieve the privacy-preserving subgraph pattern matching in cloud. In order to protect the structural privacy in data graphs, we firstly develop a k-automorphism model based method. Additionally, we use a cost-model based label generalization method to protect label privacy in both data graphs and pattern graphs. During the generation of the k-automorphic graph, a large number of noise edges or vertices might be introduced to the original data graph. Thus, we use the outsourced graph, which is only a subset of a k-automorphic graph, to answer the subgraph pattern matching. The efficiency of the pattern matching process can be greatly improved in this way. Extensive experiments on real-world datasets demonstrate the high efficiency of our framework.

  相似文献   

3.
It is increasingly common to find graphs in which edges are of different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity of a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness and efficiency of these algorithms are experimentally verified using real-life data and synthetic data.  相似文献   

4.
Because of its wide application, the subgraph matching problem has been studied extensively during the past decade. However, most existing solutions assume that a data graph is a vertex/edge-labeled graph (i.e., each vertex/edge has a simple label). These solutions build structural indices by considering the vertex labels. However, some real graphs contain rich-content vertices such as user profiles in social networks and HTML pages on the World Wide Web. In this study, we consider the problem of subgraph matching using a more general scenario. We build a structural index that does not depend on any vertex content. Based on the index, we design a holistic subgraph matching algorithm that considers the query graph as a whole and finds one match at a time. In order to further improve efficiency, we propose a “partial evaluation and assembly” framework to find subgraph matches over large graphs. Last but not least, our index has light maintenance overhead. Therefore, our method can work well on dynamic graphs. Extensive experiments on real graphs show that our method outperforms the state-of-the-art algorithms.  相似文献   

5.
Recently research has deeply investigated the problem of querying semi-structured data and data which can be represented by means of graphs (e.g. object-oriented data, XML data, etc.). Typically queries on graph-like data, called path queries, are expressed by means of regular expressions denoting paths in the graph. The result of a path query is the set of nodes reachable by means of a path expressed by a specified regular expression. In this paper we investigate the problem of extracting a subgraph satisfying a given property from a given graph representing some information. We propose a new form of queries, called graph queries, whose answers are (marked) graphs having a particular structure, extracted from the source graph. We show that a simple form of graph grammars can be profitably used to define graph queries. The result of a graph query, using a grammar G over a database D, is a marked subgraph of D ‘matching’ a graph derived from G. We consider different types of graph grammars which can be used to query graph-like data and consider their expressiveness and complexity.  相似文献   

6.
判断有向图上两个顶点之间是否存在一条路径是一个经典问题,而对于一些路由规划和图分析等实际应用,要求查找是否存在跳数受限的可达路径,这是一个变种的图可达查询问题.对于大图上跳数受限的查询算法,不仅仅要对大图查询的时间效率和空间效率进行权衡,而且还要利用跳数受限的特性进行优化.普通的可达查询算法存在小度数顶点索引项占用空间过多的问题,造成空间浪费严重.为此我们提出了一种面向跳数受限的2-hop部分索引方法,采用改进的索引方法并结合局部搜索,实现跳数受限的有效可达性查询.实验结果表明,在Orkut社交网络数据集上与已有算法相比,该算法索引空间节省了32%,同时查询时间略微增加,使得我们算法可以计算更大规模图的跳数受限可达问题.  相似文献   

7.
Recently, uncertain graph data management and mining techniques have attracted significant interests and research efforts due to potential applications such as protein interaction networks and social networks. Specifically, as a fundamental problem, subgraph similarity all-matching is widely applied in exploratory data analysis. The purpose of subgraph similarity all-matching is to find all the similarity occurrences of the query graph in a large data graph. Numerous algorithms and pruning methods have been developed for the subgraph matching problem over a certain graph. However, insufficient efforts are devoted to subgraph similarity all-matching over an uncertain data graph, which is quite challenging due to high computation costs. In this paper, we define the problem of subgraph similarity maximal all-matching over a large uncertain data graph and propose a framework to solve this problem. To further improve the efficiency, several speed-up techniques are proposed such as the partial graph evaluation, the vertex pruning, the calculation model transformation, the incremental evaluation method and the probability upper bound filtering. Finally, comprehensive experiments are conducted on real graph data to test the performance of our framework and optimization methods. The results verify that our solutions can outperform the basic approach by orders of magnitudes in efficiency.  相似文献   

8.
This paper studies the problem of processing supergraph queries, that is, given a database containing a set of graphs, find all the graphs in the database of which the query graph is a supergraph. Existing works usually construct an index and performs a filtering-and-verification process, which still requires many subgraph isomorphism testings. There are also significant overheads in both index construction and maintenance. In this paper, we design a graph querying system that achieves both fast indexing and efficient query processing. The index is constructed by a simple but fast method of extracting the commonality among the graphs, which does not involve any costly operation such as graph mining. Our query processing has two key techniques, direct inclusion and filtering. Direct inclusion allows partial query answers to be included directly without candidate verification. Our filtering technique further reduces the candidate set by operating on a much smaller projected database. Experimental results show that our method is significantly more efficient than the existing works in both indexing and query processing, and our index has a low maintenance cost.  相似文献   

9.
We propose techniques for processing SPARQL queries over a large RDF graph in a distributed environment. We adopt a “partial evaluation and assembly” framework. Answering a SPARQL query Q is equivalent to finding subgraph matches of the query graph Q over RDF graph G. Based on properties of subgraph matching over a distributed graph, we introduce local partial match as partial answers in each fragment of RDF graph G. For assembly, we propose two methods: centralized and distributed assembly. We analyze our algorithms from both theoretically and experimentally. Extensive experiments over both real and benchmark RDF repositories of billions of triples confirm that our method is superior to the state-of-the-art methods in both the system’s performance and scalability.  相似文献   

10.
In this paper, we have developed a HiTi (Hierarchical MulTi) graph model for structuring large topographical road maps to speed up the minimum cost route computation. The HiTi graph model provides a novel approach to abstracting and structuring a topographical road map in a hierarchical fashion. We propose a new shortest path algorithm named SPAH, which utilizes HiTi graph model of a topographical road map for its computation. We give the proof for the optimality of SPAH. Our performance analysis of SPAH on grid graphs showed that it significantly reduces the search space over existing methods. We also present an in-depth experimental analysis of HiTi graph method by comparing it with other similar works on grid graphs. Within the HiTi graph framework, we also propose a parallel shortest path algorithm named ISPAH. Experimental results show that inter query shortest path problem provides more opportunity for scalable parallelism than the intra query shortest path problem.  相似文献   

11.
在SPARQL查询过程中,含有复杂结构的资源描述框架(RDF)图的查询效率低下。为此,通过分析几种RDF图的基本结构与RDF顶点的选择性,提出RDF三元组模式选择性(RTPS)——一种基于RDF顶点选择性的图结构切分规则,以提高面向RDF图的子图匹配效率。首先,根据谓词结构在数据图与查询图中的通性建立RDF相邻谓词路径(RAPP)索引,将数据图结构转化为传入-传出双向谓词路径结构以确定查询顶点的搜索空间,并加快顶点的过滤;接着,通过整数线性规划(ILP)问题计算建模将复杂RDF查询图结构分解为若干结构简单的查询子图,通过分析RDF顶点在查询图中的相邻子图结构与特征,确立查询顶点的选择性以确定最优切分方式;然后,通过RDF顶点选择性与相邻子图的结构特征来缩小查询顶点的搜索空间范围,并在数据图中找到符合条件的RDF顶点;最后,遍历数据图以找到与查询子图结构相匹配的子图结构,将得到的子图进行连接并将其作为查询结果输出。实验采用控制变量法,比较了RTPS、RDF子图匹配(RSM)、RDF-3X、GraSS与R3F的查询响应时间。实验结果充分表明,与其他4种方法相比,当查询图复杂度高于9时,RTPS的查询响应时间更短,具有更高的查询效率。  相似文献   

12.
针对k步可达性查询算法无法解决带距离约束的图可达性查询问题,提出基于参考节点嵌入的图可达性查询算法。首先,从所有节点中选出极少数有代表性的全局参考节点,预先计算所有节点与全局参考节点之间的最短路径距离;然后,采用最短路径树和范围最小值查询技术求得局部参考节点;接着,利用三角不等式关系得到查询点对距离范围;最后,根据查询条件中的距离值与查询点对距离范围上、下限值的大小关系,可快速得出可达性结论。针对社会关系网络和公路网络数据,将所提算法与Dijkstra算法、K-Reach算法进行实验对比测试。相较于K-Reach算法,其索引建立时间小4个数量级,其索引规模小2个数量级;相较于Dijkstra算法,在公路网络和社会关系网络中,直接得出可达性结论的比例分别为92%和78.6%,其查询时间大大缩短,分别降低了95.5%和92%。实验结果表明:所提算法能够通过使用较小的索引开销,实现在线查询计算复杂度的降低,可很好地解决既适用于有权图又适用于无权图带距离约束的可达性查询问题。  相似文献   

13.
现有的动态子图匹配研究中忽略了模式图中的时间信息,使用户难以得到想要查询的真实结果.针对这种情况,首先设计一种简洁的中间结果保存形式,将匹配结果直接在数据图中以图进行存储;接着改进边缘转换模型,当数据图有边插入/删除时对其快速增量维护并报告结果;最后根据边转换模型设计了一个时间尊重图模式匹配算法,提高了匹配搜索效率.对真实网络流量数据和综合社交流数据进行实验评估,结果表明算法能够有效减少图模式匹配的执行时间和空间花销.  相似文献   

14.
We propose a new way of indexing a large database of small and medium-sized graphs and processing exact subgraph matching (or subgraph isomorphism) and approximate (full) graph matching queries. Rather than decomposing a graph into smaller units (e.g., paths, trees, graphs) for indexing purposes, we represent each graph in the database by its graph signature, which is essentially a multiset. We construct a disk-based index on all the signatures via bulk loading. During query processing, a query graph is also mapped into its signature, and this signature is searched using the index by performing multiset operations. To improve the precision of exact subgraph matching, we develop a new scheme using the concept of line graphs. Through extensive evaluation on real and synthetic graph datasets, we demonstrate that our approach provides a scalable and efficient disk-based solution for a large database of small and medium-sized graphs.  相似文献   

15.
In this paper, we study a variant of reachability queries, called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we check the existence of a directed path from u1 to u2, where edge labels along the path are a subset of S. We propose the path-label transitive closure method to answer LCR queries. Specifically, we t4ransform an edge-labeled directed graph into an augmented DAG by replacing the maximal strongly connected components as bipartite graphs. We also propose a Dijkstra-like algorithm to compute path-label transitive closure by re-defining the “distance” of a path. Comparing with the existing solutions, we prove that our method is optimal in terms of the search space. Furthermore, we propose a simple yet effective partition-based framework (local path-label transitive closure+online traversal) to answer LCR queries in large graphs. We prove that finding the optimal graph partition to minimize query processing cost is a NP-hard problem. Therefore, we propose a sampling-based solution to find the sub-optimal partition. Moreover, we address the index maintenance issues to answer LCR queries over the dynamic graphs. Extensive experiments confirm the superiority of our method.  相似文献   

16.
17.
XMin: Minimizing Tree Pattern Queries with Minimality Guarantee   总被引:1,自引:0,他引:1  
Due to wide use of XPath, the problem of efficiently processing XPath queries has recently received a lot of attention. In particular, a considerable effort has been devoted to minimizing XPath queries since the efficiency of query processing greatly depends on the size of the query. Research work in this area can be classified into two categories: constraint-independent minimization and constraint-dependent minimization. The former minimizes queries in the absence of integrity constraints while the latter in the presence of them. For a linear path query, which is an XPath query without branching predicates, existing constraint-independent minimization methods are generally known to be unable to minimize the query without processing the query itself. Most recently, however, by using the DataGuide, a representative structural summary of XML data, a constraint-independent method that minimizes linear path queries in a top-down fashion has been proposed. Nevertheless, this method can fail to find a minimal query since it minimizes a query by merely erasing labels from the original query whereas a minimal query could include labels that are not present in the original query. In this paper, we propose a bottom-up approach called XMin that guarantees finding a minimal query for a given tree pattern query by using the DataGuide without processing the query itself. For the linear path query, we first show that the sequence of labels occurring in the minimal query is a subsequence of every schema label sequence that matches the original query. Here, the schema label sequence for a node is the sequence of labels from the root of XML data to the node. We then propose iterative subsequence generation that iteratively generates subsequences from the shortest schema label sequence matching the original query in a bottom-up fashion and tests query equivalence. Using iterative subsequence generation, we can always find a minimal query and we formally prove this guarantee. We also propose an extended algorithm that guarantees the minimality for the tree pattern query, which is a linear path query with branching predicates. These methods have been prototyped in a full-fledged object-relational DBMS. The experimental results using real and synthetic data sets show the practicality of our method.  相似文献   

18.
随着社交网络、生物信息以及web挖掘等应用的不断发展,图结构数据的存储和查询处理越来越得到重视。但是,针对顶点邻域非常密集的场合,如何提高此类顶点的查询效率,现有的研究相对较少。论文在分析了顶点密集领域数据的特点后,提出了一种对顶点密集邻域建立路径索引的策略,有效地解决了此类查询的效率。首先分析顶点密集的邻域的查询模式,并在这些模式上建立路径索引,然后采用B 树方法,对路径索引的存储、更新和查找方法进行了设计实现,最后,采用图数据库NEO4J为基础,对路径索引存储空间和查询性能进行了测试,测试表明,虽然路径索引会占用存储空间,但是能够提高特定的查询处理的性能。  相似文献   

19.
Search games are attractive for their correspondence with classical width parameters. For instance, the invisible search number (a.k.a. node search number) of a graph is equal to its pathwidth plus 1, and the visible search number of a graph is equal to its treewidth plus 1. The connected variants of these games ask for search strategies that are connected, i.e., at every step of the strategy, the searched part of the graph induces a connected subgraph. We focus on monotone search strategies, i.e., strategies for which every node is searched exactly once. The monotone connected visible search number of an n-node graph is at most O(logn) times its visible search number. First, we prove that this logarithmic bound is tight. Precisely, we prove that there is an infinite family of graphs for which the ratio monotone connected visible search number over visible search number is Ω(logn). Second, we prove that, as opposed to the non-connected variant of visible graph searching, “recontamination helps” for connected visible search. Precisely, we prove that, for any k4, there exists a graph with connected visible search number at most k, and monotone connected visible search number >k  相似文献   

20.
The ever increasing size of graphs makes them difficult to query and store. In this paper, we present Shrink, a compression method that reduces the size of the graph while preserving the distances between the nodes. The compression is based on the iterative merging of the nodes. During each merging, a system of linear equations is solved to define new edge weights in a way that the new weights have the least effect on the distances. Merging nodes continues until the desired size for the compressed graph is reached. The compressed graph, also known as the coarse graph, can be queried without decompression. As the complexity of distance-based queries such as shortest path queries is highly dependent on the size of the graph, Shrink improves the performance in terms of time and storage. Shrink not only provides the length of the shortest path but also identifies the nodes on the path. The approach has been applied to both weighted and unweighted graphs including road network, friendship network, collaboration network, web graph and social network. In the experiment, a road network with more than 2.5 million nodes is reduced to fifth while the average relative error is less than 1%.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号