首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 390 毫秒
1.
在SPARQL查询过程中,含有复杂结构的资源描述框架(RDF)图的查询效率低下。为此,通过分析几种RDF图的基本结构与RDF顶点的选择性,提出RDF三元组模式选择性(RTPS)——一种基于RDF顶点选择性的图结构切分规则,以提高面向RDF图的子图匹配效率。首先,根据谓词结构在数据图与查询图中的通性建立RDF相邻谓词路径(RAPP)索引,将数据图结构转化为传入-传出双向谓词路径结构以确定查询顶点的搜索空间,并加快顶点的过滤;接着,通过整数线性规划(ILP)问题计算建模将复杂RDF查询图结构分解为若干结构简单的查询子图,通过分析RDF顶点在查询图中的相邻子图结构与特征,确立查询顶点的选择性以确定最优切分方式;然后,通过RDF顶点选择性与相邻子图的结构特征来缩小查询顶点的搜索空间范围,并在数据图中找到符合条件的RDF顶点;最后,遍历数据图以找到与查询子图结构相匹配的子图结构,将得到的子图进行连接并将其作为查询结果输出。实验采用控制变量法,比较了RTPS、RDF子图匹配(RSM)、RDF-3X、GraSS与R3F的查询响应时间。实验结果充分表明,与其他4种方法相比,当查询图复杂度高于9时,RTPS的查询响应时间更短,具有更高的查询效率。  相似文献   

2.
关皓元  朱斌  李冠宇  赵玲 《计算机应用》2018,38(7):1898-1904
针对在SPARQL查询处理中,随着查询图结构逐渐复杂而导致基于图的查询效率愈发低下的问题,通过分析几种资源描述框架(RDF)图的基本结构,提出了一种基于查询图结构切分的子图匹配方法——RSM。首先,将查询图切分为若干结构简单的查询子图,并通过相邻谓词结构索引来定义查询图节点的搜索空间;然后,通过相邻子图结构来缩小搜索空间范围,在数据图中根据搜索空间中的搜索范围找到符合的子图结构;最后,将得到的子图进行连接并作为查询结果输出。将RSM与RDF-3X、R3F、GraSS等主流查询方法作比较,对比了各方法在不同数据集上对于复杂程度不同的查询图的查询响应时间。实验结果充分表明,与其他3种方法相比,在处理结构复杂的查询图时,RSM的查询响应时间更短,具有更高的查询效率。  相似文献   

3.
Mining globally distributed frequent subgraphs in a single labeled graph   总被引:1,自引:0,他引:1  
Xing  Hui  Chen  Ah-Hwee   《Data & Knowledge Engineering》2009,68(10):1034-1058
Recent years have observed increasing efforts on graph mining and many algorithms have been developed for this purpose. However, most of the existing algorithms are designed for discovering frequent subgraphs in a set of labeled graphs only. Also, the few algorithms that find frequent subgraphs in a single labeled graph typically identify subgraphs appearing regionally in the input graph. In contrast, for real-world applications, it is commonly required that the identified frequent subgraphs in a single labeled graph should also be globally distributed. This paper thus fills this crucial void by proposing a new measure, termed G-Measure, to find globally distributed frequent subgraphs, called G-Patterns, in a single labeled graph. Specifically, we first show that the G-Patterns, selected by G-Measure, tend to be globally distributed in the input graph. Then, we present that G-Measure has the downward closure property, which guarantees the G-Measure value of a G-Pattern is not less than those of its supersets. Consequently, a G-Miner algorithm is developed for finding G-Patterns. Experimental results on four synthetic and seven real-world data sets and comparison with the existing algorithms demonstrate the efficacy of the G-Measure and the G-Miner for finding G-Patterns. Finally, an application of the G-Patterns is given.  相似文献   

4.
Subsea: an efficient heuristic algorithm for subgraph isomorphism   总被引:1,自引:0,他引:1  
We present a novel approach to the problem of finding all subgraphs and induced subgraphs of a (target) graph which are isomorphic to another (pattern) graph. To attain efficiency we use a special representation of the pattern graph. We also combine our search algorithm with some known bisection algorithms. Experimental comparison with other algorithms was performed on several types of graphs. The comparison results suggest that the approach provided here is most effective when all instances of a subgraph need to be found.  相似文献   

5.
加权最大频繁子图挖掘算法的研究   总被引:2,自引:1,他引:1       下载免费PDF全文
如何从大量的图中挖掘出令人感兴趣的子图模式已经成为数据挖掘领域研究的热点之一。传统的频繁子图挖掘方法对满足最小支持度阈值的子图同等对待,但在真实数据库中不同的子图往往具有不同的重要程度。为解决上述问题,提出了一种深度优先的挖掘加权最大频繁子图的新算法。首先给出了一种新的用于计算图的邻接矩阵规范编码的结点排序策略,大大降低了求图规范编码的复杂度,并可以加速子图规范编码匹配的速度。其次,给出了加权最大频繁子图的定义,不仅可以找出较为重要的最大频繁子图,而且可以使挖掘结果同样具有反单调性,从而可加速剪枝。实验结果表明,提出的算法不仅可以有效地减少挖掘结果的数量,而且具有较高的效率。  相似文献   

6.
7.
RDF is a knowledge representation language dedicated to the annotation of resources within the framework of the semantic web. Among the query languages for RDF, SPARQL allows querying RDF through graph patterns, i.e., RDF graphs involving variables. Other languages, inspired by the work in databases, use regular expressions for searching paths in RDF graphs. Each approach can express queries that are out of reach of the other one. Hence, we aim at combining these two approaches. For that purpose, we define a language, called PRDF (for “Path RDF”) which extends RDF such that the arcs of a graph can be labeled by regular expression patterns. We provide PRDF with a semantics extending that of RDF, and propose a correct and complete algorithm which, by computing a particular graph homomorphism, decides the consequence between an RDF graph and a PRDF graph. We then define the PSPARQL query language, extending SPARQL with PRDF graph patterns and complying with RDF model theoretic semantics. PRDF thus offers both graph patterns and path expressions. We show that this extension does not increase the computational complexity of SPARQL and, based on the proposed algorithm, we have implemented a correct and complete PSPARQL query engine.  相似文献   

8.
We study the classical edge-searching pursuit-evasion problem where a number of pursuers have to clear a given graph of fast-moving evaders despite poor visibility, for example, where robots search a cave system to ensure that no terrorists are hiding in it. We study when polynomial-time algorithms exist to determine how many robots are needed to clear a given graph (minimum robot problem) and how a given number of robots should move on the graph to clear it with either a minimum sum of their travel distances (minimum distance problem) or minimum task-completion time (minimum time problem). The robots cannot clear a graph if the vertex connectivity of some part of the graph exceeds the number of robots. Researchers therefore focus on graphs whose subgraphs can always be cut at a limited number of vertices, that is, graphs of low treewidth, typically trees. We describe an optimal polynomial-time algorithm, called CLEARTHETREE, that is shorter and algorithmically simpler than the state-of-the-art algorithm for the minimum robot problem on unit-width unit-length trees. We then generalize prior research to both unit-width arbitrary-length and unit-length arbitrary-width graphs and derive both algorithms and time complexity results for a variety of graph topologies. Pursuit-evasion problems on the former graphs are generally simpler than pursuit-evasion problems on the latter graphs. For example, the minimum robot and distance problems are solvable in polynomial time on unit-width arbitrary-length trees but NP-hard on unit-length arbitrary-width trees.  相似文献   

9.
An undirected or a symmetric graph consists of a set of nodes and a set of nonoriented edges connecting between pairs of nodes. In widely differing disciplines of science and engineering, symmetric graphs find important uses. In many of these application areas, an often encountered problem is that of finding all the maximal complete subgraphs of a symmetric graph. In this paper, borrowing the concept of strong connectedness in a nonsymmetric graph, the idea of minimally strongly connected (MSC) and maximal minimally strongly connected (MMSC) subgraphs in a symmetric graph is introduced. The MMSC subgraphs play a kind of role identical to that played by maximal complete subgraphs in symmetric graphs. Many important properties of MMSC subgraphs are discussed in the paper, and an explicit, computer-oriented algorithm is developed for finding all the MMSC subgraphs, given an undirected graph.  相似文献   

10.
A new algorithm for error-tolerant subgraph isomorphism detection   总被引:3,自引:0,他引:3  
We propose a new algorithm for error-correcting subgraph isomorphism detection from a set of model graphs to an unknown input graph. The algorithm is based on a compact representation of the model graphs. This representation is derived from the set of model graphs in an off-line preprocessing step. The main advantage of the proposed representation is that common subgraphs of different model graphs are represented only once. Therefore, at run time, given an unknown input graph, the computational effort of matching the common subgraphs for each model graph onto the input graph is done only once. Consequently, the new algorithm is only sublinearly dependent on the number of model graphs. Furthermore, the new algorithm can be combined with a future cost estimation method that greatly improves its run-time performance  相似文献   

11.
The ability to find and count subgraphs of a given network is an important non trivial task with multidisciplinary applicability. Discovering network motifs or computing graphlet signatures are two examples of methodologies that at their core rely precisely on the subgraph counting problem. Here we present the g-trie, a data-structure specifically designed for discovering subgraph frequencies. We produce a tree that encapsulates the structure of the entire graph set, taking advantage of common topologies in the same way a prefix tree takes advantage of common prefixes. This avoids redundancy in the representation of the graphs, thus allowing for both memory and computation time savings. We introduce a specialized canonical labeling designed to highlight common substructures and annotate the g-trie with a set of conditional rules that break symmetries, avoiding repetitions in the computation. We introduce a novel algorithm that takes as input a set of small graphs and is able to efficiently find and count them as induced subgraphs of a larger network. We perform an extensive empirical evaluation of our algorithms, focusing on efficiency and scalability on a set of diversified complex networks. Results show that g-tries are able to clearly outperform previously existing algorithms by at least one order of magnitude.  相似文献   

12.
基于频繁子树挖掘算法中的前缀节点思想,将模式图分为图核—分支—连接向量三个部分,提出了CBE算法。对在分支上扩展得到的候选模式图,CBE算法能够在常数时间内完成规范化判定。通过实验证明CBE算法的子图挖掘效率有显著提高。  相似文献   

13.
In this work we address the Anonymous Subgraph Problem (ASP). The problem asks to decide whether a directed graph contains anonymous subgraphs of a given family. This problem has a number of practical applications and here we describe three of them (Secret Santa Problem, anonymous routing, robust paths) that can be formulated as ASPs. Our main contributions are (i) a formalization of the anonymity property for a generic family of subgraphs, (ii) an algorithm to solve the ASP in time polynomial in the size of the graph under a set of conditions, and (iii) a thorough evaluation of our algorithms using various tests based both on randomly generated graphs and on real-world instances.  相似文献   

14.
Maximal biclique (also known as complete bipartite) subgraphs can model many applications in Web mining, business, and bioinformatics. Enumerating maximal biclique subgraphs from a graph is a computationally challenging problem, as the size of the output can become exponentially large with respect to the vertex number when the graph grows. In this paper, we efficiently enumerate them through the use of closed patterns of the adjacency matrix of the graph. For an undirected graph G without self-loops, we prove that 1) the number of closed patterns in the adjacency matrix of G is even, 2) the number of the closed patterns is precisely double the number of maximal biclique subgraphs of G, and 3) for every maximal biclique subgraph, there always exists a unique pair of closed patterns that matches the two vertex sets of the subgraph. Therefore, the problem of enumerating maximal bicliques can be solved by using efficient algorithms for mining closed patterns, which are algorithms extensively studied in the data mining field. However, this direct use of existing algorithms causes a duplicated enumeration. To achieve high efficiency, we propose an O(mn) time delay algorithm for a nonduplicated enumeration, in particular, for enumerating those maximal bicliques with a large size, where m and n. are the number of edges and vertices of the graph, respectively. We evaluate the high efficiency of our algorithm by comparing it to state- of-the-art algorithms on three categories of graphs: randomly generated graphs, benchmarks, and a real-life protein interaction network. In this paper, we also prove that if self-loops are allowed in a graph, then the number of closed patterns in the adjacency matrix is not necessarily even, but the maximal bicliques are exactly the same as those of the graph after removing all the self-loops.  相似文献   

15.
Maximal prime subgraph decomposition of Bayesian networks   总被引:1,自引:0,他引:1  
The authors present a method for decomposition of Bayesian networks into their maximal prime subgraphs. The correctness of the method is proven and results relating the maximal prime subgraph decomposition (MPD) to the maximal complete subgraphs of the moral graph of the original Bayesian network are presented. The maximal prime subgraphs of a Bayesian network can be organized as a tree which can be used as the computational structure for LAZY propagation. We also identify a number of tasks performed on Bayesian networks that can benefit from MPD. These tasks are: divide and conquer triangulation, hybrid propagation algorithms combining exact and approximative inference techniques, and incremental construction of junction trees. We compare the proposed algorithm with standard algorithms for decomposition of undirected graphs into their maximal prime subgraphs. The discussion shows that the proposed algorithm is simpler, more easy to comprehend, and it has the same complexity as the standard algorithms.  相似文献   

16.
郑志蕴  刘博李伦  王振飞 《计算机科学》2015,42(7):234-239, 249
随着语义网数据的海量涌现,人们更加关注RDF图的数据查询效率,通过关键词匹配直接查询RDF数据图成为一个研究热点。针对关键词查询中普遍存在的结果冗余与偏离等问题,提出了一种基于关键词的RDF数据图查询模型。该模型首先采用提出的基于迭代的图查询算法(ISGR)对所查询关键词进行子图匹配,得到唯一且最大的结果子图集合;然后根据关键词图与结果子图之间的结构信息,利用统计语言模型,给出了一种结果子图排序方法(SimLM)。对比实验表明,提出的查询模型及排序方法在一致性和相关性方面的性能优于传统模型。  相似文献   

17.
知识图谱划分算法研究综述   总被引:6,自引:0,他引:6  
知识图谱是人工智能的重要基石,因其包含丰富的图结构和属性信息而受到广泛关注.知识图谱可以精确语义描述现实世界中的各种实体及其联系,其中顶点表示实体,边表示实体间的联系.知识图谱划分是大规模知识图谱分布式处理的首要工作,对知识图谱分布式存储、查询、推理和挖掘起基础支撑作用.随着知识图谱数据规模及分布式处理需求的不断增长,如何对其进行划分已成为目前知识图谱研究的热点问题.从知识图谱和图划分的定义出发,系统性地介绍当前知识图谱数据划分的各类算法,包括基本、多级、流式、分布式和其他类型图划分算法.首先,介绍4种基本图划分算法:谱划分算法、几何划分算法、分支定界算法、KL及其衍生算法,这类算法通常用于小规模图数据或作为其他划分算法的一部分;然后,介绍多级图划分算法,这类算法对图粗糙化后进行划分再投射回原始图,根据粗糙化过程分为基于匹配的算法和基于聚合的算法;其次,描述3种流式图划分算法,这类算法将顶点或边加载为序列后进行划分,包括Hash算法、贪心算法、Fennel算法,以及这3种算法的衍生算法;再次,介绍以KaPPa、JA-BE-JA和轻量级重划分为代表的分布式图划分算法及它们的衍生算法;同时,在其他类型图划分算法中,介绍近年来新兴的2种图划分算法:标签传播算法和基于查询负载的算法.通过在合成与真实知识图谱数据集上的丰富实验,比较了5类知识图谱代表性划分算法在划分效果、查询处理与图数据挖掘方面的性能差异,分析实验结果并推广到推理层面,获得了基于实验的知识图谱划分算法性能评价结论.最后,在对已有方法分析和比较的基础上,总结目前知识图谱数据划分面临的主要挑战,提出相应的研究问题,并展望未来的研究方向.  相似文献   

18.
19.
一种高效的RDF图模式匹配算法   总被引:5,自引:0,他引:5  
随着越来越多的信息被表示为RDF格式,如何高效地对RDF信息进行分发和过滤成为一个重要的问题·在语义Web环境下的信息分发系统中,输入的RDF信息需要和大量的用户订阅条件进行匹配,而用户的订阅条件可以被表示为RDF图模式·根据RDF图的特点,并对其增加了一些约束,设计了一种新的RDF图模式匹配算法·实验结果表明,该算法的匹配效率远远高于传统的图模式匹配算法·  相似文献   

20.
We represent a new method for finding all connected maximal common subgraphs in two graphs which is based on the transformation of the problem into the clique problem. We have developed new algorithms for enumerating all cliques that represent connected maximal common subgraphs. These algorithms are based on variants of the Bron–Kerbosch algorithm. In this paper we explain the transformation of the maximal common subgraph problem into the clique problem. We give a short summary of the variants of the Bron–Kerbosch algorithm in order to explain the modification of that algorithm such that the detected cliques represent connected maximal common subgraphs. After introducing and proving several variants of the modified algorithm we discuss the runtimes for all variants by means of random graphs. The results show the drastical reduction of the runtimes for the new algorithms.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号