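As graph data grows in scale, demand for distributed processing of dynamic graph data has risen sharply, and partitioning is especially important in this setting. This work studies partitioning of large-scale dynamic graph data and, drawing on graph structural properties and the characteristics of dynamic graphs, proposes and implements a neighborhood-based dynamic graph partitioning algorithm. The algorithm has two phases: static partitioning and dynamic adjustment. For the static phase, it proposes a static partitioning algorithm for large-scale graph data that builds on edge-cut partitioning and integrates existing optimization strategies. On top of this optimized static algorithm, it proposes a dynamic partitioning algorithm suited to the way graph data keeps expanding. Vertices are migrated according to the minimum load value the migration achieves, and further optimizations for performance and edge-cut control are applied on this basis. Finally, the improved algorithm is evaluated on a range of graph datasets. The results show clear gains in balance and edge cut: partitions become more reasonable, and partition balance improves without increasing the edge cut.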
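The abstract gives no formulas for the migration step. As a minimal sketch, assuming an undirected graph stored as an adjacency dict and a vertex-to-partition map (all function names are hypothetical), the rule "move a vertex out of the heaviest partition to wherever the resulting maximum load is smallest, but never increase the edge cut" could look like this:

```python
from collections import Counter


def edge_cut(adj, part):
    """Count undirected edges whose endpoints lie in different partitions."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])


def cut_delta(adj, part, v, target):
    """Change in edge cut if v moves from part[v] to target (negative = better)."""
    src = part[v]
    to_src = sum(1 for w in adj[v] if part[w] == src)     # these edges become cut
    to_tgt = sum(1 for w in adj[v] if part[w] == target)  # these become internal
    return to_src - to_tgt


def migrate_once(adj, part, k):
    """Hypothetical balancing step: move one vertex out of the heaviest partition.

    Among moves that do not increase the edge cut, pick the one giving the
    smallest resulting maximum load (ties broken by the larger cut reduction).
    Returns the moved vertex, or None if no move lowers the maximum load.
    """
    load = Counter(part.values())
    cur_max = max(load[p] for p in range(k))
    heavy = max(range(k), key=lambda p: load[p])
    best = None  # (new_max_load, cut_delta, vertex, target)
    for v, p in part.items():
        if p != heavy:
            continue
        for t in range(k):
            if t == heavy:
                continue
            d = cut_delta(adj, part, v, t)
            if d > 0:
                continue  # would increase the edge cut
            new_max = max(load[q] - (q == heavy) + (q == t) for q in range(k))
            cand = (new_max, d, v, t)
            if best is None or cand < best:
                best = cand
    if best is None or best[0] >= cur_max:
        return None
    _, _, v, t = best
    part[v] = t
    return v
```

On a triangle {0, 1, 2} attached to a path 3-4-5 through edge 2-3, with {0, 1, 2, 3} in partition 0, moving vertex 3 balances the loads at 3/3 while keeping the cut at one edge.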

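Graph partitioning is the first step of large-scale distributed graph processing and underpins the storage, querying, processing, and mining of graph applications. As graph data keeps growing, real-world graphs exhibit dynamism, and how to partition dynamic graphs has become a hot topic in graph partitioning research. Organized around the focus and characteristics of different dynamic graph partitioning algorithms, this survey systematically introduces the algorithms currently available for the dynamic graph partitioning problem, including streaming graph partitioning, incremental graph partitioning, and graph repartitioning. It first introduces three different partitioning strategies and their problem definitions, the two sources of graph dynamism, and the dynamic graph partitioning problem. It then presents three kinds of streaming graph partitioning algorithms: hash-based partitioning, partitioning based on neighbor distribution, and stream-based optimization partitioning. Next, it describes two kinds of incremental graph partitioning: single-element incremental partitioning and batch incremental partitioning. It then covers repartitioning algorithms aimed at dynamic graph structure and at dynamic graph computation. Finally, based on an analysis and comparison of existing methods, it summarizes the main challenges currently facing dynamic graph partitioning and poses corresponding research questions.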
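To contrast the first two streaming families named above, here is a minimal sketch comparing a hash partitioner with Linear Deterministic Greedy (LDG), a well-known neighbor-distribution heuristic chosen here as a representative; the survey may discuss other variants. It assumes vertices arrive as (id, neighbor list) pairs with integer ids, and uses `id % k` as a stand-in hash.

```python
def cut_size(edges, part):
    """Number of edges whose endpoints were placed in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def hash_partition(stream, k):
    """Hash-based placement: ignores neighbors entirely (id % k as the hash)."""
    return {v: v % k for v, _ in stream}


def ldg_score(nbrs, part, size, i, capacity):
    """LDG score: already-placed neighbors in i, damped by how full i is."""
    placed = sum(1 for w in nbrs if part.get(w) == i)
    return placed * (1 - size / capacity)


def ldg_partition(stream, k, capacity):
    """Place each arriving vertex in the partition with the best LDG score;
    ties go to the currently smaller partition."""
    part, sizes = {}, [0] * k
    for v, nbrs in stream:
        best = max(
            range(k),
            key=lambda i: (ldg_score(nbrs, part, sizes[i], i, capacity), -sizes[i]),
        )
        part[v] = best
        sizes[best] += 1
    return part
```

On two triangles joined by a single bridge edge and streamed in id order, LDG cuts only the bridge, while the hash partitioner cuts 5 of the 7 edges.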

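Distributed storage is a relatively effective way to store large-scale data, and data partitioning is a prerequisite for it. To handle ever-growing RDF data, this paper proposes an RDF graph partitioning algorithm based on double-objective optimization (RDF Graph Partitioning algorithm based on Double Objective Optimization, RGPDOO). RGPDOO combines two partitioning metrics, edge cut and partition balance, into a single objective function, and uses that function to perform both static and dynamic partitioning of RDF graphs. In static partitioning, an initial partition classifies the vertices into three groups: core vertices, crossing vertices, and free vertices. Crossing and free vertices are then assigned by computing the gain in the objective function. For dynamic partitioning, the paper gives solutions for the insertion and deletion of RDF tuples. In addition, to keep meeting the partitioning objectives, the algorithm performs a dynamic adjustment every time interval [T] based on the balance and compactness of the subgraphs. Experiments on synthetic and real datasets compare the algorithm with several general-purpose static and dynamic graph partitioning algorithms, and the results show that it effectively achieves both static and dynamic partitioning of RDF graphs.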
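The abstract does not give RGPDOO's objective function. As an illustration only, the sketch below assumes a hypothetical weighted sum of the normalized edge cut and the load imbalance (maximum load over average load, minus one), with weight `alpha`, and evaluates the gain of reassigning one vertex by recomputing the objective before and after the move.

```python
def objective(edges, part, k, alpha=0.5):
    """Hypothetical combined objective: alpha * cut/|E| + (1 - alpha) * imbalance.
    Lower is better."""
    cut = sum(1 for u, v in edges if part[u] != part[v])
    loads = [0] * k
    for p in part.values():
        loads[p] += 1
    avg = len(part) / k
    imbalance = max(loads) / avg - 1
    return alpha * cut / len(edges) + (1 - alpha) * imbalance


def move_gain(edges, part, k, v, target, alpha=0.5):
    """Objective improvement if v moves to target (positive = better).
    The partition map is restored before returning."""
    before = objective(edges, part, k, alpha)
    old = part[v]
    part[v] = target
    after = objective(edges, part, k, alpha)
    part[v] = old
    return before - after
```

Recomputing the whole objective is O(|E|) per candidate move and is only meant to make the definition concrete; a practical version would update the cut and loads incrementally.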

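Graph partitioning is a fundamental task in distributed graph computing: it divides a large graph and assigns the parts to different machines in a cluster. Partition quality strongly affects the performance of distributed graph computing; the goals are to balance the load and to minimize the edge cut. Real-world graph data today usually grows dynamically, which calls for a partitioning method for dynamic incremental graphs that keeps partition quality from degrading as the graph grows. Although some dynamic graph partitioning algorithms have been proposed, they cannot both process dynamic changes in real time and produce high-quality partitions. This paper proposes ED-IDGP, a dynamic incremental graph partitioning algorithm based on vertex group reassignment, to partition large-scale dynamic incremental graphs. ED-IDGP includes a dynamic processor that handles four different types of unit updates in real time; after each unit update, it runs a local optimizer near the partitions affected by the change to further improve partition quality. In the local optimizer, a vertex group search strategy based on an improved label propagation algorithm finds candidate vertex groups, a proposed vertex-group move gain formula identifies the most beneficial group, and that group is moved to the target partition. The performance and efficiency of ED-IDGP are evaluated on real datasets from several perspectives and with several metrics.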
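The paper's vertex-group move gain formula is not reproduced in the abstract. A plausible cut-based version, given here as an assumption, counts, for a group currently in one source partition, the external edges that would become internal after the move (to the target) minus those that would become cut (to the rest of the source); edges inside the group are unaffected.

```python
def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def group_move_gain(adj, part, group, target):
    """Hypothetical edge-cut gain of moving a whole vertex group to target.

    Assumes every vertex in the group currently sits in the same partition.
    Positive gain means the edge cut shrinks by that many edges.
    """
    group = set(group)
    src = part[next(iter(group))]
    gain = 0
    for v in group:
        for w in adj[v]:
            if w in group:
                continue          # internal to the group: unchanged
            if part[w] == target:
                gain += 1         # cut edge becomes internal
            elif part[w] == src:
                gain -= 1         # internal edge becomes cut
    return gain
```

The example shows why moving groups can help: with {0, 1, 2, 3} in partition 0 and {4, 5} in partition 1, moving vertex 2 alone worsens the cut (gain -1), moving 3 alone gains 1, but moving {2, 3} together gains 2.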

Semi-supervised clustering algorithm for dynamic graphs based on a factor graph model
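Clustering of dynamic graphs faces two main shortcomings. First, most classical clustering algorithms take a static-graph perspective and cannot effectively model the continuous evolution of real network graphs; clustering algorithms for dynamic graphs are urgently needed that analyze the cluster structure of graph snapshots at different times to capture how the graph evolves. Second, in real networks the cluster labels of some nodes can be obtained in advance; how to incorporate this prior information into the cluster partition of a dynamic graph, and thereby assign labels to unlabeled nodes, is another problem this paper addresses. To this end, the paper proposes the evolution factor graph model (EFGM) for semi-supervised node clustering on dynamic graphs. EFGM captures not only node attributes and edge adjacency attributes but also the temporal snapshot information of nodes. Experiments on real datasets show that EFGM integrates the dynamic graph and the prior information into a unified evolution factor graph framework, so that the clustering results both satisfy the prior knowledge and fit the overall evolution of the dynamic graph, confirming the effectiveness of the method.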

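The minimum spanning tree (MST) is one of the most classical algorithms in graph theory. Complex graph algorithms built on MST structures, such as clustering, classification, and shortest-path queries, show marked improvements in both efficiency and result quality. However, with the rapid growth of the Internet, graph data keeps getting larger, and large graphs with tens of millions or even hundreds of millions of vertices are increasingly common. How to implement query processing and data mining algorithms on large graphs has therefore become a pressing problem. Moreover, because large graphs are dynamic, how to maintain algorithm results dynamically is bound to become a major concern. Since existing centralized MST algorithms cannot handle massive, dynamic graph data, the paper first proposes the partition Prim (PP) algorithm and, building on it, a vertex-driven parallel MST algorithm, PB (PP Borůvka), whose correctness it proves. PB is implemented on both the MapReduce and BSP frameworks. For dynamic graphs that undergo deletions only, an MST maintenance algorithm is proposed to enable efficient incremental computation. The costs of the proposed computation and maintenance algorithms are analyzed and compared. Finally, experiments on real and synthetic datasets verify the effectiveness, efficiency, and scalability of PB and of the maintenance algorithm.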
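PB builds on Borůvka's algorithm, whose rounds (every component picks its cheapest outgoing edge, then components merge) are what make it easy to parallelize. The sketch below is a plain sequential Borůvka with union-find, not the PB algorithm itself; edges are (u, v, weight) triples on vertices 0..n-1, and ties are broken by edge index so that no cycle can form.

```python
def boruvka_mst(n, edges):
    """Sequential Borůvka MST (a spanning forest if the graph is disconnected)."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, components = [], n
    while components > 1:
        # 1) For each component root, find the cheapest edge leaving it.
        cheapest = [None] * n
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                c = cheapest[r]
                if c is None or (w, i) < (edges[c][2], c):
                    cheapest[r] = i
        if all(c is None for c in cheapest):
            break  # no edges between components: graph is disconnected
        # 2) Add all selected edges, merging components.
        for i in cheapest:
            if i is None:
                continue
            u, v, w = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:  # the same edge may be chosen by both endpoints
                parent[ru] = rv
                mst.append((u, v, w))
                components -= 1
    return mst
```

In a distributed setting, step 1 is a per-component (or per-partition) local computation, which is roughly where a partition-based front end such as PP can slot in.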

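To address the shortcomings of the GN algorithm in detecting overlapping communities, and to reduce time complexity, this paper proposes EGN, an overlapping community detection algorithm that partitions the edge set of a network according to edge similarity. Because the algorithm partitions the edge set, each edge is assigned to exactly one community; since a node can be incident to several edges, a node can belong to several communities, which is how overlapping communities are found. EGN first constructs an edge graph that represents the adjacency relations among the network's edges. It then computes the similarity of the network's edges from the node relations in the edge graph, deriving an edge-to-edge similarity measure from node-to-node similarity. Edges are then removed from the edge graph in ascending order of similarity, building a dendrogram of the edge graph. Each level of the dendrogram corresponds to one partition of the network, and a partition density function measures the quality of each partition in order to find the best one. Finally, the algorithm is applied to Zachary's karate club network and compared with GN; the results show that EGN detects overlapping communities well.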
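The abstract does not state EGN's similarity or density formulas. For illustration only, the sketch assumes the definitions from the link-community method of Ahn et al.: two edges sharing a node k are compared by the Jaccard similarity of the inclusive neighborhoods of their other endpoints, and an edge partition is scored by the partition density D = (2/M) · Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)), where a community c has m_c edges and n_c nodes (communities with two nodes contribute 0).

```python
def edge_similarity(adj, e1, e2):
    """Assumed Jaccard similarity of two edges that share exactly one node."""
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return 0.0
    (i,) = set(e1) - shared
    (j,) = set(e2) - shared
    ni = set(adj[i]) | {i}  # inclusive neighborhood of i
    nj = set(adj[j]) | {j}
    return len(ni & nj) / len(ni | nj)


def partition_density(edge_communities):
    """Assumed partition density of a list of edge communities (lists of (u, v))."""
    total_edges = sum(len(c) for c in edge_communities)
    total = 0.0
    for c in edge_communities:
        m = len(c)
        n = len({x for e in c for x in e})
        if n > 2:
            total += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2 * total / total_edges
```

For two triangles sharing node 2, splitting the edges into the two triangles gives density 1.0, versus 1/3 for one community holding all six edges, so node 2 ends up in both communities: exactly the kind of overlap an edge-based partition is meant to expose.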

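To address the shortcomings of existing methods for partitioning product structures into modules, this paper applies community structure detection from complex network theory to product module partitioning and proposes a new module partitioning method. The product's structural units become the network's nodes, and related structural units are joined by edges, forming a network graph of the product structure; the GN algorithm, a community structure detection method from complex network theory, then partitions the structure into modules. The paper describes the method and steps of GN-based module partitioning, verifies its effectiveness and practicality using the structural module partitioning of an automobile engine as an example, analyzes the resulting modules, and finally outlines directions and methods for further research.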
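The GN (Girvan-Newman) algorithm repeatedly deletes the edge with the highest betweenness until the network falls apart into communities. Below is a minimal pure-Python sketch (NetworkX also ships a `girvan_newman` function in its community module). Edge betweenness is computed with Brandes' BFS-based accumulation; the stopping rule "stop at a requested number of components" is a simplifying assumption, since GN is often run to the full dendrogram and cut by modularity instead.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted undirected graph.
    Each pair is counted from both ends, which does not affect the argmax."""
    bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        q = deque([s])
        while q:  # BFS counting shortest paths
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                bc[frozenset((v, w))] += c
                delta[v] += c
    return bc


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman(edges, n_communities):
    """Remove highest-betweenness edges until n_communities components remain
    (n_communities must not exceed the number of vertices)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    comps = components(adj)
    while len(comps) < n_communities:
        bc = edge_betweenness(adj)
        u, v = tuple(max(bc, key=bc.get))
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
    return comps
```

In the product-module setting, vertices would be structural units and each resulting component a candidate module.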

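Splitting a monolithic system into microservices effectively alleviates problems of monolithic architectures such as redundancy and poor maintainability, but existing microservice partitioning methods do not make full use of the attribute information of microservice architectures, so the resulting service partitions are often poorly justified. This paper presents a service partitioning method based on microservice architecture. It builds an entity-attribute relationship graph from the associations between system services and attributes; it then formulates service partitioning rules by combining the characteristics of microservice architecture with the requirements of the target system, quantifies the associations between the two types of vertices, and produces an entity-attribute weighted graph; finally, it applies a weighted GN algorithm to partition the system into microservices automatically. Experimental results show that the method substantially reduces partitioning time and that the resulting microservice partitions score better on the evaluation metrics.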

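To address the low accuracy of node localization algorithms for wireless sensor networks based on received signal strength (RSSI), this paper proposes a fuzzy-information node localization algorithm based on Voronoi diagram partitioning. The localization region is partitioned by a Voronoi diagram according to the anchor nodes, dividing the whole region into Voronoi cells and yielding the vertex coordinates of each cell. Gaussian filtering selects the vertex coordinates that can serve as reference nodes, and unknown nodes are located jointly from these vertex coordinates and the anchor nodes. A fuzzy-information localization method then computes the final position of each unknown node. Experimental results show that, compared with the MANLFI and FINL-DT algorithms, the proposed algorithm effectively improves localization accuracy and reduces network energy consumption.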

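To study how the structural features of large graphs affect partitioning quality, this paper proposes describing a large graph's structure by its vertex degree distribution. First, several synthetic datasets with the same numbers of vertices and edges as a real graph but different structural features are generated from real graph data, and the similarity between the real and synthetic graphs is computed experimentally, showing that the method effectively describes the structure of real large graphs. Then, hash partitioning and a vertex-pair swap partitioning algorithm are used to examine the relationship between graph structure and partitioning quality. After 50,000 iterations of the vertex-pair swap algorithm, partitioning a real graph with 6,301 vertices and 20,777 edges yields 54.32% fewer crossing edges than hash partitioning; partitioning two synthetic graphs with clearly different structural features yields 6,233 and 316 crossing edges, respectively. The results show that vertex-pair swapping reduces the number of crossing edges, and that the greater the difference in vertex degree distribution, the fewer the crossing edges after partitioning and the better the partition. The structural features of a large graph therefore affect its partitioning quality, which lays the groundwork for models relating graph structure to partitioning quality.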
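The abstract does not specify how vertex pairs are chosen. A minimal sketch, assuming random pair selection with a "keep the swap only if the crossing-edge count strictly drops" rule (names and the seeded RNG are illustrative), shows the basic idea. Swapping two vertices from different partitions never changes partition sizes, so the hash partition's balance is preserved while the cut can only go down.

```python
import random


def crossing_edges(edges, part):
    """Edges whose endpoints sit in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def hash_partition(vertices, k):
    """Baseline: integer id modulo k stands in for a hash function."""
    return {v: v % k for v in vertices}


def pair_swap(edges, part, iterations, seed=0):
    """Hypothetical vertex-pair swap refinement (mutates and returns part).

    Each iteration picks two random vertices; if they are in different
    partitions they are swapped, and the swap is undone unless it strictly
    reduces the crossing-edge count. Recomputing the count is O(|E|) per step,
    which is fine for a sketch but not for graphs with millions of edges.
    """
    rng = random.Random(seed)
    verts = list(part)
    best = crossing_edges(edges, part)
    for _ in range(iterations):
        a, b = rng.sample(verts, 2)
        if part[a] == part[b]:
            continue
        part[a], part[b] = part[b], part[a]
        c = crossing_edges(edges, part)
        if c < best:
            best = c
        else:
            part[a], part[b] = part[b], part[a]  # undo a non-improving swap
    return part
```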

Survey of knowledge graph partitioning algorithms
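Knowledge graphs are a cornerstone of artificial intelligence and have attracted wide attention because they contain rich graph structure and attribute information. A knowledge graph gives a precise semantic description of real-world entities and their relationships, with vertices representing entities and edges representing the relationships between them. Knowledge graph partitioning is the first step in distributed processing of large knowledge graphs and underpins their distributed storage, querying, reasoning, and mining. As knowledge graph data and the demand for distributed processing keep growing, how to partition knowledge graphs has become a hot research topic. Starting from the definitions of knowledge graphs and graph partitioning, this survey systematically introduces the current algorithms for partitioning knowledge graph data, including basic, multilevel, streaming, distributed, and other graph partitioning algorithms. First, it introduces four basic graph partitioning algorithms: spectral partitioning, geometric partitioning, branch and bound, and KL with its derivatives; these are typically used on small graphs or as components of other partitioning algorithms. Next, it introduces multilevel graph partitioning algorithms, which coarsen the graph, partition the coarsened graph, and project the result back onto the original graph; according to the coarsening process they are divided into matching-based and aggregation-based algorithms. It then describes three streaming graph partitioning algorithms, which load vertices or edges as a sequence and partition them on the fly: Hash, greedy, and Fennel, together with their derivatives. It then introduces distributed graph partitioning algorithms represented by KaPPa, JA-BE-JA, and lightweight repartitioning, along with their derivatives. Under other types of graph partitioning, it introduces two recently emerging approaches: label propagation and query-workload-based algorithms. Extensive experiments on synthetic and real knowledge graph datasets compare representative algorithms from the five categories in terms of partition quality, query processing, and graph mining performance; the results are analyzed and extended to reasoning, yielding experiment-based conclusions on the performance of knowledge graph partitioning algorithms. Finally, based on an analysis and comparison of existing methods, the survey summarizes the main challenges in knowledge graph data partitioning, poses corresponding research questions, and outlines future research directions.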
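Of the streaming algorithms listed, Fennel is the easiest to state compactly: each arriving vertex v goes to the partition i maximizing |N(v) ∩ P_i| − α·γ·|P_i|^(γ−1), with the commonly used defaults γ = 1.5 and α = m·k^(γ−1)/n^γ, under a capacity of ν·n/k vertices per partition. The sketch below follows that formulation; the default ν = 1.1 and the fallback to the smallest partition when every partition is full are illustrative assumptions.

```python
def fennel(stream, n, m, k, gamma=1.5, nu=1.1):
    """Fennel-style one-pass vertex partitioning.

    stream yields (vertex, neighbor list); n and m are the total vertex and
    edge counts, known in advance as the Fennel setting assumes.
    """
    alpha = m * k ** (gamma - 1) / n ** gamma
    cap = nu * n / k
    part, sizes = {}, [0] * k

    def score(nbrs, i):
        # Neighbors already in partition i, minus the marginal size penalty.
        placed = sum(1 for w in nbrs if part.get(w) == i)
        return placed - alpha * gamma * sizes[i] ** (gamma - 1)

    for v, nbrs in stream:
        open_parts = [i for i in range(k) if sizes[i] + 1 <= cap]
        if not open_parts:  # all full: fall back to the smallest partition
            open_parts = [min(range(k), key=lambda i: sizes[i])]
        best = max(open_parts, key=lambda i: (score(nbrs, i), -sizes[i]))
        part[v] = best
        sizes[best] += 1
    return part
```

Compared with the pure greedy rule, the size penalty is what keeps Fennel from piling every vertex into the first partition that collects neighbors.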

A novel graph theoretic approach for data clustering is presented and its application to the image segmentation problem is demonstrated. The data to be clustered are represented by an undirected adjacency graph 𝒢 with arc capacities assigned to reflect the similarity between the linked vertices. Clustering is achieved by removing arcs of 𝒢 to form mutually exclusive subgraphs such that the largest inter-subgraph maximum flow is minimized. For graphs of moderate size (~ 2000 vertices), the optimal solution is obtained through partitioning a flow and cut equivalent tree of 𝒢, which can be efficiently constructed using the Gomory-Hu algorithm (1961). However, for larger graphs this approach is impractical. New theorems for subgraph condensation are derived and are then used to develop a fast algorithm which hierarchically constructs and partitions a partially equivalent tree of much reduced size. This algorithm results in an optimal solution equivalent to that obtained by partitioning the complete equivalent tree and is able to handle very large graphs with several hundred thousand vertices. The new clustering algorithm is applied to the image segmentation problem. The segmentation is achieved by effectively searching for closed contours of edge elements (equivalent to minimum cuts in 𝒢), which consist mostly of strong edges, while rejecting contours containing isolated strong edges. This method is able to accurately locate region boundaries and at the same time guarantees the formation of closed edge contours.
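Every Gomory-Hu-style construction is built from repeated s-t minimum cuts. As a minimal sketch of that building block only (not the paper's condensation algorithm), the code below computes one s-t max flow / min cut on an undirected capacitated graph with Edmonds-Karp and returns the source side of the cut. The capacity dictionary format is an assumption; each undirected arc is usable in both directions.

```python
from collections import deque


def min_cut(n, capacity, s, t):
    """Edmonds-Karp max flow on an undirected graph with vertices 0..n-1.

    capacity maps (u, v) -> c. Returns (flow value, set of vertices on the
    source side of a minimum s-t cut).
    """
    res = [[0] * n for _ in range(n)]
    for (u, v), c in capacity.items():
        res[u][v] += c  # an undirected arc is usable in both directions
        res[v][u] += c
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break  # no augmenting path left: flow is maximum
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Vertices reachable from s in the final residual graph form the s side.
    side = {v for v in range(n) if parent[v] != -1}
    return flow, side
```

With two strongly linked triangles (capacity 5) joined by one weak arc (capacity 1), the minimum cut between opposite triangles removes just the weak arc, which matches the intuition of cutting along low-similarity links.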

In this paper, we consider a graph problem on a connected weighted undirected graph, called the searchlight guarding problem. Our problem extends the so-called graph searching/guarding problem by considering a time slot parameter in addition to the traditional building cost. Suppose that there is a fugitive who moves along the edges of the graph at any speed. We want to place a set of searchlights at the vertices to search the edges of the graph and capture the fugitive. Placing a searchlight at a vertex incurs a building cost. The searchlight guarding problem is to allocate a set S of searchlights at the vertices such that the total cost of the vertices in S is minimized. If there is more than one set of searchlights with the minimum building cost, then we find the one with the minimum searching time, that is, the one that minimizes the number of time slots needed to capture the fugitive. The problem is known to be NP-hard on weighted bipartite graphs, split graphs, and chordal graphs, and it is solvable in linear time on weighted trees and interval graphs. In this paper, an algorithm is designed to solve the problem on weighted two-terminal series-parallel graphs. It works on the parsing tree structure of the given two-terminal series-parallel graph. The algorithm is divided into two phases. In phase one, we first extract some useful properties of optimal solutions. Employing these properties, an algorithm is designed that uses dynamic programming to find the set of searchlights with the minimum guarding cost and to assign the searching directions of all edges. In phase two, the searched time slots of all edges are determined by breadth-first search from the root of the parsing tree. Both phases run in linear time, so our algorithm is time optimal. Received: 12 March 1996 / 27 May 1997

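To address privacy leakage in social networks published as bipartite graphs, this paper proposes a bipartite graph anonymization method for social networks that resists sensitive edge identification attacks. Building on existing k-safe grouping theory and taking into account the sparsity of bipartite graphs and the form of sensitive edge identification attacks, it proposes forward one-way, backward one-way, and complete (c1,c2)-safety principles and, on that basis, formally defines a class of safe anonymization problems for social network bipartite graphs that resist sensitive edge identification attacks. It also proposes a bipartite graph partitioning algorithm based on k-frequent subgraph clustering and an anonymization algorithm based on bipartite (c1,c2)-safety to guarantee the safety of the published bipartite graph. Experimental results show that, at a time cost comparable to existing methods, the algorithm produces less information loss, effectively resists sensitive edge identification attacks, and enables safe publication of bipartite graphs.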

This paper proposes a new method for finding principal curves from data sets. Motivated by the problem of highly curved and self-intersecting curves, we present a bottom-up strategy to construct a graph called a principal graph for representing a principal curve. The method initializes a set of vertices based on the principal oriented points introduced by Delicado, and then constructs the principal graph from these vertices through a two-layer iteration process. In the inner iteration, a kernel smoother is used to smooth the positions of the vertices. In the outer iteration, the principal graph is spanned by a minimum spanning tree and is modified by detecting closed regions and intersectional regions, and new vertices are then inserted into some edges of the principal graph. We tested the algorithm on simulated data sets and applied it to image skeletonization. Experimental results show the effectiveness of the proposed algorithm.
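The two inner building blocks named above can be sketched independently. The abstract does not define the kernel smoother, so the sketch assumes a Gaussian-weighted (Nadaraya-Watson style) mean of the data points around each vertex with bandwidth `h`; the spanning step is a standard O(n²) Prim's algorithm over Euclidean distances between vertices. Neither function reproduces the paper's closed-region or intersection handling.

```python
import math


def kernel_smooth(vertices, data, h):
    """Assumed smoothing step: move each vertex to the Gaussian-weighted mean
    of the 2-D data points, with bandwidth h."""
    out = []
    for vx, vy in vertices:
        wsum = sx = sy = 0.0
        for x, y in data:
            w = math.exp(-((x - vx) ** 2 + (y - vy) ** 2) / (2 * h * h))
            wsum += w
            sx += w * x
            sy += w * y
        out.append((sx / wsum, sy / wsum) if wsum > 0 else (vx, vy))
    return out


def euclidean_mst(points):
    """Prim's algorithm on the complete Euclidean graph; returns tree edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges
```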

The visualization of dynamic graphs demands visually encoding at least three major data dimensions: vertices, edges, and time steps. Many of the state‐of‐the‐art techniques can show an overview of vertices and edges but lack a data‐scalable visual representation of the time aspect. In this paper, we address the problem of displaying dynamic graphs with a thousand or more time steps. Our proposed interleaved parallel edge splatting technique uses a time‐to‐space mapping and shows the complete dynamic graph in a static visualization. It provides an overview of all data dimensions, allowing for visually detecting time‐varying data patterns; hence, it serves as a starting point for further data exploration. By applying clustering and ordering techniques on the vertices, edge splatting on the links, and a dense time‐to‐space mapping, our approach becomes visually scalable in all three dynamic graph data dimensions. We illustrate the usefulness of our technique by applying it to call graphs and US domestic flight data with several hundred vertices, several thousand edges, and more than a thousand time steps.
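A simplified illustration of the time-to-space mapping with edge splatting, assuming plain (non-interleaved) strips: each time step gets a vertical strip `width` pixels wide, vertices are rows in an already chosen order, and each edge (u, v) is rasterized as a straight line from row u at the strip's left border to row v at its right border. Overlapping lines add up in a density grid, which a renderer would then map to color. The function name and parameters are hypothetical.

```python
def splat(edges_by_time, n_vertices, width=8, samples=32):
    """Accumulate a density grid[row][column] for parallel edge splatting.

    edges_by_time[t] is the list of (u, v) edges at time step t; vertex ids
    are assumed to already be the row positions chosen by an ordering step.
    """
    steps = len(edges_by_time)
    grid = [[0] * (steps * width) for _ in range(n_vertices)]
    for t, edges in enumerate(edges_by_time):
        x0 = t * width  # left border of this time step's strip
        for u, v in edges:
            cells = set()  # each edge contributes at most once per pixel
            for s in range(samples + 1):
                f = s / samples
                x = x0 + min(int(f * width), width - 1)
                y = round(u + f * (v - u))
                cells.add((y, x))
            for y, x in cells:
                grid[y][x] += 1
    return grid
```

Because every time step occupies a fixed-width strip, a thousand time steps simply become a thousand adjacent strips, which is what lets the whole dynamic graph appear in one static image.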

A new approach to the problem of graph and subgraph isomorphism detection from an input graph to a database of model graphs is proposed in this paper. It is based on a preprocessing step in which the model graphs are used to create a decision tree. At run time, subgraph isomorphisms are detected by means of decision tree traversal. If we neglect the time needed for preprocessing, the computational complexity of the new graph algorithm is only polynomial in the number of input graph vertices. In particular, it is independent of the number of model graphs and the number of edges in any of the graphs. However, the decision tree is of exponential size. Several pruning techniques which aim at reducing the size of the decision tree are presented. A computational complexity analysis of the new method is given and its behavior is studied in a number of practical experiments with randomly generated graphs.
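The trade-off described above (exponential preprocessing, lookup cost independent of the number of models) can be seen in a deliberately simplified sketch for full graph isomorphism only: every permuted adjacency matrix of every model graph is stored, and an input graph is matched by one lookup on its own adjacency matrix. A dictionary keyed by the flattened matrix stands in for the paper's decision tree; subgraph isomorphism and the pruning techniques are not covered.

```python
from itertools import permutations


def adjacency_key(n, edges, perm):
    """Row-major adjacency matrix (as a tuple) after relabeling i -> perm[i]."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        a, b = perm[u], perm[v]
        m[a][b] = m[b][a] = 1
    return tuple(x for row in m for x in row)


def build_index(models):
    """models: {name: (n, edges)}. Exponential preprocessing: n! keys per model."""
    index = {}
    for name, (n, edges) in models.items():
        for perm in permutations(range(n)):
            index.setdefault(adjacency_key(n, edges, perm), set()).add(name)
    return index


def lookup(index, n, edges):
    """O(n^2) to build the key, independent of how many models are indexed."""
    return index.get(adjacency_key(n, edges, tuple(range(n))), set())
```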

Many graph processing systems have been proposed, but real-world graphs are often dynamic, and it is important to keep the results of graph computation up-to-date. Incremental computation has been shown to be an efficient way to update computed results. Recently, many incremental graph processing systems have been proposed that handle dynamic graphs asynchronously and achieve better performance than synchronous ones. However, these solutions still suffer from suboptimal convergence speed due to slow propagation of important vertex state (state that matters for convergence speed) and poor locality. To solve these problems, we propose a novel graph processing framework. It introduces a dynamic partition method that gathers the important vertices together for high locality, and then uses a priority-based scheduling algorithm that gives them higher priority for an effective processing order. In this way, it reduces the number of updates and increases locality, thereby reducing the convergence time. Experimental results show that, compared with state-of-the-art systems, our method reduces the number of updates by 30% and the total execution time by 35%.
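As a small, self-contained illustration of incremental computation with priority scheduling (not the proposed framework), the sketch below maintains single-source shortest paths under edge insertions: after adding an edge, only the vertices whose distance actually improves are re-processed, smallest tentative distance first, so the most "important" updates propagate before the rest. Deletions are not handled, because they can increase distances and need a different repair strategy.

```python
import heapq


def dijkstra(adj, src):
    """Full recomputation: adj maps u -> list of (v, weight)."""
    dist = {v: float("inf") for v in adj}
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def insert_edge(adj, dist, u, v, w):
    """Add directed edge u->v and propagate only the resulting improvements,
    in priority (distance) order. Returns the number of distance updates."""
    adj[u].append((v, w))
    if dist[u] + w >= dist[v]:
        return 0  # the new edge does not shorten any path
    dist[v] = dist[u] + w
    heap, updates = [(dist[v], v)], 1
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        for y, wy in adj[x]:
            if d + wy < dist[y]:
                dist[y] = d + wy
                updates += 1
                heapq.heappush(heap, (dist[y], y))
    return updates
```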

