Related Articles
20 related articles found (search time: 31 ms)
1.
Popular distributed graph processing frameworks, such as Pregel and GraphLab, are based on the vertex-centric computation model, where users write a customized Compute function for each vertex to process the data iteratively. Vertices are evenly partitioned among the compute nodes, and the granularity of parallelism of the graph algorithm is normally tuned by adjusting the number of compute nodes. The vertex-centric model splits the computation into phases. Inside a phase, the computation proceeds as an embarrassingly parallel process, because no communication between compute nodes occurs. By default, current graph engines handle only one iteration of the algorithm per phase. In this paper, however, we find that the granularity of parallelism can also be tuned by aggregating the computation of multiple iterations into one phase, which has a significant impact on the performance of the graph algorithm. In the ideal case, if all computation is handled in one phase, the whole algorithm turns into an embarrassingly parallel algorithm and the benefit of parallelism is maximized. Based on this observation, we propose two approaches, a function-based approach and a parameter-based approach, to automatically transform a Pregel algorithm into a new one with tunable granularity of parallelism. We study the cost of such transformations and the trade-off between the granularity of parallelism and performance, providing a new direction for tuning the performance of parallel algorithms. Finally, the approaches are implemented in our graph processing system, N2, and we illustrate their performance using popular graph algorithms.
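
To make the phase-folding idea concrete, below is a minimal Python sketch of running k communication-free iterations of min-label propagation per phase for connected components; the two-partition setup and all function names are illustrative assumptions, not the Pregel or N2 API.

def propagate_locally(edges, labels, k):
    # one phase: k iterations that touch only this partition's edges,
    # i.e. no cross-partition communication inside the phase
    for _ in range(k):
        for u, v in edges:
            m = min(labels[u], labels[v])
            labels[u] = labels[v] = m

def connected_components(partitions, n, k):
    labels = list(range(n))          # each vertex starts as its own component
    phases = 0
    while True:
        old = labels[:]
        for edges in partitions:     # conceptually parallel; this sequential
            propagate_locally(edges, labels, k)  # simulation syncs implicitly
        phases += 1
        if labels == old:
            return labels, phases

# edge (2, 3) crosses the two partitions; larger k means fewer, heavier phases
parts = [[(0, 1), (1, 2)], [(3, 4), (2, 3)]]
print(connected_components(parts, 5, k=3))   # -> ([0, 0, 0, 0, 0], 2)

With k = 1 this mimics the default one-iteration-per-phase engine; raising k trades more redundant local work for fewer synchronization points.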

2.
We propose a parallel computation model, called the cellular matrix model (CMM), to address large-size Euclidean graph matching problems in the plane. The parallel computation takes place by partitioning the plane into a regular grid of cells, each cell being assigned to a single processor. Each processor operates on local data, starting from its cell location and extending its search to the neighboring cells in a spiral fashion. In order to deal with large-size problems, memory size and processor number are fixed at O(N), where N is the problem size. A key point is that closest-point searching in the plane is then performed in O(1) expected time for uniform or bounded distributions, by each processor independently. We define a generic loop that models the parallel projection between graphs and their matching, as executed by the many cells at a given level of computation granularity. To illustrate its efficacy and versatility, we apply the CMM, on GPU platforms, to two problems in image processing: superpixel segmentation and stereo-matching energy minimization. First, we propose an extended version of the well-known SLIC superpixel segmentation algorithm, which we call the SPASM algorithm, using a parallel 2D self-organizing map instead of the k-means algorithm. Second, we investigate the idea of distributed variable neighborhood search and propose a parallel search heuristic, called distributed local search (DLS), for global energy minimization in stereo matching. We evaluate the approach against state-of-the-art graph cut and belief propagation algorithms. For each problem, we argue that the parallel GPU implementation provides new competitive quality/time trade-offs, with substantial acceleration factors as the problem size increases.
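
As an illustration of the cell-based closest-point search, here is a minimal Python sketch, assuming points are bucketed into a regular grid; the ring-by-ring scan stops as soon as no farther ring can contain a closer point. Names such as spiral_nearest are illustrative, not the paper's GPU code.

import math
from collections import defaultdict

def build_grid(points, cell):
    grid = defaultdict(list)
    for p in points:
        grid[(int(p[0] // cell), int(p[1] // cell))].append(p)
    return grid

def spiral_nearest(grid, cell, q, max_ring=64):
    cx, cy = int(q[0] // cell), int(q[1] // cell)
    best, best_d = None, math.inf
    for r in range(max_ring):
        # any point in a ring at Chebyshev distance r lies at least
        # (r - 1) * cell away, so we can stop once that exceeds the best
        if best is not None and (r - 1) * cell > best_d:
            break
        for gx in range(cx - r, cx + r + 1):
            for gy in range(cy - r, cy + r + 1):
                if max(abs(gx - cx), abs(gy - cy)) != r:
                    continue      # visit only the ring itself
                for p in grid.get((gx, gy), []):
                    d = math.dist(p, q)
                    if d < best_d:
                        best, best_d = p, d
    return best

pts = [(0.2, 0.3), (5.1, 5.2), (2.0, 1.0)]
g = build_grid(pts, cell=1.0)
print(spiral_nearest(g, 1.0, (1.9, 1.1)))    # -> (2.0, 1.0)

Under a uniform distribution each cell holds O(1) points, which is what makes the expected O(1) search time per processor plausible.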

3.
Eyal Amir. Algorithmica, 2010, 56(4):448–479.
This paper presents algorithms whose input is an undirected graph and whose output is a tree decomposition whose width approximates the optimum, the treewidth of that graph. The algorithms differ in their computation time and their approximation guarantees. The first algorithm works in polynomial time and finds a factor-O(log OPT) approximation, where OPT is the treewidth of the graph. This is the first polynomial-time algorithm that approximates the optimum by a factor that does not depend on n, the number of nodes in the input graph. As a result, we get an algorithm for finding pathwidth within a factor of O(log OPT · log n) of the optimum. We also present algorithms that approximate the treewidth of a graph by constant factors of 3.66, 4, and 4.5, respectively, and take time that is exponential in the treewidth. These are more efficient than previously known algorithms by an exponential factor, and are of practical interest. Finding triangulations of minimum treewidth for graphs is central to many problems in computer science: real-world problems in artificial intelligence, VLSI design, and databases are efficiently solvable if we have an efficient approximation algorithm for treewidth. Many of those applications rely on weighted graphs. We extend our results to weighted graphs and weighted treewidth, showing similar approximation results for this more general notion. We report on experimental results confirming the effectiveness of our algorithms for large graphs associated with real-world problems.
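
The objects these algorithms output are easy to check even though computing treewidth is hard; below is a minimal Python sketch of a verifier that checks the two defining properties of a tree decomposition and reports its width (maximum bag size minus one). It is only a checker under the standard definitions, not the paper's approximation algorithm.

import itertools

def decomposition_width(bags, tree_edges, graph_edges):
    # property 1: every graph edge is contained in some bag
    for u, v in graph_edges:
        assert any(u in b and v in b for b in bags.values()), (u, v)
    # property 2: the bags containing any vertex form a connected subtree
    adj = {t: set() for t in bags}
    for a, b in tree_edges:
        adj[a].add(b); adj[b].add(a)
    for x in set(itertools.chain.from_iterable(bags.values())):
        holding = {t for t, b in bags.items() if x in b}
        stack, seen = [next(iter(holding))], set()
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(adj[t] & holding)
        assert seen == holding, x
    return max(len(b) for b in bags.values()) - 1

bags = {0: {1, 2, 3}, 1: {2, 3, 4}}
print(decomposition_width(bags, [(0, 1)], [(1, 2), (2, 3), (3, 4)]))   # -> 2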

4.

Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. The Louvain algorithm is an efficient sequential algorithm for community detection, but such sequential algorithms fail to scale to emerging large-scale data; scalable parallel algorithms are necessary to process large graph datasets. In this work, we show a comparative analysis of our different parallel implementations of the Louvain algorithm. We design parallel algorithms for the Louvain method in both shared-memory and distributed-memory settings. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. We incorporate dynamic load balancing in our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around 12-fold speedup, scaling to a larger number of processors. We also compare the performance of our algorithm with other prominent algorithms in the literature and obtain better or comparable performance. We identify the challenges in developing distributed-memory algorithms and provide an optimized solution, DPLAL, with a performance analysis on large-scale real-world networks from different domains.
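
The kernel these parallel variants repeat is Louvain's local-move step; below is a minimal sequential Python sketch using the standard modularity-gain formula (DPLAL's distributed data structures and dynamic load balancing are not shown, and a full implementation would also subtract the cost of leaving the node's current community before moving it).

def modularity_gain(k_i_in, k_i, sigma_tot, m):
    # gain of inserting node i (degree k_i) into a community with total
    # degree sigma_tot, where k_i_in is i's edge weight into that community
    return k_i_in / m - (sigma_tot * k_i) / (2.0 * m * m)

def best_move(node, adj, comm, degree, sigma_tot, m):
    links = {}                         # node's edge weight into each community
    for nb, w in adj[node]:
        links[comm[nb]] = links.get(comm[nb], 0.0) + w
    best_c, best_gain = comm[node], 0.0
    for c, k_in in links.items():
        if c != comm[node]:
            gain = modularity_gain(k_in, degree[node], sigma_tot[c], m)
            if gain > best_gain:
                best_c, best_gain = c, gain
    return best_c

# two triangles joined by the edge (2, 3); node 2 starts in its own community
adj = {0: [(1, 1), (2, 1)], 1: [(0, 1), (2, 1)], 2: [(0, 1), (1, 1), (3, 1)],
       3: [(2, 1), (4, 1), (5, 1)], 4: [(3, 1), (5, 1)], 5: [(3, 1), (4, 1)]}
comm = {0: 0, 1: 0, 2: 2, 3: 1, 4: 1, 5: 1}
degree = {v: sum(w for _, w in adj[v]) for v in adj}
sigma_tot = {0: degree[0] + degree[1], 1: degree[3] + degree[4] + degree[5],
             2: degree[2]}
m = sum(degree.values()) / 2.0
print(best_move(2, adj, comm, degree, sigma_tot, m))   # -> 0 (joins its triangle)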


5.
We investigate data-parallel techniques for belief propagation in acyclic factor graphs on multi-core systems. Belief propagation is a key inference algorithm for factor graphs, a probabilistic graphical model that has found applications in many domains. In this paper, we explore data parallelism for the basic operations over the potential tables in belief propagation. Data-parallel techniques for these table operations are developed for shared-memory platforms. We then propose a complete belief propagation algorithm using these table operations to perform exact inference in factor graphs. The proposed algorithms are implemented on state-of-the-art multi-socket multi-core systems with additional NUMA-aware optimizations. Our proposed algorithms exhibit good scalability on a representative set of factor graphs. On a four-socket Intel Westmere-EX system with 40 cores, we achieve 39.5× speedup for the table operations and 39× speedup for the complete algorithm using factor graphs with large potential tables.
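
The two potential-table operations at the core of the algorithm are the pointwise product of factors and the marginalization of a variable; below is a minimal NumPy sketch of both (the paper's data-parallel partitioning of large tables and the NUMA-aware placement are omitted).

import numpy as np

def factor_product(fa, va, fb, vb):
    # multiply tables fa over variables va and fb over variables vb
    out = sorted(set(va) | set(vb))
    letter = {v: chr(ord('a') + i) for i, v in enumerate(out)}
    spec = (''.join(letter[v] for v in va) + ',' +
            ''.join(letter[v] for v in vb) + '->' +
            ''.join(letter[v] for v in out))
    return np.einsum(spec, fa, fb), out

def marginalize(f, vars_, v):
    ax = vars_.index(v)
    return f.sum(axis=ax), vars_[:ax] + vars_[ax + 1:]

fa = np.array([[0.9, 0.1], [0.2, 0.8]])   # potential over variables (0, 1)
fb = np.array([0.5, 0.5])                 # potential over variable (1,)
prod, pv = factor_product(fa, [0, 1], fb, [1])
msg, mv = marginalize(prod, pv, 0)        # message passed toward variable 1
print(msg, mv)                            # -> [0.55 0.45] [1]

Parallelizing these kernels is attractive because both are elementwise over large regular arrays, exactly the shape of work that scales across cores.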

6.

This paper is concerned with the maximum cut problem on cubic graphs in parallel. New theoretical results characterizing the cardinality of the cut are presented. These results make it possible to design a simple combinatorial O(log n)-time parallel algorithm, running on a CRCW PRAM with O(n) processors. The approximation ratio achieved by the algorithm is 1.3 and improves on the best known parallel approximation ratio, i.e., 2, in the special class of cubic graphs. The algorithm also guarantees that the size of the returned cut is at least ((9g-3)/8g)n, where g is the odd girth of the input graph. Experimental results round off the paper, showing that the solutions obtained in practice are likely to be much better than the theoretical lower bound.
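
For intuition about why cubic graphs admit large cuts, here is a minimal sequential Python sketch (not the paper's CRCW PRAM algorithm): flip any vertex whose flip gains cut edges. At a local optimum every vertex has at least two of its three neighbors across the cut, so the cut has at least 2m/3 edges.

def local_search_maxcut(adj):
    side = {v: 0 for v in adj}
    improved = True
    while improved:
        improved = False
        for v in adj:
            across = sum(side[u] != side[v] for u in adj[v])
            if across < len(adj[v]) - across:   # flipping v gains cut edges
                side[v] ^= 1
                improved = True
    return side

K4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}   # cubic
side = local_search_maxcut(K4)
print(sum(side[u] != side[v] for u in K4 for v in K4[u]) // 2)  # -> 4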

7.
This paper addresses the problem of global graph alignment on supercomputer-class clusters. We define the alignment of two graphs as a mapping of each vertex in the first graph to a unique vertex in the second graph so as to optimize a given similarity-based cost function. Using a state-of-the-art serial algorithm for the computation of vertex similarity scores called Network Similarity Decomposition (NSD), we derive corresponding parallel formulations. Coupling this parallel similarity algorithm with a parallel auction-based bipartite matching technique, we obtain a highly efficient and scalable graph matching pipeline. We validate the performance of our integrated approach on a large parallel platform and on diverse graph instances (including protein-interaction, Wikipedia, and Web networks). Experimental results demonstrate that our algorithms scale to large machine configurations (thousands of cores) and problem instances, enabling the alignment of networks two orders of magnitude larger in size than reported in the current literature.
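
The second stage of such a pipeline turns a similarity matrix into a one-to-one alignment; the paper uses a parallel auction algorithm, while the sketch below substitutes a simple greedy matching for brevity (a stand-in, not the paper's method).

import numpy as np

def greedy_align(S):
    # S[i, j] = similarity of vertex i in graph 1 to vertex j in graph 2
    S = S.astype(float).copy()
    mapping = {}
    for _ in range(min(S.shape)):
        i, j = (int(x) for x in np.unravel_index(np.argmax(S), S.shape))
        mapping[i] = j
        S[i, :] = -np.inf                 # retire both endpoints
        S[:, j] = -np.inf
    return mapping

S = np.array([[0.9, 0.1, 0.3],
              [0.2, 0.8, 0.5],
              [0.4, 0.6, 0.7]])
print(greedy_align(S))                    # -> {0: 0, 1: 1, 2: 2}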

8.
We present an approach to designing cellular automata-based multiprocessor scheduling algorithms in which knowledge about the scheduling process is extracted automatically. We consider the simplest case, where the multiprocessor system is limited to two processors. To design cellular automata corresponding to a given program graph, we propose a generic definition of program graph neighborhood that is transparent to the various kinds, sizes, and shapes of program graphs. The cellular automata-based scheduler works in two modes: learning mode and operation mode. Discovered rules are typically suitable for sequential cellular automata working as a scheduler, while the most interesting and promising feature of cellular automata is their massive parallelism. To overcome the difficulties of evolving parallel cellular automata rules, we propose using a coevolutionary genetic algorithm. Rules discovered this way enable us to design effective parallel schedulers. We present a number of experimental results for both sequential and parallel scheduling algorithms discovered in the context of a cellular automata-based scheduling system.
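
In operation mode the scheduler is just a cellular automaton iterating a learned rule; here is a minimal Python sketch where each cell is a task, its binary state is the processor it is mapped to, and the radius-1 rule table is an arbitrary placeholder (the paper discovers the actual rules with a coevolutionary genetic algorithm).

def ca_step(state, rule):
    n = len(state)
    return [rule[(state[(i - 1) % n], state[i], state[(i + 1) % n])]
            for i in range(n)]

# placeholder rule: key = (left, self, right) -> processor for next step
rule = {(a, b, c): a ^ c for a in (0, 1) for b in (0, 1) for c in (0, 1)}

state = [0, 1, 0, 0, 1]        # initial random allocation of five tasks
for _ in range(3):             # let the automaton settle for a few steps
    state = ca_step(state, rule)
print(state)

Because every cell applies the same local rule simultaneously, the operation mode is massively parallel by construction; the hard part, which the paper addresses, is learning rules that yield good schedules.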

9.
In this paper, we develop a HiTi (Hierarchical MulTi) graph model for structuring large topographical road maps to speed up minimum-cost route computation. The HiTi graph model provides a novel approach to abstracting and structuring a topographical road map in a hierarchical fashion. We propose a new shortest path algorithm named SPAH, which utilizes the HiTi graph model of a topographical road map for its computation, and we prove its optimality. Our performance analysis of SPAH on grid graphs shows that it significantly reduces the search space over existing methods. We also present an in-depth experimental analysis of the HiTi graph method by comparing it with other similar works on grid graphs. Within the HiTi graph framework, we further propose a parallel shortest path algorithm named ISPAH. Experimental results show that the inter-query shortest path problem provides more opportunity for scalable parallelism than the intra-query shortest path problem.
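
For reference, the flat search that hierarchical methods like SPAH improve on is plain Dijkstra over the full road graph; a minimal Python sketch follows (HiTi's contribution, running such searches over a small hierarchy of abstracted subgraphs instead of the whole map, is not reproduced here).

import heapq

def dijkstra(adj, src, dst):
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float('inf')):
            continue                      # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float('inf')

roads = {'a': [('b', 2.0), ('c', 5.0)], 'b': [('c', 1.0)]}
print(dijkstra(roads, 'a', 'c'))          # -> 3.0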

10.
We study and improve the OBF technique [Barnat, J. and P. Moravec, Parallel algorithms for finding SCCs in implicitly given graphs, in: Proceedings of the 5th International Workshop on Parallel and Distributed Methods in Verification (PDMC 2006), LNCS (2007)], which was used in distributed algorithms for the decomposition of a partitioned graph into its strongly connected components. In particular, we introduce a recursive variant of OBF and experimentally evaluate several different implementations of it that vary in the degree of parallelism. For the evaluation we used synthetic graphs with a few large components and graphs with many small components, and we also experimented with graphs that arise as state spaces in real model checking applications. The experimental results are compared with those of other successful SCC decomposition techniques [Orzan, S., "On Distributed Verification and Verified Distribution," Ph.D. thesis, Free University of Amsterdam (2004); Fleischer, L.K., B. Hendrickson and A. Pinar, On identifying strongly connected components in parallel, in: Parallel and Distributed Processing, IPDPS Workshops, Lecture Notes in Computer Science 1800, 2000, pp. 505–511].
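
The forward-backward idea that OBF-style decompositions build on can be stated in a few lines: the SCC of a pivot is the intersection of its forward and backward reachable sets, and the three leftover regions can be processed independently (hence in parallel). A minimal sequential Python sketch, assuming the reverse graph is available:

def reach(adj, start, nodes):
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        if u in seen or u not in nodes:
            continue
        seen.add(u)
        stack.extend(adj.get(u, []))
    return seen

def sccs(adj, radj, nodes):
    if not nodes:
        return []
    pivot = next(iter(nodes))
    fwd, bwd = reach(adj, pivot, nodes), reach(radj, pivot, nodes)
    scc = fwd & bwd
    return ([scc] + sccs(adj, radj, fwd - scc)
                  + sccs(adj, radj, bwd - scc)
                  + sccs(adj, radj, nodes - fwd - bwd))

g  = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
rg = {2: [1], 3: [2], 1: [3], 4: [3, 5], 5: [4]}
print(sccs(g, rg, {1, 2, 3, 4, 5}))       # -> [{1, 2, 3}, {4, 5}]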

11.
In distributed shared-memory multiprocessor systems, parallel tasks communicate through shared memory data. As the system size increases, this communication cost becomes the main factor limiting overall parallelism and performance. In this paper, we propose a new solution to the problem through judicious management of the relevant resources, namely the shared data and the interconnection network (IN) through which the sharing is carried out. In this approach, communication cost is minimized by means of data migration/allocation based on analyzing general layered task graphs, the sharing behavior of parallel tasks, and the network topology. Our method is not applicable to read-only variables, and, for the time being, its usefulness is limited to multiprocessors with no cache coherence mechanism. Four typical interconnection topologies for multiprocessors are considered: shared-bus, hierarchical-bus, 2-D mesh, and fat-tree structures. For each of the four network topologies we develop efficient data allocation algorithms that make data allocation/migration decisions at compile time. The complexity of one algorithm is O(np) for shared-bus and O(n²p) for the remaining three, in a system with n processors executing a p-layer task graph for one shared variable. We also give an algorithm to determine the optimal allocation/migration scheme for multiple shared variables; however, its cost becomes prohibitive when the number of shared variables is high. Therefore, a heuristic of low complexity is suggested, which is optimal for some topologies.
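
To illustrate the shared-bus case, here is a minimal Python sketch of compile-time allocation of one shared variable over a p-layer task graph by dynamic programming; the concrete cost numbers and the uniform bus migration cost are assumptions for illustration, but the recurrence runs in O(np), matching the shared-bus bound above.

def allocate(access_cost, migrate):
    # access_cost[l][i]: access cost if the variable lives on processor i in layer l
    n = len(access_cost[0])
    dp = access_cost[0][:]
    for layer in access_cost[1:]:
        arrive = min(dp) + migrate        # cheapest previous home plus one bus hop
        dp = [layer[i] + min(dp[i], arrive) for i in range(n)]
    return min(dp)

costs = [[4, 1, 9], [8, 7, 2], [1, 6, 5]]     # 3 layers, 3 processors
print(allocate(costs, migrate=3))             # -> 10: 1 + (3 + 2) + (3 + 1)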

12.
Algorithms used in data mining and bioinformatics have to deal with huge amounts of data efficiently. In many applications, the data are supposed to have explicit or implicit structures. To develop efficient algorithms for such data, we have to propose possible structure models and test whether the models are feasible. Hence, it is important to make a compact model for structured data and to enumerate all of its instances efficiently. There are few graph classes besides trees that can be used for a model. In this paper, we inves...

13.
A Survey of Knowledge Graph Partitioning Algorithms
Knowledge graphs are an important cornerstone of artificial intelligence and have attracted wide attention because they contain rich graph structure and attribute information. A knowledge graph can precisely describe, with semantics, the various entities of the real world and the relationships among them, with vertices representing entities and edges representing the relationships between them. Knowledge graph partitioning is the first step in the distributed processing of large-scale knowledge graphs, and it provides the fundamental support for distributed knowledge graph storage, querying, reasoning, and mining. As the data scale of knowledge graphs and the demand for distributed processing continue to grow, ...
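
The simplest baseline such a survey covers is hash partitioning of triples, which balances storage across workers but ignores edge locality; a minimal Python sketch (the partitioning key and the toy triples are illustrative):

def hash_partition(triples, k):
    parts = [[] for _ in range(k)]
    for s, p, o in triples:
        parts[hash(s) % k].append((s, p, o))   # key on the subject entity
    return parts

kg = [("Alice", "knows", "Bob"), ("Bob", "worksAt", "ACME"),
      ("Alice", "livesIn", "Paris")]
for i, part in enumerate(hash_partition(kg, 2)):
    print(i, part)

More sophisticated partitioners trade this perfect balance for locality, trying to keep densely connected entities on the same worker to cut communication during distributed queries and reasoning.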

14.
Wei, Xing; Hu, Huiqi; Duan, Huichao; Qian, Weining; Zhou, Aoying. World Wide Web, 2019, 22(6):2561–2587.

To support large-scale analytics for Web applications, the backend distributed data management system must provide access to massive data, which makes the scan operation a critical step. To improve scan performance, modern data management systems usually rely on simple partitioned parallelism: tables consist of several partitions, and each scan operation can access multiple partitions separately. This is a simple and effective solution for a single scan operation. In this paper, we consider managing multiple scan operations together, where the situation is no longer straightforward. To address the problem, we propose a parallel strategy that schedules batched scan operations together, beyond simple partitioned parallelism. For the sake of performance, we first utilize replication to increase parallelism and propose an effective load-balancing strategy over replica nodes based on linear programming. Second, we propose an effective chunk-based scheduling algorithm for multi-threaded parallelism on each node, guaranteeing that all threads have even workloads under a qualified cost model. Finally, we integrate our parallel scan strategy into an open-source distributed data management system. Experimental evaluation shows our parallel scan strategy significantly improves the performance of the scan operation.
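
The chunk-based idea can be sketched in a few lines of Python: cut each partition into fixed-size chunks and let worker threads pull chunks from a shared queue, so threads finish with nearly even workloads even under skew. The chunk size and thread count below are illustrative, and the paper's cost model and replica-level linear program are not shown.

import queue, threading

def parallel_scan(rows, n_threads=4, chunk=1024, predicate=lambda r: True):
    q, hits, lock = queue.Queue(), [], threading.Lock()
    for i in range(0, len(rows), chunk):
        q.put(rows[i:i + chunk])

    def worker():
        while True:
            try:
                block = q.get_nowait()
            except queue.Empty:
                return
            local = [r for r in block if predicate(r)]
            with lock:
                hits.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
    return hits

print(len(parallel_scan(list(range(100_000)), predicate=lambda r: r % 7 == 0)))
# -> 14286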


15.
We present the first location-oblivious distributed unit disk graph coloring algorithm with a provable performance ratio of three (i.e., the number of colors used by the algorithm is at most three times the chromatic number of the graph). This is an improvement over the standard sequential coloring algorithm, which has a worst-case lower bound on its performance ratio of 4−3/k (for any k>2, where k is the chromatic number of the unit disk graph achieving the lower bound) (Tsai et al., Inf. Process. Lett. 84(4):195–199, 2002). We also present a slightly better worst-case lower bound on the performance ratio of the sequential coloring algorithm for unit disk graphs with chromatic number 4. Using simulation, we compare our algorithm with other existing unit disk graph coloring algorithms.
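
For contrast, the sequential baseline being compared against is first-fit greedy coloring; a minimal Python sketch on a unit disk graph built from points (two points are adjacent iff their distance is at most 1) follows. The distributed, location-oblivious algorithm and its ratio-3 argument are not reproduced here.

import math

def unit_disk_graph(points):
    n = len(points)
    return {i: [j for j in range(n)
                if j != i and math.dist(points[i], points[j]) <= 1.0]
            for i in range(n)}

def greedy_color(adj, order):
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj) + 1) if c not in used)
    return color

pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.4), (3.0, 3.0)]
g = unit_disk_graph(pts)
print(greedy_color(g, order=range(len(pts))))   # -> {0: 0, 1: 1, 2: 0, 3: 0}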

16.
Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task and data parallelism, and combining the two (also called mixed parallelism) has been shown to be an effective execution model for them. In this paper, we present an algorithm to compute the appropriate mix of task and data parallelism required to minimize the parallel completion time (makespan) of such applications. In other words, our algorithm determines the set of tasks that should run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner, based on several factors: the structure of the task graph, the runtime estimates and scalability characteristics of the tasks, and the inter-task data communication volumes. A locality-conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications, as well as synthetic graphs, shows that our algorithm consistently generates schedules with a lower makespan than Critical Path Reduction (CPR) and Critical Path and Allocation (CPA), two previously proposed scheduling algorithms. Our algorithm also produces schedules with a lower makespan than pure task-parallel and pure data-parallel schedules. For task graphs with known optimal schedules or known lower bounds on the makespan, our algorithm generates schedules closer to the optima than other scheduling approaches.
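
The core trade-off can be seen already with two independent moldable tasks: run them side by side with the processors split between them (task parallelism with internal data parallelism), or one after the other on all processors. A minimal Python sketch under an assumed Amdahl-style runtime model (the paper's algorithm generalizes this choice to whole DAGs with communication costs):

def amdahl(serial, work):
    return lambda p: serial + work / p

def best_split(t1, t2, P):
    options = [(max(t1(p), t2(P - p)), p) for p in range(1, P)]
    options.append((t1(P) + t2(P), P))    # pure data parallelism, in sequence
    return min(options)

t1, t2 = amdahl(2.0, 40.0), amdahl(1.0, 10.0)
makespan, p1 = best_split(t1, t2, P=8)
print(round(makespan, 2), p1)             # -> 8.67 6 (6 processors to task 1)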

17.
Heterogeneous computing systems are promising computing platforms, since systems based on a single parallel architecture may not be sufficient to exploit the parallelism available in applications. In some cases, heterogeneous distributed computing (HDC) systems can achieve higher performance with lower cost than single-machine supersystems. However, in HDC systems, processors and networks are not failure-free, and any kind of failure may be critical to the running applications. One way of dealing with such failures is to employ a reliable scheduling algorithm; unfortunately, most existing scheduling algorithms for precedence-constrained tasks in HDC systems do not adequately consider the reliability requirements of inter-dependent tasks. In this paper, we design a reliability-driven scheduling architecture that can effectively measure system reliability based on an optimal reliability-communication-path search algorithm, and we introduce the reliability priority rank (RRank) to estimate a task's priority while accounting for reliability overheads. Furthermore, based on the directed acyclic graph (DAG) model, we propose a reliability-aware scheduling algorithm for precedence-constrained tasks that achieves high reliability for applications. Comparison studies, based on both randomly generated graphs and graphs of real applications, show that our scheduling algorithm outperforms existing scheduling algorithms in terms of makespan, scheduling length ratio, and reliability. Moreover, the improvement gained by our algorithm increases as the data communication among tasks increases.
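
A common way to turn such factors into task priorities is an upward rank; the sketch below (Python) computes a rank in the spirit of RRank by inflating each task's cost with a reliability overhead term before taking the longest remaining compute-plus-communication path. The exact RRank definition in the paper differs, and the overhead weights here are assumptions.

def upward_rank(dag, cost, comm, rel_overhead):
    rank = {}
    def r(t):
        if t not in rank:
            rank[t] = (cost[t] + rel_overhead.get(t, 0.0) +
                       max((comm.get((t, s), 0.0) + r(s)
                            for s in dag.get(t, [])), default=0.0))
        return rank[t]
    for t in cost:
        r(t)
    return sorted(cost, key=lambda t: -rank[t])   # high-rank tasks first

dag  = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d']}
cost = {'a': 2.0, 'b': 3.0, 'c': 1.0, 'd': 2.0}
comm = {('a', 'b'): 1.0, ('a', 'c'): 4.0, ('b', 'd'): 1.0, ('c', 'd'): 1.0}
print(upward_rank(dag, cost, comm, rel_overhead={'c': 0.5}))
# -> ['a', 'b', 'c', 'd']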

18.
Mining globally distributed frequent subgraphs in a single labeled graph
Xing, Hui, Chen, Ah-Hwee. Data & Knowledge Engineering, 2009, 68(10):1034–1058.
Recent years have seen increasing efforts in graph mining, and many algorithms have been developed for this purpose. However, most existing algorithms are designed to discover frequent subgraphs in a set of labeled graphs, and the few algorithms that find frequent subgraphs in a single labeled graph typically identify subgraphs appearing only regionally in the input graph. In contrast, real-world applications commonly require that the frequent subgraphs identified in a single labeled graph also be globally distributed. This paper fills this crucial void by proposing a new measure, termed G-Measure, to find globally distributed frequent subgraphs, called G-Patterns, in a single labeled graph. Specifically, we first show that the G-Patterns selected by G-Measure tend to be globally distributed in the input graph. We then show that G-Measure has the downward closure property, which guarantees that the G-Measure value of a G-Pattern is not less than those of its supersets. Consequently, a G-Miner algorithm is developed for finding G-Patterns. Experimental results on four synthetic and seven real-world data sets, together with a comparison with existing algorithms, demonstrate the efficacy of G-Measure and G-Miner for finding G-Patterns. Finally, an application of the G-Patterns is given.
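
Downward closure is what makes level-wise pruning sound: once a pattern fails the threshold, every superset can be skipped. The Python sketch below demonstrates the pruning schema on itemsets, which stand in for subgraph patterns here; G-Miner applies the same schema to G-Patterns under G-Measure.

from itertools import combinations

def levelwise(transactions, items, measure, theta):
    frequent, level = [], [frozenset([i]) for i in items]
    while level:
        survivors = [p for p in level if measure(p, transactions) >= theta]
        frequent += survivors
        keep = set(survivors)
        # extend only survivors, and only when every sub-pattern survived
        cands = {a | b for a, b in combinations(survivors, 2)
                 if len(a | b) == len(a) + 1}
        level = [c for c in cands
                 if all(frozenset(s) in keep
                        for s in combinations(c, len(c) - 1))]
    return frequent

support = lambda p, ts: sum(p <= t for t in ts) / len(ts)
ts = [frozenset('abc'), frozenset('ab'), frozenset('ac'), frozenset('b')]
print(levelwise(ts, 'abc', support, theta=0.5))
# {a}, {b}, {c}, {a,b}, {a,c} survive; {b,c} fails, so {a,b,c} is never tested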

19.
We consider the construction of sparse covers for planar graphs and other graphs that exclude a fixed minor. We present an algorithm that gives a cover for the γ-neighborhood of each node. For planar graphs, the cover has radius less than 16γ and degree no more than 18. For every n-node graph that excludes a minor of a fixed size, we present an algorithm that yields a cover with radius no more than 4γ and degree O(log n). This is a significant improvement over previous results for planar graphs and for graphs excluding a fixed minor: previously, obtaining clusters with radius O(γ) required degree polynomial in n. Our algorithms are based on a recursive application of a basic routine called shortest-path clustering, which appears to be a novel approach to the construction of sparse covers. Since sparse covers have many applications in distributed computing, including compact routing, distributed directories, synchronizers, and Universal TSP, our improved cover construction yields improved algorithms for all these problems on the class of graphs that exclude a fixed minor.
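
The guarantee being constructed can be phrased concretely: every node's γ-neighborhood must lie entirely inside some cluster. Below is a minimal Python sketch of a naive BFS-based cover (each 2γ-ball covers the γ-ball of every node within γ of its center); the 4γ-radius, O(log n)-degree construction for minor-free graphs is the paper's contribution and is not reproduced here.

from collections import deque

def ball(adj, center, radius):
    dist, q = {center: 0}, deque([center])
    while q:
        u = q.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return set(dist)

def greedy_cover(adj, gamma):
    uncovered, clusters = set(adj), []
    while uncovered:
        c = next(iter(uncovered))
        clusters.append(ball(adj, c, 2 * gamma))  # covers the γ-ball of every
        uncovered -= ball(adj, c, gamma)          # node within γ of center c
    return clusters

path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
print(greedy_cover(path, gamma=1))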

20.
Embar, Varun; Srinivasan, Sriram; Getoor, Lise. Machine Learning, 2021, 110(7):1847–1866.

Statistical relational learning (SRL) and graph neural networks (GNNs) are two powerful approaches for learning and inference over graphs. Typically, they are evaluated in terms of simple metrics such as accuracy over individual node labels. Complex aggregate graph queries (AGQ) involving multiple nodes, edges, and labels are common in the graph mining community and are used to estimate important network properties such as social cohesion and influence. While graph mining algorithms support AGQs, they typically do not take into account uncertainty, or when they do, make simplifying assumptions and do not build full probabilistic models. In this paper, we examine the performance of SRL and GNNs on AGQs over graphs with partially observed node labels. We show that, not surprisingly, inferring the unobserved node labels as a first step and then evaluating the queries on the fully observed graph can lead to sub-optimal estimates, and that a better approach is to compute these queries as an expectation under the joint distribution. We propose a sampling framework to tractably compute the expected values of AGQs. Motivated by the analysis of subgroup cohesion in social networks, we propose a suite of AGQs that estimate the community structure in graphs. In our empirical evaluation, we show that by estimating these queries as an expectation, SRL-based approaches yield up to a 50-fold reduction in average error when compared to existing GNN-based approaches.
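
A minimal Python sketch of the expectation-based estimate: sample label assignments for the unobserved nodes and average the query over the samples. For brevity the sketch samples each node independently from a given marginal, a simplifying assumption; the paper's framework samples from the full joint distribution of the SRL model.

import random

def expected_agq(nodes, observed, marginals, query, n_samples=1000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        labels = dict(observed)
        for v in nodes:
            if v not in labels:
                labels[v] = 1 if rng.random() < marginals[v] else 0
        total += query(labels)
    return total / n_samples

# AGQ: fraction of edges whose endpoints agree (a crude cohesion measure)
edges = [(0, 1), (1, 2), (2, 3)]
agree = lambda lab: sum(lab[u] == lab[v] for u, v in edges) / len(edges)
print(expected_agq(range(4), {0: 1}, {1: 0.9, 2: 0.5, 3: 0.2}, agree))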

