Sep: A Fixed Degree Regular Network for Massively Parallel Systems   Total citations: 4, self-citations: 0, citations by others: 4
We propose a family of regular Cayley network graphs of degree three based on permutation groups for the design of massively parallel systems. These graphs are shown to be based on the shuffle exchange operations, to have logarithmic diameter in the number of vertices, and to be maximally fault tolerant. We investigate different algebraic properties of these networks (including fault tolerance) and propose a simple routing algorithm. These graphs are shown to be able to efficiently simulate or embed other permutation group based graphs; thus they seem to be very attractive for VLSI implementation and for applications requiring a bounded number of I/O ports, as well as for running existing applications written for other permutation group based network architectures.

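As integrated circuits grow more complex and time-to-market requirements grow shorter, traditional integrated circuit design methods no longer suffice. Systems based on hardware platforms are therefore needed; through programmable logic and structural reconfiguration, such a system can carry out a whole series of operations within one application domain. The functions within a domain share common features, and by studying these commonalities one can identify fixed modules to embed in the system, improving chip utilization under a limited chip area. The number of such modules may be very large, and their functions may overlap, so the module selection problem becomes the designer's main concern. Solving this optimization problem directly is very difficult, so graph theory is used as a mathematical tool to address it. Because the problem is NP-complete, algorithms are proposed for two special cases, and a heuristic algorithm for the general problem is analyzed; its computational complexity is O(N2s×k).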

In this paper, three new node-ranking schemes for the star graph are presented and evaluated. These node-ranking schemes efficiently “embed” grids, pipelines, and reconfigurable multiple ring networks (cases of torus networks). These schemes improve similar known results on the star graph. They also allow efficient mapping of a wide class of algorithms into the star graph and hence facilitate further testing of the viability of the star graph as a potential interconnection network for large-scale multiprocessor systems. The proposed node-ranking schemes outperform their hypercube counterparts in terms of communication cost. Finally, two algorithms for solving systems of linear equations are given. These algorithms are based on the proposed grid and pipeline schemes to carry out matrix triangulation and backward substitution, respectively.

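Compared with traditional applications, big data applications exhibit high parallelism, large volumes of memory traffic, irregular memory access patterns, and poor temporal and spatial locality, which poses new challenges for traditional computer architecture. Graph500 is a benchmark ranking that evaluates the big data processing capability of computer systems; the BFS algorithm is the core program of Graph500 and a typical data-intensive application. This work studies three aspects: 1-D data partitioning, the design of an optimized hybrid algorithm, and the design of the remote communication scheme. A multi-node parallel BFS algorithm is designed and implemented on a prototype parallel architecture for big data processing built by the research group, achieving 803.8 MTEPS at a data scale of 2^22 vertices and 2^26 edges. Performance tests and analysis of the multi-node parallel BFS algorithm are then carried out on this basis, laying the groundwork for further research.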
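A minimal single-process sketch of level-synchronous BFS under a 1-D vertex partition may help make the idea concrete. It is not the implementation described in the abstract: the ownership rule `v % num_parts`, the function name `bfs_1d`, and the message counter are assumptions made for illustration, and the hybrid algorithm and the real communication layer are not modelled.

```python
from collections import defaultdict


def bfs_1d(adj, source, num_parts):
    """Level-synchronous BFS with a simulated 1-D vertex partition.

    adj: dict mapping each vertex (an int) to a list of neighbours.
    Vertex v is owned by partition v % num_parts (assumed layout).
    Returns (parent, remote_messages), where remote_messages counts
    edge visits whose target is owned by a different partition.
    """
    def owner(v):
        return v % num_parts

    parent = {source: source}
    frontiers = defaultdict(list)          # one frontier per partition
    frontiers[owner(source)].append(source)
    remote_messages = 0

    while any(frontiers.values()):
        inbox = defaultdict(list)          # (vertex, proposed parent) per owner
        for part, frontier in frontiers.items():
            for u in frontier:
                for v in adj[u]:
                    if owner(v) != part:
                        remote_messages += 1   # would cross the network
                    inbox[owner(v)].append((v, u))
        # Each owner filters its inbox and builds its next frontier.
        frontiers = defaultdict(list)
        for part, messages in inbox.items():
            for v, u in messages:
                if v not in parent:
                    parent[v] = u
                    frontiers[part].append(v)
    return parent, remote_messages
```

On a real machine each partition would hold only its own slice of `parent`, and the inbox exchange would be an all-to-all communication step; the counter here only indicates how much traffic a given partitioning generates.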

The paper presents novel embeddings of various classical topologies into the OPAM multicomputer. OPAM consists of a large number of processors that are connected by a two-level, crossbar-based interconnection network. The network combines a large, optical circuit-switched crossbar (reconfigurable network) with many small, packet-switching crossbars. The necessary embedding is very different from classical approaches. The goal in our case is to minimize routing decisions, so that communication requests can be satisfied by passing through two small crossbars. We show how to map parallel programs to this architecture using graph contraction notations. The family of parallel programs that we consider consists of multiple processes and communication links that are represented by connected, regular graphs such as rings, trees, two-dimensional grids, cube-connected cycles, and hypercubes. In each case we show how to partition the vertex set of the program's graph into subsets, and how to assign each subset to a cluster of processors in order to realize the topology of the given problem. In some of the cases we also prove that our partition and assignment algorithms are optimal.

Highly regular multi-processor architectures are suitable for inherently highly parallelizable applications, such as most of the image processing domain. Systems embedded in a single programmable chip platform (SoPC) allow hardware designers to tailor every aspect of the architecture in order to match the specific application needs. These platforms are now large enough to embed an increasing number of cores, allowing implementation of a multi-processor architecture with an embedded communication network. In this paper we present the parallelization and the embedding of a real-time image stabilization algorithm on a SoPC platform. Our overall hardware implementation method is based upon meeting algorithm processing power requirements and communication needs through refinement of a generic parallel architecture model. Actual implementation is done by the choice and parameterization of readily available reconfigurable hardware modules and customizable commercially available IPs (Intellectual Property). We present both software and hardware implementation with performance results on a Xilinx SoPC target.

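In the big data era, graphs are used in many fields to represent data with complex relationships. Graph computing applications are widely used across these fields to mine the latent value in graph data. The irregular execution behavior peculiar to graph computing applications raises challenges such as irregular workloads, intensive read-modify-write update operations, irregular memory access, and irregular communication. Existing general-purpose architectures cannot handle these challenges effectively. To overcome the challenges of accelerating graph computing applications, a large number of hardware accelerator architectures for graph computing have been proposed. They...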

We propose a new class of interconnection networks, called macro-star networks, which belong to the class of Cayley graphs and use the star graph as a basic building module. A macro-star network can have node degree that is considerably smaller than that of a star graph of the same size, and diameter that is sublogarithmic and asymptotically within a factor of 1.25 from a universal lower bound (given its node degree). We show that algorithms developed for star graphs can be emulated on suitably constructed macro-stars with asymptotically optimal slowdown. This enables us to obtain through emulation a variety of efficient algorithms for the macro-star network, thus proving its versatility. Basic communication tasks, such as the multinode broadcast and the total exchange, can be executed in macro-star networks in asymptotically optimal time under both the single-port and the all-port communication models. Moreover, no interconnection network with similar node degree can perform these communication tasks in time that is better by more than a constant factor than that required in a macro-star network. We show that macro-star networks can embed trees, meshes, hypercubes, as well as star, bubble-sort, and complete transposition graphs with constant dilation. We introduce several variants of the macro-star network that provide more flexibility in scaling up the number of nodes. We also discuss implementation issues and compare the new topology with the star graph and other popular topologies.

An unweighted graph has density $\rho$ and growth rate $k$ if the number of nodes in every ball with radius $r$ is bounded by $\rho r^k$. The communication graphs of wireless networks and peer-to-peer networks often have constant bounded density and small growth rate. In this paper, we study the trade-off between two quality measures for routing in growth-restricted graphs. The two measures we consider are the stretch factor, which measures the lengths of the routing paths, and the load-balancing ratio, which measures the evenness of the traffic distribution. We show that if the routing algorithm is required to use paths with stretch factor $c$, then its load-balancing ratio is bounded by $O(\rho^{1/k}(n/c)^{1-1/k})$, and the bound is tight in the worst case. We show the application and extension of the trade-off to wireless network routing and VLSI layout design. We also present a load-balanced routing algorithm with the stretch factor constraint in an online setting, in which the routing requests come one by one.
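The density and growth-rate definition can be checked numerically on small graphs. The sketch below is illustrative only: the function names `ball_sizes` and `min_density` are hypothetical, balls are measured in hop distance, and only radii r >= 1 are considered (an assumption, since the bound is vacuous at r = 0).

```python
from collections import deque


def ball_sizes(adj, center):
    """Return sizes where sizes[r] = number of nodes within r hops of center."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    max_r = max(dist.values())
    sizes = [0] * (max_r + 1)
    for d in dist.values():
        sizes[d] += 1                      # nodes at exactly distance d
    for r in range(1, max_r + 1):
        sizes[r] += sizes[r - 1]           # prefix sums give ball sizes
    return sizes


def min_density(adj, k):
    """Smallest rho such that every ball of radius r >= 1 has <= rho * r**k nodes.

    Radii beyond a node's eccentricity need no check: the ball stops
    growing while r**k keeps increasing.
    """
    return max(
        size / r ** k
        for v in adj
        for r, size in enumerate(ball_sizes(adj, v))
        if r >= 1
    )
```

For a 10-node cycle with growth rate k = 1, a ball of radius 1 holds 3 nodes and larger balls grow by 2 per step, so the tightest density is 3.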

The cyclic antibandwidth problem is to embed the vertices of a graph $G$ of $n$ vertices on a cycle $C_n$ such that the minimum distance (measured in the cycle) of adjacent vertices is maximized. Exact results/conjectures for this problem exist in the literature for some standard graphs, such as paths, cycles, two-dimensional meshes, and tori, but no algorithm has been proposed for general graphs in the literature reviewed by us so far. In this paper, we propose a memetic algorithm for the cyclic antibandwidth problem (MACAB) that can be applied to arbitrary graphs. An important feature of this algorithm is the use of breadth-first-search-generated level structures of a graph to explore a variety of solutions. A novel greedy heuristic is designed which explores these level structures to label the vertices of the graph. The algorithm achieves the exact cyclic antibandwidth of all the standard graphs with known optimal values. Based on our experiments we conjecture the cyclic antibandwidth of three-dimensional meshes, hypercubes, and double stars. Experiments show that results obtained by MACAB are substantially better than those given by a genetic algorithm.
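The objective being maximized can be written down in a few lines. This sketch only evaluates a given labeling; it is not the memetic algorithm from the abstract, and the function name `cyclic_antibandwidth` and the 0-based labels are assumptions for illustration.

```python
def cyclic_antibandwidth(edges, label):
    """Evaluate a labeling of a graph's vertices on a cycle.

    edges: iterable of (u, v) pairs.
    label: dict mapping each vertex to a distinct position in 0..n-1.
    Returns the minimum cyclic distance between the positions of
    adjacent vertices, which is the quantity the problem maximizes.
    """
    n = len(label)

    def cyclic_distance(a, b):
        d = abs(a - b)
        return min(d, n - d)               # go around the cycle either way

    return min(cyclic_distance(label[u], label[v]) for u, v in edges)
```

For the path on five vertices, labeling the vertices in path order gives value 1, while the labeling 0, 2, 4, 1, 3 places every pair of adjacent vertices two positions apart on the cycle.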

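A Wavelength Assignment Algorithm for the Communication Pattern of Parallel LU Decomposition on WDM Ring Networks   Total citations: 2, self-citations: 0, citations by others: 2
Wavelength assignment is a fundamental problem in optical network design, and designing wavelength assignment algorithms is a basic way to gain insight into the communication capability of optical networks. Different parallel algorithms have different communication patterns, and how to realize these patterns on optical interconnection networks is an area that currently attracts considerable attention. Based on WDM ring networks, this paper constructs a communication pattern for the parallel LU decomposition of matrices and discusses the wavelength assignment problem of embedding this pattern in a ring optical network. In solving this problem, a wavelength assignment algorithm is obtained for embedding a communication pattern with a special bipartite graph structure in a ring network. Through analysis and proof, the minimum number of wavelengths needed to realize this parallel LU decomposition communication pattern on a WDM ring network is obtained.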

During and immediately after their deployment, ad hoc and sensor networks lack an efficient communication scheme, rendering even the most basic network coordination problems difficult. Before any reasonable communication can take place, nodes must come up with an initial structure that can serve as a foundation for more sophisticated algorithms. In this paper, we consider the problem of obtaining a vertex coloring as such an initial structure. We propose an algorithm that works in the unstructured radio network model. This model captures the characteristics of newly deployed ad hoc and sensor networks, i.e. asynchronous wake-up, no collision-detection, and scarce knowledge about the network topology. When modeling the network as a graph with bounded independence, our algorithm produces a correct coloring with O(Δ) colors in time O(Δ log n) with high probability, where n and Δ are the number of nodes in the network and the maximum degree, respectively. Also, the number of locally used colors depends only on the local node density. Graphs with bounded independence generalize unit disk graphs as well as many other well-known models for wireless multi-hop networks. They allow us to capture aspects such as obstacles, fading, or irregular signal-propagation. A preliminary version of this work has been published in [20] as Coloring Unstructured Radio Networks, In Proceedings of the 17th Symposium on Parallel Algorithms and Architectures (SPAA), Las Vegas, Nevada, 2005.

Consensus control of multi-agent systems is an innovative paradigm for the development of intelligent distributed systems. It has attracted numerous scientific groups because of its promising applications, in which agents have the freedom to achieve their local and global goals and make their own decisions. Network communication topologies based on graph and matrix theory are widely used in various real-time applications ranging from software agents to robotics. Therefore, recognizing the significance of both directed and undirected graphs, this research focuses on the demonstration of a distributed average consensus algorithm. It uses the harmonic mean in the domain of multi-agent systems with directed and undirected graphs under static topologies, based on a control input scheme. The proposed agreement protocol focuses on achieving a constant consensus on directed and undirected graphs, using the exchange of information between neighbors to update their state values and, at the same time, to calculate the total number of agents that contribute to the communication network. The proposed method is implemented for identical networks under directed and undirected communication links. Two different scenarios are simulated, and it is concluded that the undirected approach has an advantage over directed graph communication in terms of processing time and the total number of iterations required to achieve convergence. The same network parameters are used for both orientations of the communication graphs. In addition, the simulation results and the calculation of various matrices are provided at the end to validate the effectiveness of the proposed algorithm in achieving consensus.
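For readers unfamiliar with consensus protocols, the sketch below shows the standard discrete-time arithmetic-average update on an undirected graph, in which each agent moves toward its neighbors' states. It is a swapped-in textbook scheme, not the harmonic-mean protocol of the abstract; the names `consensus_step` and `run_consensus` and the step size `eps` are assumptions, and `eps` should be smaller than 1 divided by the maximum degree for the iteration to converge.

```python
def consensus_step(x, neighbors, eps):
    """One synchronous update: x_i <- x_i + eps * sum_j (x_j - x_i)."""
    return [
        xi + eps * sum(x[j] - xi for j in neighbors[i])
        for i, xi in enumerate(x)
    ]


def run_consensus(x0, neighbors, eps, tol=1e-9, max_iters=10_000):
    """Iterate until all states agree to within tol.

    neighbors[i] lists the agents adjacent to agent i; the graph is
    assumed undirected, so the sum of the states is preserved and the
    agreed value is the average of x0.
    Returns (final_states, iterations_used).
    """
    x = list(x0)
    for iteration in range(max_iters):
        if max(x) - min(x) < tol:
            return x, iteration
        x = consensus_step(x, neighbors, eps)
    return x, max_iters
```

A usage example on a four-agent path graph: `run_consensus([1.0, 2.0, 3.0, 6.0], [[1], [0, 2], [1, 3], [2]], eps=0.3)` drives every state toward the initial average, 3.0. On a directed graph the same update generally converges to a weighted average instead, which is one reason the two cases are treated separately.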

Partitioning graphs into equally large groups of nodes while minimizing the number of edges between different groups is an extremely important problem in parallel computing. For instance, efficiently parallelizing several scientific and engineering applications requires the partitioning of data or tasks among processors such that the computational load on each node is roughly the same, while communication is minimized. Obtaining exact solutions is computationally intractable, since graph partitioning is NP-complete. For a large class of irregular and adaptive data parallel applications (such as adaptive graphs), the computational structure changes from one phase to another in an incremental fashion. In incremental graph-partitioning problems the partitioning of the graph needs to be updated as the graph changes over time; a small number of nodes or edges may be added or deleted at any given instant. In this paper, we use a linear programming-based method to solve the incremental graph-partitioning problem. All the steps used by our method are inherently parallel and hence our approach can be easily parallelized. By using an initial solution for the graph partitions derived from recursive spectral bisection-based methods, our methods can achieve repartitioning at considerably lower cost than can be obtained by applying recursive spectral bisection. Further, the quality of the partitioning achieved is comparable to that achieved by applying recursive spectral bisection to the incremental graphs from scratch.

Large graphs are ubiquitous and scale-free, with irregular relationships. Clustering is used to find similar patterns in graphs and thus helps in gaining useful insights. In the real world, nodes may belong to more than one cluster; thus, it is essential to analyze the fuzzy cluster membership of nodes. Traditional centralized fuzzy clustering algorithms incur high communication cost and produce poor-quality clusters when used for large graphs. Thus, scalable solutions are needed to handle huge amounts of data in less computational time with minimum disk access. In this paper, we propose a parallel fuzzy clustering algorithm named ‘PGFC’ for handling scalable graph data. It will be advantageous from the viewpoint of expert systems to develop a clustering algorithm that can assure scalability along with better quality of clusters for handling large graphs. The algorithm is parallelized using the bulk synchronous parallel (BSP) based Pregel model. The cluster centers are initialized using the degree centrality measure, resulting in fewer iterations. The performance of PGFC is compared with other state-of-the-art clustering algorithms using synthetic graphs and real-world networks. The experimental results reveal that the proposed PGFC scales up linearly to handle large graphs and produces better quality of clusters when compared to other graph clustering counterparts.

Graph computation problems that exhibit irregular memory access patterns are known to show poor performance on multiprocessor architectures. Although recent studies use FPGA technology to tackle the memory wall problem of graph computation by adopting a massively multi-threaded architecture, the performance is still far less than optimal memory performance due to the long memory access latency. In this paper, we propose a comprehensive reconfigurable computing approach to address the memory wall problem. First, we present an extended edge-streaming model with massive partitions to provide better load balance while taking advantage of the streaming bandwidth of external memory in processing large graphs. Second, we propose a two-level shuffle network architecture to significantly reduce the on-chip memory requirement while providing high processing throughput that matches the bandwidth of the external memory. Third, we introduce a compact storage design based on graph compression schemes and propose the corresponding encoding and decoding hardware to reduce the data volume transferred between the processing engines and external memory. We validate the effectiveness of the proposed architecture by implementing three frequently used graph algorithms on an ML605 board, showing an up to 3.85× improvement in terms of performance-to-bandwidth ratio over previously published FPGA-based implementations.

We present a novel algorithm to solve the non-negative single-source shortest path problem on road networks and graphs with low highway dimension. After a quick preprocessing phase, we can compute all distances from a given source in the graph with essentially a linear sweep over all vertices. Because this sweep is independent of the source, we are able to reorder vertices in advance to exploit locality. Moreover, our algorithm takes advantage of features of modern CPU architectures, such as SSE and multiple cores. Compared to Dijkstra’s algorithm, our method needs fewer operations, has better locality, and is better able to exploit parallelism at multi-core and instruction levels. We gain additional speedup when implementing our algorithm on a GPU, where it is up to three orders of magnitude faster than Dijkstra’s algorithm on a high-end CPU. This makes applications based on all-pairs shortest-paths practical for continental-sized road networks. Several algorithms, such as computing the graph diameter, arc flags, or exact reaches, can be greatly accelerated by our method.

This paper introduces a novel interconnection network called KMcube (Kautz-Möbius cube). KMcube is a compound graph of a Kautz digraph and Möbius cubes. That is, it uses the Möbius cubes as the unit clusters and connects many such clusters by means of a Kautz digraph, at the cost of only one additional arc being added to each node in each Möbius cube. The topological benefits of both basic graphs are preserved in the compound network. It utilizes the topological properties of Möbius cubes to conveniently embed parallel algorithms into each cluster and the short diameter of a Kautz digraph to support efficient inter-cluster communication. Additionally, KMcube provides other attractive properties, such as regularity, symmetry, and expandability. The proposed methodology for KMcube is further applied to the compound graphs of a Kautz digraph and other Möbius-like graphs with a diameter similar to that of a Möbius cube. Moreover, other hybrid graphs of Kautz digraphs and Möbius cubes are proposed and compared.

We present the first polynomial-time approximation scheme (PTAS) for the Minimum Independent Dominating Set problem in graphs of polynomially bounded growth, which are used to model wireless communication networks. The approach presented yields a robust algorithm; that is, it accepts any undirected graph as input and returns a (1+ε)-approximate minimum independent dominating set, or a certificate showing that the input graph does not satisfy the bounded growth property.
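To make the object being approximated concrete, the sketch below checks whether a vertex set is an independent dominating set and builds one greedily. It is not the PTAS from the abstract and carries no approximation guarantee: it relies only on the fact that every maximal independent set is also dominating. The function names are hypothetical, and the graph is assumed simple and undirected.

```python
def is_independent_dominating_set(adj, chosen):
    """Check both properties for a candidate vertex set.

    adj: dict mapping each vertex to a list of neighbours.
    Independent: no two chosen vertices are adjacent.
    Dominating: every vertex is chosen or has a chosen neighbour.
    """
    chosen = set(chosen)
    independent = all(v not in chosen for u in chosen for v in adj[u])
    dominating = all(
        u in chosen or any(v in chosen for v in adj[u]) for u in adj
    )
    return independent and dominating


def greedy_maximal_independent_set(adj):
    """Scan vertices in sorted order, taking each one not yet blocked.

    The result is a maximal independent set, hence an independent
    dominating set, but it is not necessarily a minimum one.
    """
    chosen, blocked = set(), set()
    for u in sorted(adj):
        if u not in blocked:
            chosen.add(u)
            blocked.add(u)
            blocked.update(adj[u])         # neighbours can no longer be taken
    return chosen
```

On the five-vertex path 0-1-2-3-4 the greedy scan returns {0, 2, 4}, whereas {1, 3} is a smaller independent dominating set, which illustrates why finding a minimum one is the hard part.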

Partitioning a data set of attributed graphs into clusters arises in different application areas of structural pattern recognition and computer vision. Despite its importance, graph clustering is currently an underdeveloped research area in machine learning due to the lack of theoretical analysis and the high computational cost of measuring structural proximities. To address the first issue, we introduce the concept of metric graph spaces that enables central (or center-based) clustering algorithms to be applied to the domain of attributed graphs. The key idea is to embed attributed graphs into Euclidean space without loss of structural information. In addressing the second issue of computational complexity, we propose a neural network solution of the K-means algorithm for structures (KMS). As a distinguishing feature to improve the computational time, the proposed algorithm classifies the data graphs according to the principle of elimination of competition where the input graph is assigned to the winning model of the competition. In experiments we investigate the behavior and performance of the neural KMS algorithm.
