期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Conflict-free star-access in parallel memory systems

《Journal of Parallel and Distributed Computing》2006,66(11):1431-1441

We study conflict-free data distribution schemes in parallel memories in multiprocessor system architectures. Given a host graph G, the problem is to map the nodes of G into memory modules such that any instance of a template type T in G can be accessed without memory conflicts. A conflict occurs if two or more nodes of T are mapped to the same memory module. The mapping algorithm should: (i) be fast in terms of data access (possibly mapping each node in constant time); (ii) minimize the required number of memory modules for accessing any instance in G of the given template type; and (iii) guarantee load balancing on the modules. In this paper, we consider conflict-free access to star templates, i.e., to any node of G along with all of its neighbors. Such a template type arises in many classical algorithms like breadth-first search in a graph, message broadcasting in networks, and nearest neighbor based approximation in numerical computation. We consider the star-template access problem on two specific host graphs—tori and hypercubes—that are also popular interconnection network topologies. The proposed conflict-free mappings on these graphs are fast, use an optimal or provably good number of memory modules, and guarantee load balancing. 相似文献

2.

Macro-star networks: efficient low-degree alternatives to stargraphs

Chi-Hsiang Yeh Varvarigos E.A. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(10):987-1003

We propose a new class of interconnection networks, called macro-star networks, which belong to the class of Cayley graphs and use the star graph as a basic building module. A macro-star network can have node degree that is considerably smaller than that of a star graph of the same size, and diameter that is sublogarithmic and asymptotically within a factor of 1.25 from a universal lower bound (given its node degree). We show that algorithms developed for star graphs can be emulated on suitably constructed macro-stars with asymptotically optimal slowdown. This enables us to obtain through emulation a variety of efficient algorithms for the macro-star network, thus proving its versatility. Basic communication tasks, such as the multimode broadcast and the total exchange, can be executed in macro-star networks in asymptotically optimal time under both the single-port and the all-port communication models. Moreover, no interconnection network with similar node degree can perform these communication tasks in time that is better by more than a constant factor than that required in a macro-star network. We show that macro-star networks can embed trees, meshes, hypercubes, as well as star, bubble-sort, and complete transposition graphs with constant dilation. We introduce several variants of the macro-star network that provide more flexibility in scaling up the number of nodes. We also discuss implementation issues and compare the new topology with the star graph and other popular topologies 相似文献

3.

Fault-tolerant de Bruijn and shuffle-exchange networks

Bruck J. Cypher R. Ching-Tien Ho 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(5):548-553

This paper addresses the problem of creating a fault-tolerant interconnection network for a parallel computer. Three topologies, namely, the base-2 de Bruijn graph, the base-m de Bruijn graph, and the shuffle-exchange, are studied. For each topology an N+k node fault-tolerant graph is defined. These fault-tolerant graphs have the property that given any set of k node faults, the remaining N nodes contain the desired topology as a subgraph. All of the constructions given are the best known in terms of the degree of the fault-tolerant graph. We also investigate the use of buses to reduce the degrees of the fault-tolerant graphs still further 相似文献

4.

Joint application mapping/interconnect synthesis techniques for embedded chip-scale multiprocessors 总被引：1，自引：0，他引：1

Bambha N.K. Bhattacharyya S.S. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(2):99-112

As transistor sizes shrink, interconnects represent an increasing bottleneck for chip designers. Several groups are developing new interconnection methods and system architectures to cope with this trend. New architectures require new methods for high-level application mapping and hardware/software codesign. We present high-level scheduling and interconnect topology synthesis techniques for embedded multiprocessor systems-on-chip that are streamlined for one or more digital signal processing applications. That is, we seek to synthesize an application-specific interconnect topology. We show that flexible interconnect topologies utilizing low-hop communication between processors offer advantages for reduced power and latency. We show that existing multiprocessor scheduling algorithms can deadlock if the topology graph is not strongly connected, or if a constraint is imposed on the maximum number of hops allowed for communication. We detail an efficient algorithm that can be used in conjunction with existing scheduling algorithms for avoiding this deadlock. We show that it is advantageous to perform application scheduling and interconnect synthesis jointly, and present a probabilistic scheduling/interconnect algorithm that utilizes graph isomorphism to pare the design space. 相似文献

5.

A Compiler-Directed Approach to Network Latency Reduction for Distributed Shared Memory Multiprocessors

Sibabrata Ray Hong Jiang Qing Yang 《Journal of Parallel and Distributed Computing》1996,38(2):267

In distributed shared memory multiprocessor systems, parallel tasks communicate through sharing memory data. As the system size increases, such communication cost becomes the main factor that limits the overall parallelism and performance. In this paper, we propose a new solution to the problem through judiciously managing the relevant resource, namely, the shared data and the interconnection network (IN) through which the sharing is carried out. In this approach, communication cost is minimized by means of data migration/allocation which is based on analyzing general layered task graphs, sharing behavior of parallel tasks, and network topology. Our method is not applicable for read only variables. Further, for the time being, the usefulness of the method is limited to multiprocessors where no cache coherence mechanism is implemented. Four typical interconnection topologies for multiprocessors are considered, namely, shared-bus, hierarchical-bus, 2-D mesh, and fat-tree structures. Efficient data allocation algorithms for each of the four network topologies are developed that make decision on data allocation/migration at the compile time. The complexity of one algorithm isO(np) for shared-bus andO(n²p) for the remaining three in a system withnprocessors executing ap-layer task graph for one shared variable. We have also given an algorithm to determine optimal allocation/migration scheme for multiple shared variables. However, the cost of the algorithm become prohibitive when the number of shared variables is high. Therefore, a heuristic of low complexity is suggested. The heuristic is optimal for some topologies. 相似文献

6.

E-kernel: an embedding kernel on the IBM victor V256 multiprocessorfor program mapping and network reconfiguration

Ma E. Shea D.G. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(9):977-994

We present the design of E-kernel, an embedding kernel on the Victor V256 message-passing partitionable multiprocessor, developed for the support of program mapping and network reconfiguration. E-kernel supports the embedding of a new network topology onto Victor's 2D mesh and also the embedding of a task graph onto the 2D mesh network or the reconfigured network. In the current implementation, the reconfigured network can be a line or an even-size ring, and the task graphs meshes or tori of a variety of dimensions and shapes or graphs with similar topologies. For application programs having these task graph topologies and that are designed according to the communication model of E-kernel, they can be run without any change on partitions connected by the 2D mesh, line, or ring. Further, E-kernel attempts the communication optimization of these programs on the different networks automatically, thus making both the network topology and the communication optimization attempt completely transparent to the application programs. Many of the embeddings used in E-kernel are optimal or asymptotically optimal (with respect to minimum dilation cost). The implementation of E-kernel translated some of the many theoretical results in graph embeddings into practical tools for program mapping and network reconfiguration in a parallel system. E-kernel is functional on Victor V256. Measurements of E-kernel's performance on V256 are also included 相似文献

7.

A comparative study of job allocation and migration in the pancake network

Robert Bennes Naoto Kimura 《Information Sciences》2007,177(11):2327-2335

Pancake networks are an attractive class of Cayley graphs functioning as a viable interconnection scheme for large multi-processor systems. The hierarchy of the pancake graph allows the assignment of its special subgraphs, which have the same topological features as the original graph, to a sequence of incoming jobs. We investigate the hierarchical structure of the pancake network and derive a job allocation scheme for assigning processors to incoming jobs. An algorithm is presented for job migration. Finally, we compare the assignment scheme to those derived previously for the star network and address the shortcomings of the pancake network. 相似文献

8.

关于互连网络的几个猜想 总被引：2，自引：0，他引：2

下载免费PDF全文

师海忠路建波《计算机工程与应用》2008,44(31):112-115

n-立方体是著名的互连网络,星图、煎饼图和冒泡排序图是由凯莱图模型设计出来的重要的互连网络。对换树（transposition tree）的凯莱图是一类特殊的凯莱图,星图和冒泡排序图分别是对换树为星和路的凯莱图。给出了关于n-立方体、星图、煎饼图、冒泡排序图和对换树的凯莱图的各一个猜想;提出了对换图的凯莱图的概念,进而由这一概念设计出了两个互连网络——圈图和轮图,并证明冒泡排序图和星图分别可嵌入圈图和轮图。相似文献

9.

Embedding classical communication topologies in the scalable OPAMarchitecture

Barak A. Schenfeld E. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(9):979-992

The paper presents novel embeddings of various classical topologies into the OPAM multicomputer. OPAM consists of a large number of processors that are connected by a two level, crossbar based interconnection network. The network combines a large, optical circuit-switched crossbar (reconfigurable network), with many small, packet-switching crossbars. The necessary embedding is very different than classical approaches. The goal in our case is to minimize routing decisions, so that communication requests can be satisfied by passing through two small crossbars. We show how to map parallel programs to this architecture using graph contraction notations. The family of parallel programs that we consider consists of multiple processes and communication links that are represented by connected, regular graphs such as rings, trees, two dimensional grids, cube connected cycles and hypercubes. In each case we show how to partition the vertex set of the program's graph to subsets, and how to assign each subset a cluster of processors in order to realize the topology of the given problem. In some of the cases we also prove that our partition and assignment algorithms are optimal 相似文献

10.

Performance metrics in the average consensus problem: A tutorial

E. Lovisari S. Zampieri 《Annual Reviews in Control》2012,36(1):26-41

The average consensus algorithm is a distributed procedure which allows a network of agents to agree on the average of a set of initial values. The computation occurs through local exchange of information only, namely the information exchange takes place only between agents which are neighbors with respect to a graph representing the system communication architecture. Several performance metrics have been proposed for the evaluation of this algorithm. Particularly interesting and challenging is to relate them to the communication topology. Different performance metrics may yield different answers in comparing alternative communication topologies. In this paper, we present a few performance metrics and we show how these metrics are related to the communication topology. In particular, when available, we present bounds which permit to relate performance and topology for general graphs, for graphs with symmetries, called d-dimensional tori, and for geometric graphs. 相似文献

11.

Executing algorithms with hypercube topology on torusmulticomputers

Gonzalez A. Valero-Garcia M. Diaz de Cerio L. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(8):803-814

Many parallel algorithms use hypercubes as the communication topology among their processes. When such algorithms are executed on hypercube multicomputers the communication cost is kept minimum since processes can be allocated to processors in such a way that only communication between neighbor processors is required. However, the scalability of hypercube multicomputers is constrained by the fact that the interconnection cost-per-node increases with the total number of nodes. From scalability point of view, meshes and toruses are more interesting classes of interconnection topologies. This paper focuses on the execution of algorithms with hypercube communication topology on multicomputers with mesh or torus interconnection topologies. The proposed approach is based on looking at different embeddings of hypercube graphs onto mesh or torus graphs. The paper concentrates on toruses since an already known embedding, which is called standard embedding, is optimal for meshes. In this paper, an embedding of hypercubes onto toruses of any given dimension is proposed. This novel embedding is called xor embedding. The paper presents a set of performance figures for both the standard and the xor embeddings and shows that the latter outperforms the former for any torus. In addition, it is proven that for a one-dimensional torus (a ring) the xor embedding is optimal in the sense that it minimizes the execution time of a class of parallel algorithms with hypercube topology. This class of algorithms is frequently found in real applications, such as FFT and some class of sorting algorithms 相似文献

12.

Partitioning and mapping of parallel programs by self-organization

Hans-Ulrich Heiss Marcus Dormanns 《Concurrency and Computation》1996,8(9):685-706

To execute a parallel program on a multicomputer system, the tasks of the program have to be mapped to the particular processors of the parallel machine. The aim of the mapping is twofold: (i) to achieve a balanced load on the processors (partitioning problem) and (ii) to keep communication delays low by placing communicating tasks closely together (mapping). Since both the communication structure of the program and the interconnection structure of the parallel machine can be represented as graphs, the mapping problem can be regarded as a graph embedding problem to minimize communication costs. As a new heuristic approach to this NP-hard problem we apply Kohonen's self-organizing maps to establish a topology-preserving embedding. Experimental results are presented and compared to other approaches to this problem. The most attractive feature of our new method is that it can be extremely well parallelized. 相似文献

13.

Properties of a hierarchical network based on the star graph

Navid Imani 《Information Sciences》2010,180(14):2802-2813

This paper introduces a new class of interconnection networks named star-pyramid. An n-level star-pyramid is formed by piling up star graphs of dimensions 1 to n in a hierarchy, connecting any node in each i-dimensional star, 1 < i ? n, to a node in the (i − 1)-dimensional star whose index is reached by removing the i symbol from the index of the former node in the i-dimensional star graph. Having extracted the properties of the new topology, featuring topological properties, a minimal routing algorithm, a simple but efficient broadcast algorithm, Hamiltonicity and pancyclicity, we then compare the network properties of the proposed topology and the well-known pyramid topology. We show that the star-pyramid has some more attractive properties than its equivalent pyramid. Finally, we propose two variants of star-pyramid, namely the generic star-pyramid and wrapped star-pyramid, as topologies with improved scalability, fault-tolerance, and diameter. 相似文献

14.

The on-line first-fit algorithm for radio frequency assignment problems

Yin-Te Tsai Yaw-Ling Lin 《Information Processing Letters》2002,84(4):195-199

The radio frequency assignment problem is to minimize the number of frequencies used by transmitters with no interference in radio communication networks; it can be modeled as the minimum vertex coloring problem on unit disk graphs. In this paper, we consider the on-line first-fit algorithm for the problem and show that the competitive ratio of the algorithm for the unit disk graph G with χ(G)=2 is 3, where χ(G) is the chromatic number of G. Moreover, the competitive ratio of the algorithm for the unit disk graph G with χ(G)>2 is at least 4−3/χ(G). The average performance for the algorithm is also discussed in this paper. 相似文献

15.

Multi‐layer distributed protocols for robust cooperative tracking in interconnected nonlinear multiagent systems

Vahid Rezaei Margareta Stefanovic 《国际强度与非线性控制杂志
》2019,29(12):3859-3891

We develop a mixed graph and optimal control theoretic formulation to design a robust cooperative control protocol for a large‐scale multiagent system with partially known interconnected first‐, second‐, or mixed first‐ and second‐order dynamics. In each case, we transform the control protocol design task to a robust communication graph design problem, which, from a cyber‐physical perspective, is interpreted as the control layer design problem for an interconnected system with unknown agent layer dynamics. According to this viewpoint, each state variable has its own control layer communication topology separate from the other state variable's communication topology and the unknown agent layer interconnection topologies. We prove that all cooperative, decentralized, and centralized tracking protocols can be treated as a single design problem and, by deriving closed‐form solutions for the robust control layer topologies, we further provide a simpler design procedure, which is only based on the matrix manipulations. Aside from the linear implementation of the protocol and the connection of the proposed formulation to the well known rules‐of‐thumb in optimal control theory, this creates a higher potential to transfer ideas to industry. Modeling uncertainties tolerable by a given control layer topology is analyzed, and a preliminary performance‐oriented analysis and design approach for large‐scale interconnected systems is discussed. We show that exactly the same steps can be followed to design appropriate control layers for both tracking and stabilization. 相似文献

16.

Mapping Parallel Application Communication Topology to Rhombic Overlapping-Cluster Multiprocessors

Hoganson Kenneth 《The Journal of supercomputing》2000,17(1):67-90

This paper extends research into rhombic overlapping-connectivity interconnection networks into the area of parallel applications. As a foundation for a shared-memory non-uniform access bus-based multiprocessor, these interconnection networks create overlapping groups of processors, buses, and memories, forming a clustered computer architecture where the clusters overlap. This overlapping-membership characteristic is shown to be useful for matching parallel application communication topology to the architecture's bandwidth characteristics. Many parallel applications can be mapped to the architecture topology so that most or all communication is localized within an overlapping cluster, at the low latency of processor direct to cache (or memory) over a bus. The latency of communication between parallel threads does not degrade parallel performance or limit the graininess of applications. Parallel applications can execute with good speedup and scaling on a proposed architecture which is designed to obtain maximum advantage from the overlapping-cluster characteristic, and also allows dynamic workload migration without moving the instructions or data. Scalability limitations of bus-based shared-memory multiprocessors are overcome by judicious workload allocation schemes, that take advantage of the overlapping-cluster memberships. Bus-based rhombic shared-memory multiprocessors are examined in terms of parallel speedup models to explain their advantages and justify their use as a foundation for the proposed computer architecture. Interconnection bandwidth is maximized with bi-directional circular and segmented overlapping buses. Strategies for mapping parallel application communication topologies to rhombic architectures are developed. Analytical models of enhanced rhombic multiprocessor performance are developed with a unique bandwidth modeling technique, and are compared with the results of simulation. 相似文献

17.

Global target aggregation and state agreement of nonlinear multi-agent systems with switching topologies 总被引：1，自引：0，他引：1

Guodong Shi Author Vitae Yiguang Hong Author Vitae 《Automatica》2009,45(5):1165-1175

In this paper, we discuss coordination problems of a group of autonomous agents, including the target aggregation to a convex set and the state agreement. The aggregation of the whole agent group, consisting of leaders (informed agents) and followers, to a given set is investigated with switching interconnection topologies described by the connectivity assumptions on the joint topology in the time interval [t,+∞) for any time t, and then the state agreement problem is studied in a similar way. An approach based on set stability and limit set analysis is given to study the multi-agent convergence problems. With the help of graph theory and convex analysis, coordination conditions are obtained in some important cases, and the results show that simple local rules can make the networked agents with first-order nonlinear individual dynamics achieve desired collective behaviors. 相似文献

18.

Cyclic-cubes: a new family of interconnection networks of evenfixed-degrees

Ada Wai-Chee Fu Siu-Cheung Chau 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(12):1253-1268

We introduce a new family of interconnection networks that are Cayley graphs with fixed degrees of any even number greater than or equal to four. We call the proposed graphs cyclic-cubes because contracting some cycles in such a graph results in a generalized hypercube. These Cayley graphs have optimal fault tolerance and logarithmic diameters. For comparable number of nodes, a cyclic-cube can have a diameter smaller than previously known fixed-degree networks. The proposed graphs can adopt an optimum routing algorithm known for one of its subfamilies of Cayley graphs. We also show that a graph in the new family has a Hamiltonian cycle and, hence, there is an embedding of a ring. Embedding of meshes and hypercubes are also discussed 相似文献

19.

The Hyperstar Interconnection Network

Abdel-Elah Al-Ayyoub Khaled Day 《Journal of Parallel and Distributed Computing》1998,48(2):59

In this paper, a multiprocessor interconnection topology, the hyperstar, based on the Cartesian product of star graphs is studied. The basic properties of the hyperstar are discussed and proved. This includes reduced degree and diameter, hierarchical structure, vertex symmetry, optimal routing, and shortest path characterization. The hyperstar is shown to be a member of the Cayley class of symmetric graphs. Embeddings of hypercubes, star graphs, and meshes are discussed. An optimal one-to-all broadcasting algorithm is obtained and analyzed. Some results on fault tolerance, parallel paths, Hamiltonian cycles, and VLSI layouts are obtained. Furthermore, a comparative study between the hyperstar and seven related networks is conducted. The comparison is based on scalability, broadcasting cost, link requirements, cost/performance ratio, and other static parameters such as degree, diameter, and average diameter. 相似文献

20.

Optimal strategies in the average consensus problem

Jean-Charles Delvenne Ruggero Carli Sandro Zampieri 《Systems & Control Letters》2009,58(10-11):759-765

Let a set of communicating agents compute the average of their initial positions, where every agent is restricted to communicate to a given small number of other agents (average consensus problem). We prove that the optimal topology of communication is given by a de Bruijn graph. Consensus is then reached in finitely many steps. This solution is valid when the number of agents is an exact power of the out-degree of the communication graph. We introduce an algebraic tool, the shifted Kronecker product, and a more general family of strategies, also based on a de Bruijn communication graph. Those strategies are compared to Cayley strategies in terms of the speed of convergence. We also show that quantized communication between the agents still allows finite convergence, to a consensus, which is not in general the average of the initial positions. 相似文献