首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Many parallel algorithms use hypercubes as the communication topology among their processes. When such algorithms are executed on hypercube multicomputers the communication cost is kept minimum since processes can be allocated to processors in such a way that only communication between neighbor processors is required. However, the scalability of hypercube multicomputers is constrained by the fact that the interconnection cost-per-node increases with the total number of nodes. From scalability point of view, meshes and toruses are more interesting classes of interconnection topologies. This paper focuses on the execution of algorithms with hypercube communication topology on multicomputers with mesh or torus interconnection topologies. The proposed approach is based on looking at different embeddings of hypercube graphs onto mesh or torus graphs. The paper concentrates on toruses since an already known embedding, which is called standard embedding, is optimal for meshes. In this paper, an embedding of hypercubes onto toruses of any given dimension is proposed. This novel embedding is called xor embedding. The paper presents a set of performance figures for both the standard and the xor embeddings and shows that the latter outperforms the former for any torus. In addition, it is proven that for a one-dimensional torus (a ring) the xor embedding is optimal in the sense that it minimizes the execution time of a class of parallel algorithms with hypercube topology. This class of algorithms is frequently found in real applications, such as FFT and some class of sorting algorithms  相似文献   

2.
The paper presents novel embeddings of various classical topologies into the OPAM multicomputer. OPAM consists of a large number of processors that are connected by a two level, crossbar based interconnection network. The network combines a large, optical circuit-switched crossbar (reconfigurable network), with many small, packet-switching crossbars. The necessary embedding is very different than classical approaches. The goal in our case is to minimize routing decisions, so that communication requests can be satisfied by passing through two small crossbars. We show how to map parallel programs to this architecture using graph contraction notations. The family of parallel programs that we consider consists of multiple processes and communication links that are represented by connected, regular graphs such as rings, trees, two dimensional grids, cube connected cycles and hypercubes. In each case we show how to partition the vertex set of the program's graph to subsets, and how to assign each subset a cluster of processors in order to realize the topology of the given problem. In some of the cases we also prove that our partition and assignment algorithms are optimal  相似文献   

3.
A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) for mapping a type of algorithm that we call CC-cube algorithm onto multicomputers with hypercube, mesh, or torus interconnection topology is proposed. This methodology is suitable when the initial problem can be expressed as a set of processes that communicate through a hypercube topology (a CC-cube algorithm). There are many important algorithms that fit into the CC-cube type. CALMANT is based on three different techniques: (a) the standard embedding to assign the processes of the algorithm to the nodes of the mesh multicomputer; (b) the communication pipelining technique to increase the level of communication parallelism inherent in the CC-cube algorithms; and (c) optimal message-scheduling algorithms proposed in this work in order to avoid conflicts and minimizing in this way the communication time. Although CALMANT is proposed for multicomputers with different interconnection network topologies, the paper only focuses on the particular case of meshes.  相似文献   

4.
Algorithms for many geometric and physical algorithms rely on a decomposition of 3-D space. Typically, a cubical decomposition is used, where each cubical cell is adjacent to, and may interact with, as many as 26 neighboring cells. In this paper, we explore an alternate structure, the woven mesh, that arises from a decomposition based on truncated octahedra. An algorithm is given to map 3-D space into the cells of the woven mesh and we give mappings which exploit the lower degree of the woven mesh, 14 rather than 26, to obtain better embeddings of 3-D space onto low degree processor arrays. We show that the woven mesh can be embedded with dilation 2 and congestion 4 onto a 3-D mesh. A dilation 3, congestion 8 embedding onto a 4-regular graph is also presented. These embeddings achieve optimal dilation on their respective host graphs; we prove that there does not exist a dilation 2 embedding of the woven mesh on any 4-regular graph. Of particular interest is the proof technique which was guided by deductive computation.  相似文献   

5.
《Parallel Computing》1997,23(10):1429-1460
Based on the Cayley graph framework for the generation and evaluation of multiprocessor interconnection topologies, an application dependent task mapping problem is addressed. We start with interconnection topologies considered attractive from a graph theoretical point of view. Given such a topology together with an application dependent task communication profile, the problem addressed in this paper is to find an optimal task-to-processor assignment. Such a mapping yields for the given interconnection topology and the given profile a minimal expected communication path length and therefore a minimal number of data transfer steps between the physical processing elements of a multiprocessor machine. At first, the Cayley graph approach is briefly outlined. We demonstrate the potential of Cayley graphs for a systematic generation framework aiming at node-symmetric interconnection topologies for a given number of processing elements each equipped with a constant number of communication channels. Cayley graphs found most attractive within a large set of generated graphs are compared to prominent interconnection topologies like, for example, hypercubes. Later on, it is shown that the problem of the optimal task-to-processor assignment, being a special case of the well-known quadratic assignment problem, is still NP-hard. Consequently, any practically relevant mapping algorithm can be expected to produce at best near-optimal solutions for reasonable problem sizes beyond approximately 10 nodes. We present two fundamentally different mapping approaches, namely a straightforward greedy mapping and a more sophisticated algorithm using simulated annealing techniques as known from artificial intelligence applications. For both approaches, we elaborate on their relative performance as well as, where feasible, on the question of suboptimality.  相似文献   

6.
Network-on-Chip (NoC) has been proposed to replace traditional bus based System-on-Chip (SoC) architecture to address the global communication challenges in nanoscale technologies. A major challenge in NoC based system design is to select Intellectual Property (IP) cores for implementing tasks and associate the selected cores to the routers to optimize cost and performance. These are commonly known as the process of core selection and application mapping respectively. In this paper, integrated core selection and mapping problem has been addressed. Mesh architecture has been considered for experimentation. The integrated core selection and mapping problem takes as input the application task graph, topology graph and a core library. It outputs the selected cores for the tasks and their mapping onto the topology graph, such that, all communication requirements of the application are satisfied. The cores present in a core library may perform more than one task and have non-uniform sizes. For this, a technique based on Particle Swarm Optimization (PSO) has been proposed to select cores from the given core library and map the resultant core graph onto mesh based architectures. An efficient heuristic for mapping has also been proposed, which maps the selected cores onto mesh based architectures, considering non-uniform core sizes. Comparisons have been carried out with step-by-step core selection and mapping approach and also with mapping algorithms that exist in the literature. Significant reductions have been observed in terms of communication cost over all the cases. Area comparisons have also been made. On average, improvement of 13.05% in communication cost and 2.07% in area have been observed. The proposed approach has also been compared in dynamic environment and significant reductions in the average network latency could be observed. On average, improvement of 5.48% in average network latency and 15.68% in network throughput has been observed. Comparison of energy consumption has also been done in both the cases.  相似文献   

7.
We present new techniques for mapping computations onto hypercubes. Our methods speed up classical implementations of grid and tree communications by a factor of (n), wheren is the number of hypercube dimensions. The speedups are asymptotically the best possible.We obtain these speedups by mapping each edge of the guest graph onto short, edge-disjoint paths in the hypercube such that the maximum congestion on any hypercube edge isO(1). These multiple-path embeddings can be used to reduce communication time for large grid-based scientific computations, to increase tolerance to link faults, and for fast routing of large messages.We also develop a general technique for deriving multiple-path embeddings from multiple-copy embeddings. Multiple-copy embeddings are one-to-one maps of independent copies of the guest graph within the hypercube. We present an efficient multiple-copy embedding of the cube-connected-cycles network within the hypercube. This embedding is used to derive efficient multiple-path embeddings of trees and butterfly networks in hypercubes.This research was supported by NSF/DARRA Grant CCR-8908285, NSF Grant CCR-8807426, and AFOSR Grant 89-0382.  相似文献   

8.
Network virtualization aims to provide a way to overcome ossification of the Internet. However, making efficient use of substrate resources requires effective techniques for embedding virtual networks: mapping virtual nodes and virtual edges onto substrate networks. Previous research has presented several heuristic algorithms, which fail to consider that the attributes of the substrate topology and virtual networks affect the embedding process. In this paper, for the first time, we introduce complex network centrality analysis into the virtual network embedding, and propose virtual network embedding algorithms based on closeness centrality. Due to considering of the attributes of nodes and edges in the topology, our studies are more reasonable than existing work. In addition, with the guidance of topology quantitative evaluation, the proposed network embedding approach largely improves the network utilization efficiency and decreases the embedding complexity. We also investigate our algorithms on real network topologies (e.g., AT&T, DFN) and random network topologies. Experimental results demonstrate the usability and capability of the proposed approach.  相似文献   

9.
Network-on-Chip (NoC) is widely used as a communication scheme in modern many-core systems. To guarantee the reliability of communication, effective fault tolerant techniques are critical for an NoC. In this paper, a novel fault tolerant architecture employing redundant routers is proposed to maintain the functionality of a network in the presence of failures. This architecture consists of a mesh of 2 × 2 router blocks with a spare router placed in the center of each block. This spare router provides a viable alternative when a router fails in a block. The proposed fault-tolerant architecture is therefore referred to as a quad-spare mesh. The quad-spare mesh can be dynamically reconfigured by changing control signals without altering the underlying topology. This dynamic reconfiguration and its corresponding routing algorithm are demonstrated in detail. Since the topology after reconfiguration is consistent with the original error-free 2D mesh, the proposed design is transparent to operating systems and application software. Experimental results show that the proposed design achieves significant improvements on reliability compared with those reported in the literature. Comparing the error-free system with a single router failure case, the throughput only decreases by 5.19% and latency increases by 2.40%, with about 45.9% hardware redundancy.  相似文献   

10.
Network-on-chip-based communication schemes represent a promising solution to the increasing complexity of system-on-chip problems. In this paper, we propose a new mesh-like topology called the shortly connected mesh technology (ScMesh), which is based on the traditional mesh topology, to exploit the graph symmetry properties of interconnection networks. This proposed topology not only enhances network performance by reducing the network diameter, but also provides a lower area/energy solution for interconnection network scenarios. This study analyzes and compares the performance of ScMesh to some newly improved topologies, including the WK-recursive, extended-butterfly fat tree, and diametrical mesh topologies. The experiment results indicate that ScMesh outperforms the other topologies, with throughput increases of 47.71, 33.45, and 18.64 % as well as latency decreases of 45.71, 35.84, and 14.58 % compared to the extended-butterfly fat tree, WK-recursive and diametrical mesh topologies, respectively. In addition, ScMesh achieves 41.22, 32.23, and 15.01 % lower energy consumption and 38.96, 27.43, and 18.21 % lower area overhead than the extended-butterfly fat tree, WK-recursive, and diametrical mesh topologies, respectively.  相似文献   

11.
A problem of mapping graphs of parallel programs onto graphs of distributed computer systems by recurrent neural network is formulated. Parameter values providing absence of incorrect solutions are experimentally determined. Because of introduction of penalty coefficient into Lyapunov function for the program graph edges non-coincided with the system graph edges, optimal solutions are found for mapping a “line”-graph onto a two-dimensional torus. For increasing probability of finding optimal mapping, a method for splitting the mapping is proposed. The method essence is a reducing solution matrix to a block-diagonal form. The Wang recurrent neural network is used to exclude incorrect solutions of the problem of mapping the line-graph onto a three-dimensional torus. This network converges quicker than the Hopfield one.  相似文献   

12.
The average consensus algorithm is a distributed procedure which allows a network of agents to agree on the average of a set of initial values. The computation occurs through local exchange of information only, namely the information exchange takes place only between agents which are neighbors with respect to a graph representing the system communication architecture. Several performance metrics have been proposed for the evaluation of this algorithm. Particularly interesting and challenging is to relate them to the communication topology. Different performance metrics may yield different answers in comparing alternative communication topologies. In this paper, we present a few performance metrics and we show how these metrics are related to the communication topology. In particular, when available, we present bounds which permit to relate performance and topology for general graphs, for graphs with symmetries, called d-dimensional tori, and for geometric graphs.  相似文献   

13.
Among Cayley graphs on the symmetric group, some that have a noticeably low diameter relative to the degree of regularity are examples such as the ``star' network and the ``pancake' network, the latter a representative of a variety of Cayley graphs generated by reversals. These diameter and degree conditions make these graphs potential candidates for parallel computation networks. Thus it is natural to investigate how well they can simulate other standard parallel networks, in particular hypercubes. For this purpose, constructions have previously been given for low dilation embeddings of hypercubes into star networks. Developing this theme further, in this paper we construct especially low dilation maps (e.g., with dilation 1, 2, 3, or 4) of hypercubes into pancake networks and related Cayley graphs generated by reversals. Whereas obtaining such results by the use of ``traditional' graph embeddings (i.e., one-to-one or many-to-one embeddings) is sometimes difficult or impossible, we achieve many of these results by using a nontraditional simulation technique known as a ``one-to-many' graph embedding. That is, in such embeddings we allow each vertex in the guest (i.e., domain) graph to be associated with some nonempty subset of the vertex set of the host (i.e., range) graph, these subsets satisfying certain distance and connection requirements which make the simulation possible. Received July 30, 1997, and in revised form July 18, 2001. Online publication October 30, 2001.  相似文献   

14.
This paper proposes two-dimensional directed graphs (or digraphs for short) as a promising alternative to the popular 2D mesh topology for networks-on-chip (NoCs). Mesh is the most popular topology for the NoCs, mainly due to its suitability for on-chip implementation and low cost. However, the fact that a digraph offers a lower diameter than its equivalent linear array of equal cost motivated us to evaluate digraphs as the underlying topology of NoCs. This paper introduces a family of NoC topologies based on three well-known digraphs, namely de Bruijn, shuffle-exchange, and Kautz. We study topological properties of the proposed topologies. We show that the proposed digraph-based topologies have several attractive features including constant node degree, low diameter and cost, and low zero load latency which result in superior performance over the mesh. We introduce a deadlock-free routing algorithm for the proposed NoC topologies and compare NoCs employing the proposed topologies and the mesh topology in terms of power consumption and performance. Simulation results also reveal that the proposed NoC topologies offer higher performance and consume lower power than the mesh NoC.  相似文献   

15.
In distributed shared memory multiprocessor systems, parallel tasks communicate through sharing memory data. As the system size increases, such communication cost becomes the main factor that limits the overall parallelism and performance. In this paper, we propose a new solution to the problem through judiciously managing the relevant resource, namely, the shared data and the interconnection network (IN) through which the sharing is carried out. In this approach, communication cost is minimized by means of data migration/allocation which is based on analyzing general layered task graphs, sharing behavior of parallel tasks, and network topology. Our method is not applicable for read only variables. Further, for the time being, the usefulness of the method is limited to multiprocessors where no cache coherence mechanism is implemented. Four typical interconnection topologies for multiprocessors are considered, namely, shared-bus, hierarchical-bus, 2-D mesh, and fat-tree structures. Efficient data allocation algorithms for each of the four network topologies are developed that make decision on data allocation/migration at the compile time. The complexity of one algorithm isO(np) for shared-bus andO(n2p) for the remaining three in a system withnprocessors executing ap-layer task graph for one shared variable. We have also given an algorithm to determine optimal allocation/migration scheme for multiple shared variables. However, the cost of the algorithm become prohibitive when the number of shared variables is high. Therefore, a heuristic of low complexity is suggested. The heuristic is optimal for some topologies.  相似文献   

16.
In this paper we present schemes for reconfiguration of embedded task graphs in hypercubes. Previous results, which use either fault-tolerant embedding or an automorphism approach, can be expensive in terms of either the required number of spare nodes or reconfiguration time. Using the free dimension concept, we combine the above two approaches in our schemes which can tolerate about n faulty nodes under the worst case while keeping task migration time small. With expansion-2 initial embedding, three distributed reconfiguration schemes are presented in this paper. The first scheme, applied to chains and rings, can tolerate any ƒ ≤ n − 2 faulty nodes in an n-dimensional hypercube. The second and third schemes are applied to meshes or tori. For a mesh or torus of size 2m1 1 ··· 1 2md, the second scheme can tolerate any ƒ ≤ mi − 1 faulty nodes, where mi is the largest direction of the mesh and n = m1 + ··· + md + 1. By embedding two copies of meshes or tori in cube, the third scheme can tolerate any ƒ ≤ n − 1 faulty nodes with the dilation of embedding after reconfiguration degraded to 2. The third scheme is quite general and can be applied to any task graph.  相似文献   

17.
A novel reconfigurable architecture based on a multiring multiprocessor network is described. The reconfigurability of the architecture is shown to result in a low network diameter and also a low degree of connectivity for each node in the network. The mathematical properties of the network topology and the hardware for the reconfiguration switch are described. Primitive parallel operations on the network topology are described and analyzed. The architecture is shown to contain 2D mesh topologies of varying sizes and also a single one factor of the Boolean hypercube in any given configuration. A large class of algorithms for the 2D mesh and the Boolean n-cube are shown to map efficiently on the proposed architecture without loss of performance. The architecture is shown to be well suited for a number of problems in low and intermediate level computer vision such as the FFT, edge detection, template matching, and the Hough transform. Timing results for typical low and intermediate level vision algorithms on a transputer based prototype are presented  相似文献   

18.
针对基于图卷积的自编码器模型对原始图属性和拓扑信息的保留能力有限、无法学习结构和属性之间深度关联信息等问题,提出基于多通道图卷积自编码器的图表示学习模型。设计拓扑和属性信息保留能力实验,验证了基于图卷积的自编码器模型具备保留节点属性和拓扑结构信息的能力。构建特定信息卷积编码器和一致信息卷积编码器,提取图的属性空间特征、拓扑空间特征以及两者关联特征,生成属性嵌入、拓扑嵌入和一致性嵌入,同时建立与编码器对称的卷积解码器,还原编码器过程。使用重构损失、局部约束和一致性约束,优化各编码器生成的低维嵌入表示。最终将蕴含不同图信息的多种嵌入进行融合,生成各节点的嵌入表示。实验结果表明,该模型在BlogCatalog和Flickr数据集上节点分类的Micro-F1和Macro-F1明显高于基线模型,在Citeseer数据集上节点聚类的精度和归一化互信息相比于表现最优的基线模型提升了11.84%和34.03%。上述实验结果证明了该模型采用的多通道方式能够在低维嵌入中保留更丰富的图信息,提升图机器学习任务的性能表现。  相似文献   

19.
Knowledge graph (KG) embedding methods are at the basis of many KG-based data mining tasks, such as link prediction and node clustering. However, graphs may contain confidential information about people or organizations, which may be leaked via embeddings. Research recently studied how to apply differential privacy to a number of graphs (and KG) analyses, but embedding methods have not been considered so far. This study moves a step toward filling such a gap, by proposing the Differential Private Knowledge Graph Embedding (DPKGE) framework.DPKGE extends existing KG embedding methods (e.g., TransE, TransM, RESCAL, and DistMult) and processes KGs containing both confidential and unrestricted statements. The resulting embeddings protect the presence of any of the former statements in the embedding space using differential privacy. Our experiments identify the cases where DPKGE produces useful embeddings, by analyzing the training process and tasks executed on top of the resulting embeddings.  相似文献   

20.
为解决传统任务划分方法在三维网格并行计算任务分配阶段产生的通信开销大的问题,提出了一种基于多层k路划分算法的并行任务分配策略.首先利用多层k路划分算法划分三维网格,将任务划分问题转化为图划分问题,然后基于图划分结果给出一个任务映射并行算法将计算任务分配到各计算结点.在深腾1800上求解三维网格模型最短路径问题的实验结果表明,相比于传统的行列划分任务分配策略,该策略在保证负裁平衡的同时有效地降低了通信开销,算法的运行时间减少,加速比得到提高.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号