首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Fault-tolerant communication algorithms for k-ary n-cubes are introduced. These include: One-to-all broadcasting, all-to-all broadcasting, one-to-all personalized communication, and all-to-all personalized communication. Each of these algorithms can tolerate up to (2n-2) node failures provided that k>(2n-2) and k>3. Extensions of these algorithms with up to 2n-1 node failures are also described. The communication complexities of the proposed algorithms are derived when wormhole or store and forward packet routing is used  相似文献   

2.
主要研究蜂窝环上的全广播路由算法.第一个全广播算法的设计思路是找到一条通过所有节点的路径,关键是确定边界上的一些特殊节点;第二个全广播算法应用了蜂窝环的哈密尔顿性质.假设一个有n个处理机的蜂窝环,前者每个节点有自己专用的路由策略,时间复杂度为3n,因为计算时间往往比数据传送时间低得多,所以总的通信时间可以降低到n;后者...  相似文献   

3.
Efficient interprocessor communication is crucial to increasing the performance of parallel computers. In this paper, a special framework is developed on thegeneralized hypercube, a network that is currently receiving considerable attention. Using this framework as the basic tool, a number of spanning subgraphs with special properties to fit various communication needs are constructed on the network. The importance of these spanning subgraphs is demonstrated with the development of optimal algorithms for four fundamental communication problems, namely, theone-to-allandall-to-all broadcastingand theone-to-allandall-to-all scattering. Broadcastingis the distribution of the same group of messages from a source processor to all other processors, andscatteringis the distribution of distinct groups of messages from a source processor to each other processor. We consider broadcasting and scattering from a single processor of the network (one-to-all broadcasting and scattering) and simultaneously from all processors of the network (all-to-all broadcasting and scattering). For the all-to-all broadcasting and scattering algorithms, a special technique is developed on the generalized hypercube so that messages originating at individual nodes are interleaved in such a manner that no two messages contend for the same edge at any given time. The communication problems are studied under thestore-and-forward, all-portcommunication model. Lower bounds are derived for the above problems under the stated assumptions, in terms of time and number of message transmissions, and optimal algorithms are designed.  相似文献   

4.
Optimal broadcasting on the star graph   总被引:2,自引:0,他引:2  
The star graph has been show to be an attractive alternative to the widely used n-cube. Like the n-cube, the star graph possesses rich structure and symmetry as well as fault tolerant capabilities, but has a smaller diameter and degree. However, very few algorithms exists to show its potential as a multiprocessor interconnection network. Many fast and efficient parallel algorithms require broadcasting as a basic step. An optimal algorithm for one-to-all broadcasting in the star graph is proposed. The algorithm can broadcast a message to N processors in O(log2 N) time. The algorithm exploits the rich structure of the star graph and works by recursively partitioning the original star graph into smaller star graphs. In addition, an optimal all-to-all broadcasting algorithm is developed  相似文献   

5.
All-to-all communication patterns occur in many important parallel algorithms. This paper presents new algorithms for all-to-all communication patterns (all-to-all broadcast and all-to-all personalized exchange) for wormhole switched 2D/3D torus- and mesh-connected multiprocessors. The algorithms use message combining to minimize message start-ups at the expense of larger message sizes. The unique feature of these algorithms is that they are the first algorithms that we know of that operate in a bottom-up fashion rather than a recursive, top-down manner. For a 2d×2d torus or mesh, the algorithms for all-to-all personalized exchange have time complexity of O(23d). An important property of the algorithms is the O(d) time due to message start-ups, compared with O(2d) for current algorithms. This is particularly important for modern parallel architectures where the start-up cost of message transmissions still dominates, except for very large block sizes. Finally, the 2D algorithms for all-to-all personalized exchange are extended to O(24d) algorithms in a 2d×2d×2d3D torus or mesh. These algorithms also retain the important property of O(d) time due to message start-ups  相似文献   

6.
All-to-all personalized communication commonly occurs in many important parallel algorithms, such as FFT and matrix transpose. This paper presents new algorithms for all-to-all personalized communication or complete exchange in multidimensional torus- or mesh-connected multiprocessors. For an R×C torus or mesh where R⩽C, the proposed algorithms have time complexities of O(C) message startups and O(RC2) message transmissions. The algorithms for three- or higher-dimensional tori or meshes follow a similar structure. Unlike other existing message-combining algorithms in which the number of nodes in each dimension should be a power-of-two and square, the proposed algorithms accommodate non-power-of-two tori or meshes where the number of nodes in each dimension need not be power-of-two and square. In addition, destinations remain fixed over a larger number of steps in the proposed algorithms, thus making them amenable to optimizations. Finally, the data structures used are simple, hence making substantial savings of message-rearrangement time  相似文献   

7.
全交换在并行计算领域中有着大量而重要的应用,例如FFT和矩阵运算等.本文在由以太网交换机分层级联而成的机群系统上,提出了高性能的全交换算法DCE和算法MCCE.这两个算法充分利用了网络中瓶颈链路的带宽,达到了通信量的理论下限,并且运用多种策略来避免通信过程中的网络冲突,从而提高了机群的通信性能.实验结果表明,本文所述的算法在消息长度较长时,明显优于MPICH和LAM/MPI中实现的MPI-Alltoall算法.最后,该算法简单规范,易于实现.  相似文献   

8.
A practical interconnection network RP(k) and its routing algorithms   总被引:8,自引:0,他引:8  
Based on Petersen graph, a new interconnection network, the RP(k) network, is devel-oped and the properties of the RP(k) network are investigated. The diameter of the RP(k) network is [ k/2] + 2 and its degree is 5. We prove that the diameter of the RP(k) network is much smaller than that of the 2-D Torus network when the number of nodes in interconnection networks is less than or equal to 300. In order to analyze the communication performance in a group of nodes, we propose the concepts of the optimal node groups and the diameter of the optimal node groups. We also show that the diameter of the optimal node groups in the RP(k) network is less than that in the 2-D Torus net-work. Especially when the number of nodes in an optimal node group is between 6 and 100, the diam-eter of the optimal node groups in the RP(k) network is half of that in the 2-D Torus network. Further-more based on the RP(k) network we design a set of routing algorithms which are point-to-point rout-ing, permutation routing, one-to-al  相似文献   

9.
This paper considers the generation of the origin–destination (OD) matrix, basic data in any vehicle routing or traveling salesman problem. An OD matrix must be generated by calculating the shortest paths between some nodes. Candidate methods for this are repetitive use of one-to-all shortest path algorithms such as Dijkstra’s algorithm, use of all-to-all shortest path algorithms such as the Floyd–Warshall algorithm, and use of specifically designed some-to-some shortest path algorithms. This paper compares the performance of several shortest path algorithms in computing OD matrices on real road networks. Dijkstra’s algorithm with approximate bucket data structure performed the best for most of the networks tested. This paper also proposes a grouping-based algorithm for OD matrix generation. Although it is an approximation approach, it has several good characteristics: it can find the exact shortest distances for most OD pairs; it guarantees that all the calculated shortest path distance values have corresponding paths; the deviation of any distance from the corresponding true shortest distance is small; and its computation time is short.  相似文献   

10.
All-to-all personalized exchange is one of the most dense collective communication patterns and occurs in many important applications in parallel computing. Previous all-to-all personalized exchange algorithms were mainly developed for hypercube and mesh/torus networks. Although the algorithms for a hypercube may achieve optimal time complexity, the network suffers from unbounded node degrees and thus has poor scalability in terms of I/O port limitation in a processor. On the other hand, a mesh/torus has a constant node degree and better scalability in this aspect, but the all-to-all personalized exchange algorithms have higher time complexity. In this paper, we propose an alternative approach to efficient all-to-all personalized exchange by considering another important type of networks, multistage networks, for parallel computing systems. We present a new all-to-all personalized exchange algorithm for a class of unique-path, self-routable multistage networks. We first develop a generic method for decomposing all-to-all personalized exchange patterns into some permutations which are realizable in these networks, and then present a new all-to-all personalized exchange algorithm based on this method. The newly proposed algorithm has O(n) time complexity for an n×n network, which is optimal for all-to-all personalized exchange. By taking advantage of fast switch setting of self-routable switches and the property of a single input/output port per processor in a multistage network, we believe that a multistage network could be a better choice for implementing all-to-all personalized exchange due to its shorter communication latency and better scalability  相似文献   

11.
In non-contiguous allocation, a job request can be split into smaller parts that are allocated possibly non-adjacent free sub-meshes rather than always waiting until a single sub-mesh of the requested size and shape is available. Lifting the contiguity condition is expected to reduce processor fragmentation and increase system utilization. However, the distances traversed by messages can be long, and as a result the communication overhead, especially contention, is increased. The extra communication overhead depends on how the allocation request is partitioned and assigned to free sub-meshes. In this paper, a new non-contiguous processor allocation strategy, referred to as Greedy-Available-Busy-List, is suggested for the 2D mesh network. Request partitioning in our suggested strategy is based on the sub-meshes available for allocation. To evaluate the performance improvement achieved by our strategy and compare it against well-known existing non-contiguous and contiguous strategies, we conduct extensive simulation runs under the assumption of wormhole routing and three communication patterns, notably one-to-all, all-to-all and random. The results show that the new strategy can reduce the communication overhead and substantially improve performance in terms of job turnaround time and system utilization.  相似文献   

12.
All-to-All personalized communication is a basic communication operation in a parallel computing environment.There are a lot of results appearing in literature.All these communication algorithms can be divided into two kinds:direct communication algorithm and indirect communication algorthm.The optimal dircet all-to-all communication algorithm on rings and 2-D tori does exist.But,for indirect all-to-all communication algorithms,there is a gap between the time complexity of the already existing algorithm and the lower bound,In this paper an efficient indirect algorithm for all-to-all communication on rings and 2-D square tori with bidirection channels is presented.The algorithms is faster than any previous indirect algorithms.The main items of the time complexity of the algorithm is 2^2/8 and p^3/2/8 on rings and 2-D tori respectively,both reaching the theoretical lower bound,where p is the number of processors.  相似文献   

13.
Topological properties of OTIS-networks   总被引:1,自引:0,他引:1  
We conduct a general study of the topological properties of optical transpose interconnection systems (OTIS). We first obtain their basic topological metrics of size, degree, shortest distance and diameter, and then we obtain results related to the recursive structure and efficient embedding of meshes, cubes, spanning trees and cycles. We also present minimal one-to-one routing and optimal broadcasting algorithms, and we show how to construct node-disjoint paths between any two nodes of an OTIS network. Recent studies have addressed only particular members of the general class of OTIS networks. In this paper, we present unified tools for obtaining the topological properties of an arbitrary OTIS network based on the properties of the corresponding factor network  相似文献   

14.
In this paper we present three broadcast algorithms and lower bounds on the three main components of the broadcast time for 2-dimensional torus networks (wrap-around meshes) that use synchronous circuit-switched routing. The first algorithm is based on a recursive tiling of a torus and is optimal in terms of both phases and intermediate switch settings when the start-up time to initiate message transmissions is the dominant cost. It is the first broadcast algorithm to match the lower bound of log5 N on number of phases (where N is the number of nodes). The second and third algorithms are hybrids which combine circuit-switching with the pipelining and arc-disjoint spanning trees techniques that are commonly used to speed up store-and-forward routing. When the propagation time of messages through the network is significant, our hybrid algorithms achieve close to optimal performance in terms of phases, intermediate switch settings, and total transmission time. They are the first algorithms to achieve this performance in terms of all three parameters simultaneously  相似文献   

15.
This paper addresses the one-to-all broadcasting problem and the one-to-many broadcasting problem, usually simply called broadcasting and multicasting, respectively. Broadcasting is the information dissemination problem in which a node of a network sends the same piece of information to all the other nodes. Multicasting is a partial broadcasting in the sense that only a subset of nodes forms the destination set. Both operations have many applications in parallel and distributed computing. In this paper, we study these problems in both line model, and cut-through model. The former assumes long distance calls between nonneighboring processors. The latter strengthens the line model by taking into account the use of a routing function. Long distance calls are possible in circuit-switched and wormhole-routed networks, and also in many networks supporting optical facilities. In the line model, it is well known that one can compute in polynomial time a [log2n]-round broadcast or multicast protocol for any arbitrary network. Unfortunately such a protocol is often inefficient from a practical point of view because it does not use the resources of the network in a balanced way. In this paper, we present a new algorithm to compute broadcast or multicast protocols. This algorithm applies under both line and cut-through models. Moreover, it returns protocols that efficiently use the bandwidth of the network. From a complexity point of view, we also show that most of the optimization problems relative to the maximization of the efficiency of broadcast or multicast protocols in terms of switching time or vertex load are NP-complete. We have, however, derived polynomial efficient solutions for tree-networks  相似文献   

16.
Multicast is an important collective communication in scalable parallel computers. One efficient scheme to perform multicast is multidestination messaging[8]. In multidestination messaging, destination nodes of a multicast are partitioned into disjoint groups. Nodes in each group are reached with a multidestination message that conforms to the base routing algorithm of the system. A systematic way of partitioning the nodes is critical to the efficiency of multidestination messaging. In this paper we propose a node grouping method, called turn grouping, for partitioning the destination nodes in a multicast. Turn grouping is general in the sense that it supports any base routing algorithm derivable from the turn model [5]. Given such a base routing algorithm and the corresponding prohibited turns, turn grouping can systematically produce a proper schedule for multicasting the message. We evaluated the performance of turn grouping using three typical turn model-based routing algorithms. The simulation results show that our approach performs better than the Umesh [12] and the Hamiltonian-path [8] algorithms.  相似文献   

17.
随着个性化推荐技术的发展,推荐系统面临着越来越多的挑战。传统的推荐算法通常存在数据稀疏性和推荐精度低等问题。针对以上问题,提出了一种融合时间隐语义填充和子群划分的推荐算法[K]-TLFM(Time Based Latent Factor Model Integrated with [k]-means)。该算法利用融合时间因素的隐语义模型对原始用户物品评分矩阵缺失项进行填充,避免了用全局平均值或者用户/物品平均值补全矩阵带来的误差,有效缓解了数据稀疏性问题,同时融合时间因素有效地刻画了用户偏好随时间的变化;完成评分矩阵缺失项填充后,基于二分[k]-means聚类算法将偏好、兴趣特征相似的对象划分到同一个子群中,在目标用户所属的子群中基于选定的协同过滤算法为用户产生推荐列表,提高了推荐效率和准确性。在MovieLens和Netflix数据集上对该算法的推荐性能进行了对比实验,结果表明该算法具有更高的推荐精度。  相似文献   

18.
We consider efficiently routing permutations in a class of switch-based interconnects. Permutation is an important communication pattern in parallel and distributed computing systems. We present a generic approach to realizing arbitrary permutations in a class of unique-path, self-routable interconnects. It is well-known that this type of interconnect has low hardware cost, but can realize only a small portion of all possible permutations between its inputs and outputs in a single pass. We consider routing arbitrary permutations with link-disjoint paths and node-disjoint paths in such interconnects in a minimum number of passes. In particular, routing with node-disjoint paths has important applications in emerging optical interconnects. We employ and further expand the Latin square technique used in all-to-all personalized exchange algorithms for this class of interconnects for general permutation routing. Our implementation of permutation routing is optimal in terms of the number of passes that messages are transmitted through the network, and it is near-optimal in network transmission time for sufficiently long messages. The possibility of adopting a single-stage interconnect is also discussed.  相似文献   

19.
In this paper we deal with the problem of designing virtual path layouts in ATM networks when the hop-count is given and the load has to be minimized. We first prove a lower bound for networks with arbitrary topology and arbitrary set of connection requests. This result is then applied to derive lower bounds for the following settings: (i) one-to-all (one node has to be connected to all other nodes of the network) in arbitrary networks; (ii) all-to-all (each node has to be connected to all other nodes in the network) in several classes of networks, including planar and k-separable networks and networks of bounded genus. We finally study the all-to-all setting on two-dimensional meshes and we design a virtual path layout for this problem. When the hop-count and the network degree are bounded by constants, our results show that the upper bounds proposed in this paper for the one-to-all problem in arbitrary networks and for the all-to-all problem in two-dimensional mesh networks are asymptotically optimal. Moreover, the general lower bound shows that the algorithm proposed in Gerstel (Ph.D. Thesis, Technion-Haifa, Israel, 1995) for the all-to-all problem in k-separable networks is also asymptotically optimal. The upper bound for mesh networks also shows that the lower bound presented in this paper for the all-to-all problem in planar networks is asymptotically tight.  相似文献   

20.
Clusters of workstations employ flexible topologies: regular, irregular, and hierarchical topologies have been used in such systems. The flexibility poses challenges for developing efficient collective communication algorithms since the network topology can potentially have a strong impact on the communication performance. In this paper, we consider the all-to-all broadcast operation on clusters with cut-through and store-and-forward switches. We show that near-optimal all-to-all broadcast on a cluster with any topology can be achieved by only using the links in a spanning tree of the topology when the message size is sufficiently large. The result implies that increasing network connectivity beyond the minimum tree connectivity does not improve the performance of the all-to-all broadcast operation when the most efficient topology specific algorithm is used. All-to-all broadcast algorithms that achieve near-optimal performance are developed for clusters with cut-through and clusters with store-and-forward switches. We evaluate the algorithms through experiments and simulations. The empirical results confirm our theoretical finding.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号