1.

Towards scalable collective communication for multicomputer interconnection networks

A.Y. Al-Dubai M. Ould-Khaoua S. Al-Dobai 《Information Sciences》2004,163(4):293-306

A considerable number of broadcast algorithms have been proposed for the mesh over the past decade. Nonetheless, most of these algorithms do not exhibit good scalability properties as the network size increases. As a consequence, most existing broadcast algorithms cannot efficiently support real-world parallel applications that require large-scale system sizes due to their high computational demands. Motivated by these observations, this paper proposes the Nearest Side First algorithm (NSF for short) as a new adaptive broadcast algorithm for the mesh. One of the key results is that the performance of the NSF algorithm scales well as the number of processing elements increases, a feature not demonstrated by any previous broadcast algorithm, which enables the proposed algorithm to utilise massively parallel architectures with maximum effectiveness.

2.

Constraint-based performance comparison of multi-dimensional interconnection networks with deterministic and adaptive routing strategies

Hamid Sarbazi-Azad 《Computers & Electrical Engineering》2004,30(3):167-182

Several studies have examined the relative performance merits of the torus and hypercube taking into account the channel bandwidth constraints imposed by implementation technology. While the torus has been shown to outperform the hypercube under the constant wiring density constraint, the opposite conclusion has been reached when the constant pin-out constraint is considered. However, all these studies have assumed deterministic routing and have not taken into account the internal hardware cost of routers. This paper re-examines the performance merits of the torus and hypercube using both fully-adaptive and deterministic routing strategies. Moreover, it uses a new cost model which takes into account the internal hardware cost of routers.

3.

An accurate mathematical performance model of adaptive routing in the star graph

A.E. H. M. 《Future Generation Computer Systems》2008,24(6):461-474

Analytical modelling is indeed the most cost-effective method to evaluate the performance of a system. Several analytical models have been proposed in the literature for different interconnection network systems. This paper proposes an accurate analytical model to predict message latency in wormhole-switched star graphs with fully adaptive routing. Although the focus of this research is on the star graph, the modelling approach can also be used for some other regular and irregular interconnection networks. The results obtained from simulation experiments confirm that the proposed model exhibits good accuracy for various network sizes and under different operating conditions.

4.

Performance modeling of Cartesian product networks (cited 1 time in total; self-citations: 0; citations by others: 1)

Reza Moraveji Hamid Sarbazi-Azad Albert Y. Zomaya 《Journal of Parallel and Distributed Computing》2011,71(1):105-113

This paper presents a comprehensive performance model for fully adaptive routing in wormhole-switched Cartesian product networks. Besides being general enough to apply to any product graph, the proposed model, according to experimental (simulation) results, exhibits high accuracy even under heavy traffic and in the saturation region, where other models have severe problems predicting the performance of the network. Most popular interconnection networks, including the mesh, hypercube, and torus, can be defined as a Cartesian product of two or more networks. Torus and mesh networks are the most popular topologies used in recent parallel supercomputers. They have also been widely used for realizing on-chip networks in recent multicore and multiprocessor systems.
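The statement that the mesh, torus and hypercube are Cartesian products of simpler networks can be made concrete with a small sketch. The helper names below (`path_graph`, `cycle_graph`, `cartesian_product`) are illustrative assumptions and are not taken from the paper; the code only builds the topologies and does not reproduce the paper's performance model.

```python
from itertools import product

def path_graph(n):
    """Adjacency sets of a linear array with nodes 0..n-1."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def cycle_graph(n):
    """Adjacency sets of a ring with nodes 0..n-1 (assumes n >= 3)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def cartesian_product(g, h):
    """Node (u, v) is linked to (u2, v) if u-u2 is an edge of g,
    and to (u, v2) if v-v2 is an edge of h."""
    adj = {}
    for u, v in product(g, h):
        adj[(u, v)] = {(u2, v) for u2 in g[u]} | {(u, v2) for v2 in h[v]}
    return adj

# A 3x4 mesh is the product of two linear arrays.
mesh = cartesian_product(path_graph(3), path_graph(4))
# A 4x4 torus is the product of two rings.
torus = cartesian_product(cycle_graph(4), cycle_graph(4))
# A 3-dimensional hypercube is the product of three 2-node links.
link = path_graph(2)
cube = cartesian_product(cartesian_product(link, link), link)
```

In this sketch every torus node has degree 4 and every node of the 3-cube has degree 3, which matches the usual definitions of those topologies.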

5.

A new performance measure for characterizing fault rings in interconnection networks

F. Safaei A. Khonsari M.M. Gilak 《Information Sciences》2010,180(5):664-678

One of the fundamental issues in parallel computers is how to efficiently perform routing in a faulty network where each component fails with some probability. Adaptive fault-tolerant routing algorithms have frequently been suggested for such systems as a means of providing continuous operation in the presence of one or more failures by allowing graceful system degradation. Many algorithms involve adding buffer space and complex control logic to the routing nodes. However, the addition of extra logic circuits and buffer space makes nodes more liable to failure and less reliable. Further, if the shape of the fault pattern is confined, then many non-faulty nodes will be sacrificed and hence their resources are wasted. This is clearly undesirable and motivates solutions that promote efficient use of non-faulty nodes. One such approach to reducing the number of functional nodes that must be marked as faulty is based on the concept of fault rings, which support more flexible routing around rectangular fault regions. Before such schemes can be successfully incorporated in networks, it is necessary to have a clear understanding of the factors that affect their performance potential. In this paper, we propose the first general solution for computing the probability of a message facing fault rings, with and without overlapping, in the well-known torus networks. We also conduct extensive simulation experiments using various fault patterns, the results of which are used to confirm the good accuracy of the proposed analytical models.

6.

A new approach to model virtual channels in interconnection networks

N. Alzeidi A. Khonsari 《Journal of Computer and System Sciences》2007,73(8):1121-1130

Dealing with virtual channels has always been a critical issue in developing analytical performance models for interconnection networks. Almost all previous studies relied on a method proposed by Dally to capture the effect of virtual channel multiplexing on the performance of interconnection networks. This paper presents a new method to model the effect of virtual channel multiplexing in high-speed wormhole-switched interconnection networks. Dally's method loses its accuracy as the traffic load increases due to the blocking nature of wormhole-switched networks. Our new method is based on a finite-capacity queue, M/G/1/V, and, compared to Dally's method, achieves a higher degree of accuracy under low, moderate and high traffic loads. Furthermore, its simplicity eases its use under different network conditions and setups. The presented model is validated by means of an event-driven simulator, and a detailed comparison with Dally's method is presented.

7.

The impact of virtual channel allocation on the performance of deterministic wormhole-routed k-ary n-cubes

S. Loucif M. Ould-Khaoua 《Simulation Modelling Practice and Theory》2002,10(8)

Virtual channels yield significant improvement in the performance of wormhole-routed networks as they can greatly reduce message blocking over network resources. K-ary n-cubes with deterministic routing have been widely analysed using analytical modelling tools. Most existing models, however, have either entirely ignored the effects of virtual channel multiplexing or have not considered the impact of virtual channel allocation on message latency. This paper discusses two different organisations of virtual channels in k-ary n-cubes, resulting in two deterministic routing algorithms. It then proposes an analytical model to compute message latency for the two routing algorithms. The proposed model is used in a case study to demonstrate the sensitivity of network latency to the way virtual channels are allocated to messages.

8.

Fault-tolerant wormhole routing for hypercube networks

Jau-Der Shih 《Information Processing Letters》2003,86(2):93-100

We present an adaptive fault-tolerant wormhole routing algorithm for hypercubes by using 3 virtual networks. The routing algorithm can tolerate at least n−1 faulty nodes and can route a message via a path of length no more than the shortest path plus four. Previous algorithms which achieve the same fault tolerant ability need 5 virtual networks. Simulation results are also given in this paper.

9.

LDBR: Low-deflection bufferless router for cost-sensitive network-on-chip design

《Microprocessors and Microsystems》2014,38(7):669-680

In network-on-chip (NoC) designs, the bufferless router is more energy-efficient than the conventional router with buffers. However, in the bufferless network, deflections cause great performance loss. In this paper, three deflection models are first constructed for analyzing the causes of deflections. Then, we propose a low-deflection bufferless router (LDBR), in which a multi-channel network interface and a novel deflection routing scheme based on the turn model are designed to reduce deflections during packet transmission. Finally, LDBR is evaluated against the latest bufferless routers using synthetic and real-world traffic patterns. The experimental results show that the deflection rate of the LDBR network is reduced by 41% compared to other bufferless networks, and LDBR also shows superiority in cost and power consumption across all workloads.

10.

Parallel routing algorithms for incomplete hypercube interconnection networks (cited 1 time in total; self-citations: 0; citations by others: 1)

M. S. Horng D. J. Chen Kuo-Lung Ku 《Parallel Computing》1994,20(12):1739-1761

Hypercube interconnection networks have been receiving considerable attention in the supercomputing environment. However, the number of processors must be exactly 2^r for an r-cube complete hypercube. This restriction severely limits its applicability. In this paper, we address three variant hypercube topologies with more flexibility in system sizes, the labelled hypercubes I_m^r, I_M^r, and I_A^r. Incomplete hypercube I_m^r consists of an r-cube and an m-cube complete hypercube; I_M^r is composed of 2^r and Σ_{m ∈ M} 2^m nodes; I_A^r comes from an r-cube complete hypercube which operates in a degraded manner and allows the missing nodes to be arbitrarily distributed. Specifically, we focus on parallel-path routing algorithms for these three classes of incomplete hypercubes. Parallel paths between any two given nodes are paths that have the same source and destination nodes but different intermediate nodes. Parallel communication is important as it allows the full bandwidth of the multiprocessor to be used for data transfer between any two nodes, and these redundant paths can increase system fault tolerance and communication reliability. These parallel routing algorithms can also be used as a criterion for designing multiprocessor systems.
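To illustrate what "parallel paths" means here, the following sketch uses the textbook construction for a complete hypercube: it corrects the differing address bits in rotated order, which yields paths that share only the source and destination. It is not the paper's algorithm for incomplete hypercubes, and the function name `parallel_paths` is an assumption made for this example.

```python
def parallel_paths(src, dst):
    """Return len(diff) paths from src to dst in a complete hypercube,
    where diff is the list of bit positions in which src and dst differ.
    Path i flips the differing bits in the order diff[i], diff[i+1], ...
    (cyclically), so no two paths share an intermediate node."""
    width = max(src, dst).bit_length()
    diff = [d for d in range(width) if ((src ^ dst) >> d) & 1]
    paths = []
    for start in range(len(diff)):
        node, path = src, [src]
        for step in range(len(diff)):
            node ^= 1 << diff[(start + step) % len(diff)]
            path.append(node)
        paths.append(path)
    return paths

# Nodes 000 and 111 of a 3-cube differ in three bits, giving three paths.
paths = parallel_paths(0b000, 0b111)
```

Here `paths` is `[[0, 1, 3, 7], [0, 2, 6, 7], [0, 4, 5, 7]]`. In an incomplete hypercube some of these intermediate nodes may be missing, which is the case the paper's algorithms address.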

11.

A fault-tolerant wormhole routing scheme for torus networks with nonconvex faults

Jau-Der Shih 《Information Processing Letters》2003,88(6):271-278

In this paper, we present a fault-tolerant routing algorithm for torus networks by using only 4 virtual channels. The proposed algorithm is based on the solid fault model, which includes rectangular faults and many practical nonconvex faults. Previous works need at least 6 virtual channels to achieve the same fault-tolerant ability.

12.

SimuRed: A flit-level event-driven simulator for multicomputer network performance evaluation

Fernando Pardo Jose A. Boluda 《Computers & Electrical Engineering》2009,35(5):803-814

The interconnection network is one of the most important multicomputer components, since it has a great impact on global system performance. Many models and simulators have been proposed to evaluate network performance. This paper presents SimuRed, an event-driven, flit-level, cycle-accurate simulator to evaluate different orthogonal network configurations. The core of the simulator has been designed to be expandable and portable to different situations. Some of the advantages of this simulator over other similar tools are its visual interface, its fast execution and its simplicity. Moreover, it is multiplatform and its source code versions (C++ and Java) are freely available under a GNU open-source license. The performance of this simulator has been evaluated, including a performance impact study of injection channels and deterministic/adaptive routing for meshes and hypercubes.

13.

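A scheme for interconnecting wireless sensor networks with IPv6 networks (cited 1 time in total; self-citations: 0; citations by others: 1)

王晓喃 (Wang Xiaonan) 钱焕延 (Qian Huanyan) 唐振民 (Tang Zhenmin) 《计算机应用》 (Journal of Computer Applications) 2009,29(4):1095-1098

This paper proposes and designs a communication model for seamless interconnection between wireless sensor networks and IPv6 networks. The model puts forward an automatic address configuration scheme for sensors and, on top of this scheme, implements automatic addressing and routing for sensor nodes. In addition, an IPv6 protocol tailoring scheme suited to wireless sensor networks (WSNs) is given in order to reduce the power consumption of sensor nodes. The model was implemented on an experimental platform and in a simulation environment, and its performance was analyzed; the experimental results demonstrate the effectiveness and correctness of the model.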

14.

An analytical model of broadcast in QoS-aware wormhole-routed NoCs

Mahmoud Moadeli Wim Vanderbauwhede 《Journal of Systems and Software》2011,84(1):12-20

Networks-on-Chip (NoCs) emerged to address the technological and design issues related to the development of large systems-on-chip (SoCs). Due to the diversity of applications' performance requirements, most NoC architectures offer support for quality of service (QoS). Also, to utilize the available bandwidth efficiently, they might implement mechanisms for delivering collective communication operations. This paper presents an analytical model to predict the average latency of wormhole-routed prioritized broadcast communication in NoCs. The model assumes that the network uses an all-port router scheme and offers differentiated services-based QoS. To verify the analysis, the model predictions are compared against the results obtained from a discrete-event simulator developed using OMNET++.

15.

A performance model for analysis of heterogeneous multi-cluster systems (cited 1 time in total; self-citations: 0; citations by others: 1)

Bahman Javadi Mohammad K. Akbari Jemal H. Abawajy 《Parallel Computing》2006,32(11-12):831

This paper addresses the problem of performance modeling for large-scale heterogeneous distributed systems, with emphasis on multi-cluster computing systems. Since the overall performance of distributed systems often depends on the effectiveness of their communication network, the study of the interconnection networks for these systems is very important. Performance modeling is required to avoid poorly chosen components and architectures, as well as the discovery of a serious shortfall during system testing just prior to deployment. However, the multiplicity of components and the associated complexity make performance analysis of distributed computing systems a challenging task. To this end, we present an analytical performance model for the interconnection networks of heterogeneous multi-cluster systems. The analysis is based on a parametric family of fat-trees, the m-port n-tree, and a deterministic routing algorithm, which is proposed in this paper. The model is validated through comprehensive simulation, which demonstrates that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions.

16.

OBQA: Smart and cost-efficient queue scheme for Head-of-Line blocking elimination in fat-trees

Jesus Escudero-Sahuquillo Pedro J. Garcia Francisco J. Quiles Jose Flich Jose Duato 《Journal of Parallel and Distributed Computing》2011,71(11):1460-1472

High-speed interconnection networks are essential elements for different high-performance parallel-computing systems. One of the most common interconnection network topologies is the fat-tree, whose advantages have turned it into the favorite topology of many interconnect designers. One of these advantages is the possibility of using simple but efficient routing algorithms, like the recently proposed deterministic routing algorithm referred to as DET, which offers performance similar to (or better than) adaptive routing while reducing complexity and guaranteeing in-order packet delivery. However, like other deterministic routing proposals, DET cannot react when packets contend intensely for network resources, leading to the appearance of Head-of-Line (HoL) blocking, which spoils network performance. In this paper, we describe and evaluate a simple queue scheme that efficiently reduces HoL blocking in fat-trees using the DET routing algorithm, without significantly increasing switch complexity and required silicon area. Additionally, we propose an implementation of OBQA in a feasible switch architecture.

17.

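A temperature-optimization-oriented task scheduling method for networks-on-chip

吉慧 (Ji Hui) 周磊 (Zhou Lei) 《计算机工程与科学》 (Computer Engineering & Science) 2018,40(9):1527-1533

As networks-on-chip grow in scale and research on them deepens, scheduling the many tasks on a chip sensibly has become one of the keys to system temperature optimization. To address the network-on-chip task scheduling problem, a task scheduling scheme based on the shortest Manhattan distance, SMDS, is proposed. The strategy takes full account of the shortest Manhattan paths between communicating node pairs in the core communication graph, uses a search algorithm to find the destination node for task scheduling, and uses simulated annealing to determine the task scheduling pairs. Experimental results show that, compared with the traditional distributed task scheduling strategy DTM, for 6×6, 8×8 and 10×10 topologies, the SMDS scheme improves the number of migrations by 22.08%, 21.74% and 23.02% on average, respectively, and the average hop count by 24.04%, 29.18% and 23.04%, respectively, thereby achieving system temperature optimization.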
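The Manhattan distance that the SMDS scheme builds on can be sketched as follows for a square mesh. The row-major node numbering and the helper names (`coords`, `manhattan`, `nearest_candidate`) are assumptions made for illustration; the abstract does not describe the search procedure or the simulated-annealing step in enough detail to reproduce them, so neither is attempted here.

```python
def coords(node, width):
    """Map a row-major node ID to (row, column) in a mesh of the given width."""
    return divmod(node, width)

def manhattan(a, b, width):
    """Hop count of a shortest path between nodes a and b in a 2D mesh."""
    (r1, c1), (r2, c2) = coords(a, width), coords(b, width)
    return abs(r1 - r2) + abs(c1 - c2)

def nearest_candidate(partners, candidates, width):
    """Pick the candidate node with the smallest total Manhattan distance
    to the nodes it communicates with (a hypothetical selection rule)."""
    return min(candidates,
               key=lambda n: sum(manhattan(n, p, width) for p in partners))

# In a 6x6 mesh, opposite corners (nodes 0 and 35) are 10 hops apart.
corner_distance = manhattan(0, 35, 6)
# A task talking to nodes 0 and 5 is better placed on node 2 than on node 35.
choice = nearest_candidate([0, 5], [2, 35], 6)
```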

18.

Adaptive routing in wormhole-switched necklace-cubes: Analytical modelling and performance comparison

Sina Meraji Hamid Sarbazi-Azad 《Simulation Modelling Practice and Theory》2009,17(9):1522-1532

The necklace hypercube has recently been introduced as an attractive alternative to the well-known hypercube. Previous research on this topology has mainly focused on its topological properties and its VLSI and algorithmic aspects. Several analytical models have been proposed in the literature for different interconnection networks, as the most cost-effective tools to evaluate the performance merits of such systems. This paper proposes an analytical performance model to predict message latency in wormhole-switched necklace hypercube interconnection networks with fully adaptive routing. The analysis focuses on a fully adaptive routing algorithm which has been shown to be the most effective for necklace hypercube networks. The results obtained from simulation experiments confirm that the proposed model exhibits good accuracy under different operating conditions.

19.

TTPM – An efficient deadlock-free algorithm for multicast communication in 2D torus networks

M.G. A.A. M.A. K. 《Journal of Systems Architecture》2008,54(10):919-928

A torus network has become increasingly important to multicomputer design because of its many features including scalability, low bandwidth and fixed degree of nodes. A multicast communication is a significant operation in multicomputer systems and can be used to support several other collective communication operations. This paper presents an efficient algorithm, TTPM, to find a deadlock-free multicast wormhole routing in two-dimensional torus parallel machines. The introduced algorithm is designed such that messages can be sent to any number of destinations within two start-up communication phases; hence the name Torus Two Phase Multicast (TTPM) algorithm. An efficient routing function is developed and used as a basis for the introduced algorithm. Also, TTPM allows some intermediate nodes that are not in the destination set to perform multicast functions. This feature allows flexibility in multicast path selection and therefore improves the performance. Performance results of a simulation study on torus networks are discussed to compare TTPM algorithm with a previous algorithm.

20.

The pipeline bus: An interconnection network for multiprocessor systems

Bernd Franke Ralf Harneit Axel Kern Hans Christoph Zeidler 《Parallel Computing》1988,7(3):403-412

The design of a network switch for synchronously clocked packet switching networks is presented. The switch includes the node interface and logic handling of the arbitration and routing for a large class of network topologies, namely n-dimensional rectangular grids including hypercubes and other highly efficient topologies. In the context of the SUPRENUM project the paper concentrates on two-dimensional meshes. Routing, arbitration, blocking, and fault tolerance issues are discussed.