1.
An energy consumption characterization of on-chip interconnection networks for tiled CMP architectures (total citations: 2, self-citations: 2, citations by others: 0)
Continuous improvements in integration scale have made possible the inclusion of several processor cores on the same chip.
Such designs have been named chip-multiprocessors (or CMPs) and constitute a good alternative to traditional monolithic designs
for several reasons, among others, better levels of performance, scalability, and performance/energy ratio. On the other hand,
higher clock frequencies and the increasing number of transistors available on a single chip have made energy consumption
a critical design issue in current and future microarchitectures. In these architectures, the design of the on-chip interconnection
network has proven to have a significant impact on overall system performance and energy consumption, and the wires used
in such interconnects can be designed with varying latency, bandwidth, and power characteristics.
In this work, we present a detailed characterization of the energy-efficiency of a CMP for parallel scientific applications
using Sim-PowerCMP, a detailed architectural-level power-performance simulation tool for CMP architectures that integrates several well-known
contemporary simulators (RSIM, Hot Leakage and Orion) into a single framework that allows precise analysis and optimization
of power dissipation (both dynamic and static) taking into account performance. In this characterization, we pay special attention
to the energy consumed on the interconnection network. Results for an 8- and 16-core CMP show that the most power-consuming
messages are the replies that carry data (on average, almost 70% of the total energy consumed in the interconnect), although
they represent only 30% of the total number of messages. Furthermore, we show that using on-chip wires with varying latency, bandwidth,
and energy characteristics can reduce the energy dissipated by the links of the interconnection network by about 65%, with an
average impact of 10% on execution time.
Manuel E. Acacio
2.
In this paper, we present an incremental design of scalable interconnection networks in multicomputer systems using basic building blocks. Both network topologies and routing algorithms are considered. We use wormhole-routed small-scale 2D meshes as basic building blocks. The minimum increment for expanding these networks is a single building block, which implies that the network need not maintain a regular 2D mesh topology. Some new topologies are introduced: incomplete meshes based on adaptive routing algorithms designed from the turn model, and extended incomplete meshes based on XY routing. We show that the original routing algorithm can still be used to send a message between any source and destination without resorting to store-and-forward switching and without causing deadlock. Because of the way the network is constructed incrementally, little or no rewiring is required, and the network keeps a high bisection density and a short diameter. The design methods can be used to build expandable and scalable parallel computers economically and incrementally.
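XY (dimension-order) routing, on which the extended incomplete meshes are based, can be sketched as follows for a complete 2D mesh whose nodes are addressed as (x, y). The function name `xy_route` is hypothetical, and the sketch does not model the incomplete-mesh extensions or wormhole flit handling; it only shows the X-then-Y order that rules out Y-to-X turns.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst under XY routing."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate. No Y-to-X turn ever occurs,
    # which is why XY routing is deadlock-free on a complete mesh.
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # → [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The route length always equals the Manhattan distance between source and destination, so XY routing is minimal as well as deterministic.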
3.
Topology optimization of interconnection networks (total citations: 2, self-citations: 0, citations by others: 2)
This paper describes an automatic optimization tool that searches a family of network topologies to select the topology that best achieves a specified set of design goals while satisfying specified packaging constraints. Our tool uses a model of signaling technology that relates bandwidth, cost and distance of links. This model captures the distance-dependent bandwidth of modern high-speed electrical links and the cost differential between electrical and optical links. Using our optimization tool, we explore the design space of hybrid Clos-torus (C-T) networks. For a representative set of packaging constraints we determine the optimal hybrid C-T topology to minimize cost and the optimal C-T topology to minimize latency for various packet lengths. We then use the tool to measure the sensitivity of the optimal topology to several important packaging constraints such as pin count and critical distance.
4.
A novel low-swing interface circuit for high-speed on-chip asynchronous interconnection is proposed in this paper. It uses a differential level-triggered latch to recover the digital signal from an ultra-low voltage swing of less than 50 mV, and the driver part of the interface circuit is optimized for low power using the driver-array method. Able to operate at up to 500 MHz, the proposed circuit, which was simulated and fabricated in SMIC 0.18-μm 1.8-V digital CMOS technology, consumes less power than previously reported designs.
5.
6.
The interconnection network equivalence notions reported in the literature are formalized via conjugation maps over the sets of interconnections of such networks. Various forms of relations including group isomorphisms among interconnection networks are introduced. Equivalence relations express the degrees of freedom in “making one network behave like another.” Examples of these relations for commutative cube-connected networks with individual stage control are also included. In addition, an algorithm is provided to construct equivalence maps among such networks.
7.
We study the cross product as a method for generating and analyzing interconnection network topologies for multiprocessor systems. Consider two interconnection graphs G1 and G2, each with some established properties such as symmetry, low degree and diameter, scalability, simple optimal routing, recursive structure (partitionability), fault tolerance, existence of node-disjoint paths, low-cost embedding, and efficient broadcasting. We investigate and evaluate the corresponding properties of the cross product of G1 and G2 based on the properties of G1 and those of G2. We also give a mathematical characterization of product families of graphs which are closed under the cross product operation. This investigation is useful in two ways. On one hand, it gives a new tool for further studying some of the known interconnection topologies, such as the hypercube and the mesh, which can be defined using the cross product operation. On the other hand, it can be used in defining and evaluating new interconnection graphs using the cross product operation on known topologies.
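In this literature the cross product of two graphs is usually the Cartesian product: (u1, u2) and (v1, v2) are adjacent when one coordinate is equal and the other coordinates are adjacent in their own graph. The sketch below (helper names are hypothetical) builds a mesh and a hypercube as cross products of paths and single edges, and checks two of the additive properties the abstract alludes to: node degrees add, and diameters add.

```python
from collections import deque
from itertools import product


def cross_product(g1, g2):
    """Cartesian (cross) product of two graphs given as adjacency dicts."""
    g = {}
    for u1, u2 in product(g1, g2):
        # Move in the first coordinate, or in the second, but not both.
        g[(u1, u2)] = [(v1, u2) for v1 in g1[u1]] + [(u1, v2) for v2 in g2[u2]]
    return g


def diameter(g):
    """Largest BFS eccentricity over all nodes (graph assumed connected)."""
    best = 0
    for s in g:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def path_graph(n):
    """Linear array of n nodes, 0 - 1 - ... - (n-1)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


k2 = path_graph(2)                                 # K2: a single edge
mesh = cross_product(path_graph(4), path_graph(3))  # 4 x 3 mesh
cube = cross_product(cross_product(k2, k2), k2)     # binary 3-cube
print(diameter(mesh), diameter(cube))               # → 5 3
```

The mesh diameter 5 is the sum of the path diameters 3 and 2, and every node of the 3-cube has degree 1 + 1 + 1 = 3.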
8.
R. Lashevsky, Y. Sato 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2002,6(6):462-469
Hardware implementation of artificial neural networks (ANN) based on MOS transistors with floating gates (Neuron MOS, or νMOS)
is discussed. Choosing analog rather than digital weight storage improves learning accuracy and minimizes chip
area and power dissipation. However, since a weight value can be represented by any voltage in the supply range
(e.g. from 0 to 3.3 V), the minimum difference between two values is very small, especially for a neuron with a large
sum of weights. This implies that an ANN using the analog hardware approach is sensitive to $V_{dd}$ deviation.
The purpose of this paper is to investigate the main parts of analog ANN circuits (synapse and neuron) that can compensate
for all kinds of deviation, and to develop their design methodologies.
9.
Central to all parallel architectures is a switching network which facilitates the communication between a machine's components necessary to support their cooperation. Multistage interconnection networks (MINs) are classified and analytic models are described for both packet-switched and circuit-switched MINs with asynchronous transmission mode. Under strong enough assumptions, packet switching can be modeled by standard queuing methods, hence providing a standard against which to assess approximate models. We describe one such approximate model with much weaker assumptions which is more widely applicable and can be implemented more efficiently. To model circuit switching requires a different approach because of the presence of passive resources, namely multiple links through the MIN which must be held before a message can be transmitted and throughout its transmission. An approximate analysis based upon the recursive structure of a particular MIN topology which yields accurate predictions when compared with simulation is described.
10.
《International Journal of Computer Mathematics》2012,89(4):455-462
It is important for a communication service to provide a high level of dependability. Many events cause failures in a network: destruction of nodes or links, cable cuts, node interruptions, software errors or hardware failures, transmission failures at various points, and human error or accidents can all interrupt service for long periods of time. A communication network therefore requires, from the outset, a high degree of stability, or equivalently low vulnerability. In this work, various stability measures of a communication network are defined, and these measures are given for some long-established static interconnection networks and for w-star networks, a new graph class.
11.
12.
The star networks, originally proposed by Akers and Harel, suffer from a rigorous restriction on the number of nodes (a star network on n symbols has exactly n! nodes). The general incomplete star networks (GISN) are proposed in this paper to relieve this restriction. An efficient labeling scheme for GISN is given, and routing and broadcasting algorithms are also presented for GISN. The communication diameter of GISN is shown to be bounded by 4n-7. The proposed single-node broadcasting algorithm is optimal with respect to time complexity, which is $O(n\log_2 n)$.
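The GISN labeling scheme is not reproduced here. As background, routing in the complete star graph of Akers and Harel can be done with a well-known greedy rule: if the front symbol is not in its home position, swap it home; otherwise swap any misplaced symbol to the front. The sketch below (names are hypothetical) implements that rule for the complete star graph only and checks, by brute force over all nodes, that the longest route matches the known star-graph diameter floor(3(n-1)/2).

```python
from itertools import permutations


def star_route(src, dst):
    """Greedy minimal routing in the complete star graph S_n.

    Nodes are permutations (tuples of distinct symbols); one hop swaps the
    symbol at position 0 with the symbol at some position i >= 1.
    Returns the list of generator indices i used on the way from src to dst.
    """
    pos_in_dst = {s: i for i, s in enumerate(dst)}
    # Relabel symbols so that dst becomes the identity (0, 1, ..., n-1);
    # the same position swaps are applied to `cur` and to the real `node`.
    cur = [pos_in_dst[s] for s in src]
    node = list(src)
    hops = []
    while True:
        if cur[0] != 0:
            i = cur[0]  # send the front symbol directly to its home position
        else:
            misplaced = [j for j in range(1, len(cur)) if cur[j] != j]
            if not misplaced:
                break  # every symbol is home: node == dst
            i = misplaced[0]  # bring some misplaced symbol to the front
        cur[0], cur[i] = cur[i], cur[0]
        node[0], node[i] = node[i], node[0]
        hops.append(i)
    assert tuple(node) == tuple(dst)
    return hops


ident = (1, 2, 3, 4)
print(star_route((2, 1, 4, 3), ident))  # → [1, 2, 3, 2]
print(max(len(star_route(p, ident)) for p in permutations(ident)))  # → 4
```

For n = 4 the worst case is 4 hops, which equals floor(3 * 3 / 2); the incomplete variants studied in the paper trade some of this regularity for a flexible node count.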
13.
《Journal of Systems Architecture》2004,50(9):563-574
Several researchers have analysed the performance of k-ary n-cubes taking into account channel bandwidth constraints imposed by implementation technology, namely the constant wiring density and pin-out constraints for VLSI and multiple-chip technology respectively. For instance, Dally [IEEE Trans. Comput. 39(6) (1990) 775], Abraham [Issues in the architecture of direct interconnection networks schemes for multiprocessors, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1992], and Agrawal [IEEE Trans. Parallel Distributed Syst. 2(4) (1991) 398] have shown that low-dimensional k-ary n-cubes (known as tori) outperform their high-dimensional counterparts (known as hypercubes) under the constant wiring density constraint. However, Abraham and Agrawal have arrived at an opposite conclusion when they considered the constant pin-out constraint. Most of these analyses have assumed deterministic routing, where a message always uses the same network path between a given pair of nodes. More recent multicomputers have incorporated adaptive routing to improve performance. This paper re-examines the relative performance merits of the torus and hypercube in the context of adaptive routing. Our analysis reveals that the torus manages to exploit its wider channels under light traffic. As traffic increases, however, the hypercube can provide better performance than the torus. Our conclusion under the constant wiring density constraint is different from that of the works mentioned above because adaptive routing enables the hypercube to exploit its richer connectivity to reduce message blocking.
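The topological side of the torus-versus-hypercube trade-off can be quantified with a few lines of arithmetic. The sketch below (function names are hypothetical) computes, for a k-ary n-cube with N = k^n nodes and bidirectional wraparound links, the diameter, the mean minimal distance, and the number of links cut by a bisection. It only counts links; it does not reproduce the wiring-density or pin-out models of the cited analyses, nor any adaptive-routing performance model.

```python
def ring_distances(k):
    """Minimal hop distances from node 0 to every node of a bidirectional k-node ring."""
    return [min(d, k - d) for d in range(k)]


def kary_ncube_stats(k, n):
    """Diameter, mean distance (over the N-1 other nodes) and bisection link
    count of a k-ary n-cube with N = k**n nodes (k assumed even)."""
    N = k ** n
    ring = ring_distances(k)
    diameter = n * max(ring)
    # Hop counts add across dimensions, so summing over all destinations gives
    # n * (sum over one ring) * k**(n-1).
    total = n * sum(ring) * k ** (n - 1)
    mean = total / (N - 1)
    # Halving one dimension cuts each of its k**(n-1) rings twice when k > 2
    # (wraparound link), but only once when k == 2 (hypercube).
    bisection_links = k ** (n - 1) * (2 if k > 2 else 1)
    return diameter, mean, bisection_links


for k, n in [(64, 2), (16, 3), (2, 12)]:  # three ways to connect 4096 nodes
    d, m, b = kary_ncube_stats(k, n)
    print(f"{k}-ary {n}-cube: diameter={d}, mean distance={m:.2f}, bisection links={b}")
```

For N = 4096, the 64-ary 2-cube has diameter 64 and 128 bisection links, while the binary 12-cube has diameter 12 and 2048 bisection links. Under this simple counting model, a fixed bisection wire budget lets each torus channel be 16 times wider, which is the kind of advantage the abstract describes the torus exploiting under light traffic.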
14.
High-performance supercomputers generally comprise millions of CPUs, and their interconnection networks play an important role in achieving high performance. New design paradigms for dynamic on-chip interconnection networks involve a) topology; b) synthesis, modeling and evaluation; c) quality of service, fault tolerance and reliability; and d) routing procedures. Constructing a dynamic, highly fault-tolerant interconnection network requires more disjoint paths between each source-destination node pair at each stage, together with a dynamic rerouting capability to use the available paths effectively. A fast routing and rerouting strategy is needed to provide reliable performance under switch/link failures. This paper proposes two new fault-tolerant interconnection network architectures, named reliable interconnection networks (RIN-1 and RIN-2). The proposed layouts are multipath multistage interconnection networks that provide four disjoint paths for every source-destination node pair, with dynamic rerouting capability. The designs can withstand switch failures in all stages (including the input and output stages) and provide higher reliability. The reliability of various MIN architectures is evaluated, and comparison with some existing MINs shows that the proposed designs provide higher reliability and fault tolerance.
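One simple way to see why multiple disjoint paths raise reliability is the standard parallel-of-series model below. It rests on assumptions that are ours, not the paper's analysis: switches fail independently and each works with probability r, each path crosses m switches, and the p paths share no switch. The values m = 4 and r = 0.99 are hypothetical.

```python
def path_reliability(r, m):
    """Probability that all m switches on one path work (independent failures)."""
    return r ** m


def terminal_reliability(r, m, p):
    """Probability that at least one of p switch-disjoint paths is fully working."""
    return 1 - (1 - path_reliability(r, m)) ** p


# Hypothetical parameters: 4 switches per path, switch reliability 0.99.
for p in (1, 2, 4):
    print(f"{p} disjoint path(s): {terminal_reliability(0.99, 4, p):.6f}")
```

With these numbers a single path gives about 0.9606, while four disjoint paths push the probability above 0.99999. Real MIN reliability analyses must also account for switches shared between paths, which this model ignores.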
15.
As the number of cores integrated onto a single chip increases, power dissipation and network latency become increasingly stringent constraints. On-chip networks provide an efficient and scalable interconnection paradigm for chip multiprocessors (CMPs), on which one-to-many (multicast) communication is common. Without efficient multicast support, traditional unicast-only on-chip networks handle such multicast communication inefficiently. In this paper, we propose Dual Partitioning Multicasting (DPM) to reduce packet latency and balance network resource utilization. Specifically, the DPM scheme adaptively makes routing decisions based on the network load-balance level as well as the link-sharing patterns characterized by the distribution of the multicast destinations. Extensive experimental results for synthetic traffic as well as real applications show that, compared with the recently proposed RPM scheme, DPM significantly reduces the average packet latency and mitigates network power consumption. More importantly, DPM is highly scalable for future on-chip networks with heavy traffic loads and a variety of traffic patterns.
16.
《Computer Engineering & Science》2017,(10):1781-1787
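As the performance demanded of high-performance computers keeps rising, their system scale is steadily growing, and the interconnection network within the system has become a key factor affecting performance. How to build, from high-radix routers, interconnection networks with larger scale, lower network latency and cost, and higher network throughput is currently the main research direction. We analyze the characteristics of widely used high-radix networks, combine the torus (ring) and tree networks among them, and propose a new hierarchical hybrid interconnection network topology. The structure has good scalability and communication capability, and its performance is simulated and analyzed on the NetSim network simulator.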
17.
18.
The flow-control mechanism determines the manner in which communication resources are allocated. A well-designed flow-control mechanism should allocate communication resources efficiently in a wide variety of interconnection networks. The goal of this paper is to propose a highly effective “Step-Back-on-Blocking” buffered flow control. The proposed mechanism combines the advantages of Wormhole and Virtual Cut-Through flow control, while adding a means for adaptive allocation of communication resources. “Step-Back-on-Blocking” flow control provides low message latency and achieves a high fraction of the channel bandwidth by conditionally evading temporarily blocked network resources. The effectiveness of the proposed flow control has been evaluated through numerous experiments conducted in the OMNeT++ discrete event simulation environment.
19.
20.
A taxonomy for characterizing adaptive routing protocols for hypercube interconnection networks (HINs) is presented. The taxonomy is based on classes of routing decisions common to any HIN. This taxonomy is used to discuss existing and proposed protocols. Rather than an exhaustive enumeration of related research, the protocols selected for discussion are intended to be representative of the classes defined by the taxonomy. These protocols are candidates for use in massively parallel architectures configured with HINs. To provide some insight into their behavior in very large HINs, results of simulation studies of representative protocols are presented.
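Every minimal hypercube routing decision starts from the same fact: the dimensions still to be crossed are the set bits of the XOR of the current and destination addresses, and each hop clears one of them. The sketch below (names are hypothetical, and it is not one of the surveyed protocols) separates that fact from the selection policy: deterministic e-cube routing always takes the lowest profitable dimension, while an adaptive protocol may pick any of them.

```python
from math import factorial


def profitable_dims(cur, dst):
    """Dimensions whose traversal moves a packet one hop closer to dst."""
    diff = cur ^ dst  # set bits mark the dimensions still to be corrected
    return [d for d in range(diff.bit_length()) if (diff >> d) & 1]


def route(src, dst, choose):
    """Follow a minimal path; `choose` picks one of the profitable dimensions."""
    path = [src]
    cur = src
    while cur != dst:
        d = choose(profitable_dims(cur, dst))
        cur ^= 1 << d  # cross the link in dimension d
        path.append(cur)
    return path


def ecube(dims):
    """Deterministic e-cube policy: always correct the lowest dimension first."""
    return dims[0]


def highest_first(dims):
    """One alternative policy; an adaptive router could pick any element of dims."""
    return dims[-1]


src, dst = 0b0000, 0b1011
print(route(src, dst, ecube))                # → [0, 1, 3, 11]
print(route(src, dst, highest_first))        # → [0, 8, 10, 11]
print(factorial(bin(src ^ dst).count("1")))  # → 6 minimal paths
```

With Hamming distance h between source and destination there are h! minimal paths, which is the freedom that adaptive protocols exploit and that a taxonomy of routing decisions classifies.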