期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Thermal management in 3d networks-on-chip using dynamic link sharing

《Microprocessors and Microsystems》2017

3D integration is a practical solution for overcoming the problems of long and slow global wires in current and future generations of integrated circuits. This emerging technology stacks several die slices on top of each other in a single chip. It provides higher-bandwidth and lower-latency in the third dimension than a 2D design due to extremely shorter inter-layer distances. However, thermal challenges are a key impediment to stacking logic dies on top of each other. Particularly, routers in a 3D network-on-chip (NoC) are a main source of thermal hotspots, limiting the potential performance gains of the 3D integration. In this paper, we take advantage of the low-latency 3D vertical links to design a temperature-aware router architecture for 3D NoCs. This architecture reduces the peak temperature of routers, particularly routers that are farther from the heat sink, by balancing the traffic across all layers in a temperature-aware distributed way. This way, a router with high temperature can borrow the link and crossbar bandwidth of the routers in the layers closer to the heat sink to forward its packets, effectively offloading part of its traffic to them to reduce its temperature.Experimental results show that the proposed method can control the temperature of 3D NoCs and reduce the temperature gradient across the network with minimized negative impact on performance, compared to a state-of-the-art 3D NoC temperature management method. 相似文献

2.

Deadlock-free generic routing algorithms for 3-dimensional Networks-on-Chip with reduced vertical link density topologies

《Journal of Systems Architecture》2013,59(7):528-542

3-Dimensional Networks-on-Chip (3D NoC) have emerged as the promising solution for scalability, power consumption and performance demands of next generation Systems-on-Chip (SoCs) interconnect. Due to the cost in terms of thermal, yield, chip area and design complexity, minimizing the number of Through-Silicon-Via (TSVs) in 3D ICs has become on the most important design issues. In this paper, we will present several stable, simple and deadlock-free generic routing algorithms for 3D NoCs with different reduced vertical link density topologies, which can maintain the 3D NoCs performance and save the system cost (TSV number, chip area, system power, etc.). The experimental results have been extracted from our cycle-accurate GSNOC simulator and have shown that our routing algorithms can maintain the system performance up to reducing 50% of TSVs number in comparison to the 100% TSVs number with ZXY routing algorithm configuration. 相似文献

3.

An Efficient Network-on-Chip Router for Dataflow Architecture

下载免费PDF全文

Xiao-Wei Shen Xiao-Chun Ye Xu Tan Da Wang Lunkai Zhang Wen-Ming Li Zhi-Min Zhang Dong-Rui Fan Ning-Hui Sun 《计算机科学技术学报》2017,32(1):11-25

Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data are frequently transferred among processing elements through the network-on-chip (NoC). Thus the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architecture and we find they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance sensitive to delay. Based on the three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination; thus it can transfer data with multiple destinations in a single transfer. Moreover, the router adopts output buffer to maximize throughput and adopts non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router. 相似文献

4.

Area and power savings via asymmetric organization of buffers in 3D-NoCs for heterogeneous 3D-SoCs

《Microprocessors and Microsystems》2017

In this paper we investigate the effects of asymmetric organization and depths of Network-on-Chip (NoC) router buffers among dies in heterogeneous 3D-System-on-Chips (SoCs). In our novel approach the properties of the routers are aligned with the characteristics of the technological nodes per layer. We call these designs Asymmetric 3D-NoCs (A-3D-NoCs). In this work we demonstrate potentials of A-3D-NoCs in comparison to a conventional, symmetric 3D-NoC: Applying asymmetric buffer reorganization we achieve area savings of 8.3% and power savings of 5.4% for link buffers while accepting a minor average system performance loss of 2.1%. With additional asymmetry in buffer depth up to 28% cost savings and 15% power reduction are given in combination with a 4.6% performance decline. Thus, the proposed buffer organization scheme is applicable for cost and power critical applications of NoCs in heterogeneous 3D-SoCs. 相似文献

5.

片上网络路由器及互连低成本测试方法

向东《集成技术》2013,2(6):1-7

三维设计的片上网络(Network-on-chip)是当前的热点研究专题。提出一种有效的片上网络路由器和互连测试方法显得非常重要。文章通过对路由器分类,提出了一种新的片上网络路由器测试方法。将不同输入、输出端口的路由器分为不同的类。相同分类的路由器是同构的,它们的测试集也是相同的。针对相同路由器提出了一个基于单播的多播方案 (Unicast-based Multicast),并提出了新的互连测试方法。实验结果表明文章方法是有效的。相似文献

6.

A High-Throughput Distributed Shared-Buffer NoC Router 总被引：1，自引：0，他引：1

Soteriou V. Ramanujam R.S. Lin B. Li-Shiuan Peh 《Computer Architecture Letters》2009,8(1):21-24

Microarchitectural configurations of buffers in routers have a significant impact on the overall performance of an on-chip network (NoC). This buffering can be at the inputs or the outputs of a router, corresponding to an input-buffered router (IBR) or an output-buffered router (OBR). OBRs are attractive because they have higher throughput and lower queuing delays under high loads than IBRs. However, a direct implementation of OBRs requires a router speedup equal to the number of ports, making such a design prohibitive given the aggressive clocking and power budgets of most NoC applications. In this letter, we propose a new router design that aims to emulate an OBR practically based on a distributed shared-buffer (DSB) router architecture. We introduce innovations to address the unique constraints of NoCs, including efficient pipelining and novel flow control. Our DSB design can achieve significantly higher bandwidth at saturation, with an improvement of up to 20% when compared to a state-of-the-art pipelined IBR with the same amount of buffering, and our proposed microarchitecture can achieve up to 94% of the ideal saturation throughput. 相似文献

7.

Asynchronous spatial division multiplexing router

Wei Song^{Author Vitae} Doug EdwardsAuthor Vitae 《Microprocessors and Microsystems》2011,35(2):85-97

Asynchronous quasi-delay-insensitive (QDI) NoCs have several advantages over their clocked counterparts. Virtual channel (VC) is the most utilized flow control method in asynchronous routers but spatial division multiplexing (SDM) achieves better throughput performance for best-effort traffic than VC. A novel asynchronous SDM router architecture is presented. Area and latency models are provided to analyse the network performance of all router architectures including wormhole, virtual channel and SDM. Performance comparisons have been made with different configurations of payload size, communication distance, buffer size, port bandwidth, network size and number of VCs/virtual circuits. Compared with VC, SDM achieves higher throughput with lower area overhead. 相似文献

8.

Energy-aware routing in hybrid optical network-on-chip for future multi-processor system-on-chip

Lin Liu Yuanyuan Yang 《Journal of Parallel and Distributed Computing》2013

With the development of Multi-Processor System-on-Chip (MPSoC) in recent years, the intra-chip communication is becoming the bottleneck of the whole system. Current electronic network-on-chip (NoC) designs face serious challenges, such as bandwidth, latency and power consumption. Optical interconnection networks are a promising technology to overcome these problems. In this paper, we study the routing problem in optical NoCs with arbitrary network topologies. Traditionally, a minimum hop count routing policy is employed for electronic NoCs, as it minimizes both power consumption and latency. However, due to the special architecture of current optical NoC routers, such a minimum-hop path may not be energy-wise optimal. Using a detailed model of optical routers we reduce the energy-aware routing problem into a shortest-path problem, which can then be solved using one of the many well known techniques. By applying our approach to different popular topologies, we show that the energy consumed in data communication in an optical NoC can be significantly reduced. We also propose the use of optical burst switching (OBS) in optical NoCs to reduce control overhead, as well as an adaptive routing mechanism to reduce energy consumption without introducing extra latency. Our simulation results demonstrate the effectiveness of the proposed algorithms. 相似文献

9.

A framework for rapid evaluation of heterogeneous 3-D NoC architectures

Efstathios Sotiriou-XanthopoulosAuthor Vitae Dionysios DiamantopoulosAuthor VitaeKostas Siozios George EconomakosAuthor VitaeDimitrios SoudrisAuthor Vitae 《Microprocessors and Microsystems》2014

The scalability of communication infrastructure in modern Integrated Circuits (ICs) becomes a challenging issue, which might be a significant bottleneck if not carefully addressed. Towards this direction, the usage of Networks-on-Chip (NoC) is a preferred solution. In this work, we propose a software-supported framework for quantifying the efficiency of heterogeneous 3-D NoC architectures. In contrast to existing approaches for NoC design, the introduced heterogeneous architecture consists of a mixture of 2-D and 3-D routers, which reduces the delay and power consumption with a slight impact on packet hops. More specifically, the experimental results with a number of DSP applications show the effectiveness of the introduced methodology, as we achieve on average 25% higher maximum operation frequency and 39% lower power consumption compared to the uniform 3-D NoCs. 相似文献

10.

RAFT: A router architecture with frequency tuning for on-chip networks

Asit K. MishraAuthor Vitae Aditya Yanamandra^{Author Vitae} 《Journal of Parallel and Distributed Computing》2011,71(5):625-640

With increasing number of cores being integrated on a single die, Network-on-Chips (NoCs) have become the de-facto standard in providing scalable communication backbones for these multi-core chips. NoCs have a significant impact on the system’s performance, power and reliability. However, NoCs can be plagued by higher power consumption and degraded throughput if the network and router are not designed properly. Towards this end, this paper proposes a novel router architecture, where we tune the frequency of a router in response to network load to manage both performance and power. We propose three dynamic frequency tuning techniques, FreqBoost, FreqThrtl and FreqTune, targeted at congestion and power management in NoCs. We also propose and evaluate a novel fine-grained frequency tuning scheme where we vary the number of virtual-channels in a router dynamically. As a further optimization to these schemes, we propose a frequency tuning scheme where we tune the frequency of the four ports of a mesh router separately from the local port. As enablers for these techniques, we exploit Dynamic Voltage and Frequency Scaling (DVFS) and the imbalance in a generic router pipeline through time stealing. We also evaluate and analyze the proposed schemes from the point of view of reliability against soft error vulnerability and provide guidelines in choosing the appropriate scheme when reliability is the prime design constraint.Experiments using synthetic workloads on an 8 × 8 wormhole-switched mesh interconnect show that FreqBoost is a better choice for reducing average latency (maximum 40%) while, FreqThrtl provides the maximum benefits in terms of power saving and energy delay product (EDP). The FreqTune scheme is a better candidate for optimizing both performance and power, achieving on an average 36% reduction in latency, 13% savings in power (up to 24% at high load), and 40% savings (up to 70% at high load) in EDP. With application benchmarks, we observe IPC improvement up to 23% using our design. Our analysis shows FreqBoost to be the most robust scheme amongst the three schemes when reliability is a concern. 相似文献

11.

A single-cycle output buffered router with layered switching for Networks-on-Chips

Yancang ChenAuthor Vitae Zhonghai Lu^{Author Vitae} 《Computers & Electrical Engineering》2012,38(4):906-916

We present a single-cycle output buffered router based on layered switching for networks on chips (NoCs). Different from state-of-the-art NoC routers, the router has three important characteristics: (1) It employs layered switching, which implements wormhole on top of virtual cut-through (VCT) switching; (2) In contrast to input buffered architectures, it adopts an output buffered architecture; (3) It is single cycle, meaning that the router pipeline takes only one cycle for all flits. Experimental results show that the router achieves up to 80% of ideal network throughput under uniform random traffic pattern. Compared with wormhole switching, layered switching achieves up to 36.9% latency reduction for 12-flit packets under uniform random traffic with an injection rate of 0.5 flit/cycle/node. Under 65 nm technology synthesized results show that its critical path has only 20 logic gates, and it reduces 11% area compared to the input virtual-channel router with the same buffer capacity. 相似文献

12.

片上网络中面向报文有序传输的自适应路由算法研究

姜燕张广艳《计算机应用研究》2016,33(4)

与确定性路由算法相比,自适应路由算法可以提高片上网络的通信性能,但是报文可能会无序到达。在目的节点对报文排序将会导致严重的面积和计算开销,甚至可能会抵消采用自适应路由算法带来的性能增益。为此,本文首先提出一种部分自适应路由算法,以满足报文的有序到达。然后,描述了对本文算法提供支持的路由器硬件结构。最后,在二维片上网络下对本文算法及确定性和自适应路由算法进行了性能评估和比较。与XY算法相比,本文算法显著降低了报文延时,提升了饱和点。同时讨论了对路由器面积和功耗影响。虽然路由器的功耗有所上升,但是由于报文交付性能提升,因此每个flit的能耗增长可忽略不计。相似文献

13.

MorphoNoC: Exploring the design space of a configurable hybrid NoC using nanophotonics

《Microprocessors and Microsystems》2017

As diminishing feature sizes drive down the energy for computations, the power budget for on-chip communication is steadily rising. Furthermore, the increasing number of cores is placing a huge performance burden on the network-on-chip (NoC) infrastructure. While NoCs are designed as regular architectures that allow scaling to hundreds of cores, the lack of a flexible topology gives rise to higher latencies, lower throughput, and increased energy costs. In this paper, we explore MorphoNoCs - scalable, configurable, hybrid NoCs obtained by extending regular electrical networks with configurable nanophotonic links. In order to design MorphoNoCs, we first carry out a detailed study of the design space for Multi-Write Multi-Read (MWMR) nanophotonics links. After identifying optimum design points, we then discuss the router architecture for deploying them in hybrid electronic-photonic NoCs. We then study the design space at the network level, by varying the waveguide lengths and the number of hybrid routers. This affords us to carry out energy-latency trade-offs. For our evaluations, we adopt traces from synthetic benchmarks as well as the NAS Parallel Benchmark suite. Our results indicate that MorphoNoCs can achieve latency improvements of up to 3.0× or energy improvements of up to 1.37× over the base electronic network. 相似文献

14.

Minimizing the system impact of router faults by means of reconfiguration and adaptive routing

《Microprocessors and Microsystems》2017

To tolerate faults in Networks-on-Chip (NoC), routers are often disconnected from the NoC, which affects the system integrity. This is because cores connected to the disabled routers cannot be accessed from the network, resulting in loss of function and performance. We propose E-Rescuer, a technique offering a reconfigurable router architecture and a fault-tolerant routing algorithm. By taking advantage of bypassing channels, the reconfigurable router architecture maintains the connection between the cores and the network regardless of the router status. The routing algorithm allows the core to access the network when the local router is disabled.Our analysis and experiments show that the proposed technique provides 100% packet delivery in 100%, 92.56%, and 83.25% of patterns when 1, 2 and 3 routers are faulty, respectively. Moreover, the throughput increases up to 80%, 46% and 33% in comparison with FTLR, HiPFaR, and CoreRescuer, respectively. 相似文献

15.

Adaptive inter-layer message routing in 3D networks-on-chip

Claudia RusuAuthor Vitae Lorena AnghelAuthor Vitae 《Microprocessors and Microsystems》2011,35(7):613-631

Existing routing algorithms for 3D deal with regular mesh/torus 3D topologies. Today 3D NoCs are quite irregular, especially those with heterogeneous layers. In this paper, we present a routing algorithm targeting 3D networks-on-chip (NoCs) with incomplete sets of vertical links between adjacent layers. The routing algorithm tolerates multiple link and node failures, in the case of absence of NoC partitioning. In addition, it deals with congestion. The routing algorithm for 3D NoCs preserves the deadlock-free propriety of the chosen 2D routing algorithms. It is also scalable and supports a local reconfiguration that complements the reconfiguration of the 2D routing algorithms in case of failures of nodes or links. The algorithm incurs a small overhead in terms of exchanged messages for reconfiguration and does not introduce significant additional complexity in the routers. Theoretical analysis of the 3D routing algorithm is provided and validated by simulations for different traffic loads and failure rates. 相似文献

16.

The Adaptive Bubble Router

《Journal of Parallel and Distributed Computing》2001,61(9):1180-1208

The design of a new adaptive virtual cut-through router for torus networks is presented in this paper. With much lower VLSI costs than adaptive wormhole routers, the adaptive Bubble router is even faster than deterministic wormhole routers based on virtual channels. This has been achieved by combining a low-cost deadlock avoidance mechanism for virtual cut-through networks, called Bubble flow control, with an adequate design of the router's arbiter. A thorough methodology has been employed to quantify the impact that this router design has at all levels, from its hardware cost to the system performance when running parallel applications. At the VLSI level, our proposal is the adaptive router with the shortest clock cycle and node delay when compared with other state-of-the-art alternatives. This translates into the lowest latency and highest throughput under standard synthetic loads. At system level, these gains reduce the execution time of the benchmarks considered. Compared with current adaptive wormhole routers, the execution time is reduced by up to 27%. Furthermore, this is the only router that improves system performance when compared with simpler static designs. 相似文献

17.

一种适用于2D Mesh片上网络的可重构容错路由算法

石泽文曾晓洋虞志益《小型微型计算机系统》2012,33(1):178-182

适用于2D Mesh片上网络的可重构容错路由算法,在芯片某些区域由于制造缺陷、使用老化等原因拓扑结构变得不再规整的时候,可以对网络节点重新进行配置,从而保证健康节点间的正常通信.基于SystemC的平台仿真表明该算法相对于传统算法可以获得更佳的网络性能.该算法是免于死锁的,同时对其可重构机制也给出了详细的论述.它还具有良好的扩展性,当系统规模增大的时候每个路由器的硬件开销保持恒定,而其容错能力也得到了增强. 相似文献

18.

转发控制分离的IPv6路由器控制体系结构研究

下载免费PDF全文

涂睿苏金树《计算机工程与科学》2006,28(4):10-12

路由器体系结构是决定路由器性能和功能的主导因素。一方面，路由器的功能需求越来越复杂；另一方面，为了满足转发性能需求，普遍采用硬件进行转发。为此，转发控制分离成为一种全新的路由器体系结构。本文在研究分析转发控制分离体系结构的基础上，提出了一种针对IPv6的转发控制分离路由器控制体系结构，并重点研究了转发与控制制分离后所带来的问题，最后设计和实现了一个原型系统。相似文献

19.

A multi-application approach for synthesizing custom network-on-chips

Kashi Somayeh Patooghy Ahmad Rahmati Dara Fazeli Mahdi 《The Journal of supercomputing》2022,78(13):15358-15380

Execution of multiple applications on Multi-Processor System-on-Chips (MPSoCs) significantly boosts performance and energy efficiency. Although various researchers have suggested Network-on-Chip (NoC) architectures for MPSoCs, the problem still needs more investigations for the case of multi-application MPSoCs. In this paper, we propose a fully automated synthesis flow in five steps for the design of custom NoC fabrics for multi-application MPSoCs. The steps include: preprocessing, core to router allocation, voltage island merging, floorplanning, and router to router connection. The proposed flow finds design solutions that satisfy the performance, bandwidth, and power constraints of all input applications. If the user decides, the proposed synthesis adds network-level reconfiguration to improve the efficiency of the obtained design solutions. With the reconfiguration option, the proposed flow comes up with adaptive NoC architectures that satisfy each application’s communication requirements while power-gate idle resources, e.g., router ports and links. If reconfiguration option is not set by the user, the proposed flow considers the top communication requirements among the applications in finding design solutions. We have used the proposed synthesis flow to design custom NoCs for several combined graphs of real-world applications and synthetic graphs. Results show that the reconfiguration option can save up to 98% in the energy-delay product (EDP) of the ultimate designs.

相似文献

20.

Elastic Flow in an Application Specific Network-on-Chip

Daniel Gebhardt Kenneth S. Stevens 《Electronic Notes in Theoretical Computer Science》2008,200(1):3

A Network-on-Chip (NoC) is increasingly needed to interconnect the large number and variety of Intellectual Property (IP) cells that make up a System-on-Chip (SoC). The network must be able to communicate between cells in di erent clock domains, and do so with minimal space, power, and latency overhead. In this paper, we describe an asynchronous NoC using an elastic-flow protocol, and methods of automatically generating a topology and router placement. We use the communication profile of the SoC design to drive the binary-tree topology creation and the physical placement of routers, and a force-directed approach to determine router locations. The nature of elastic-flow removes the need for large router bu ers, and thus we gain a significant power and space advantage compared to traditional NoCs. Additionally, our network is deadlock-free, and paths have bounded worst-case communication latencies. 相似文献