首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 843 毫秒
1.
The power overhead of Networks-on-Chip (NoCs) becomes tremendous in high density Multiprocessor Systems-on-Chip (MPSoCs). Especially in hard real-time and safety-critical systems, power management mechanisms must be developed and efficiently adhered to real-time requirements. However, state-of-the-art solution typically induces a high timing overhead, thus challenging safety, or has limited power saving capabilities. Additionally, current power-gating mechanisms do not provide an upper bound of the latency overhead, and thus no timing guarantees. We propose a safe and enhanced approach for power-gating that allows a global and dynamic power management under timing guarantees, i.e., all deadlines of critical tasks are met. It introduces a control-layer to save power on the NoC data layer using multiple Power-Aware Traffic-Monitor (PATM) units, which apply knowledge of the global state of the system to efficiently save power on NoC routers even at high NoCs utilizations. To safely apply the PATMs in hard real-time systems while meeting the deadlines, we provide a formal worst-case timing analysis to derive PATMs upper bound latency overhead. Experimental results show that our approach efficiently reduces static power consumption, and provides scalability inducing very small area overhead.  相似文献   

2.
Network-on-Chip (NoC) has been recognized as the new paradigm to interconnect and organize a high number of cores. NoCs address global communication issues in System-on-Chips (SoC) involving communication-centric design and implementation of scalable communication structures evolving application-specific NoC design as a key challenge to modern SoC design. In this paper we present a SystemC customization framework and methodology for automatic design and evaluation of regular and irregular NoC architectures. The presented framework also supports application-specific optimization techniques such as priority assignment, node clustering and buffer sizing. Experimental results show that generated regular NoC architectures achieve an average of 5.5 % lower communication-cost compared to other regular NoC designs while irregular NoCs proved to achieve on average 4.5×higher throughput and 40 % network delay reduction compared to regular mesh topologies. In addition, employing a buffer sizing algorithm we achieve a reduction in network’s power consumption by an average of 45 % for both regular and irregular NoC design flow.  相似文献   

3.
We present a novel Partial Virtual channel Sharing (PVS) NoC architecture which reduces the impact of faults on performance and also tolerates faults within the routing logic. Without PVS, failure of a component impairs the fault-free connected components, which leads to considerable performance degradation. Improving resource utilization is key in enhancing or sustaining performance with minimal overhead when faults or overload occurs. In the proposed architecture, autonomic virtual-channel buffer sharing is implemented with a novel algorithm that determines the sharing of buffers among a set of ports. The runtime allocation of the buffers depends on incoming load and fault occurrence. In addition, we propose an efficient technique for maintaining the accessibility of a processing element (PE) to the network even if its router is faulty. Our techniques can be used in any NoC topology and for both, 2D and 3D NoCs. The synthesis results for an integrated video conference application demonstrate 22 % reduction in average packet latency compared to state-of-the-art virtual channel (VC) based NoC architecture. Extensive quantitative simulation has been carried out with synthetic benchmarks. Simulation results reveal that the PVS architecture improves the performance significantly in presence of faults, compared to other VC-based NoC architectures.  相似文献   

4.
With the growing complexity in consumer embedded products, new tendencies forecast heterogeneous Multi-Processor Systems-On-Chip (MPSoCs) consisting of complex integrated components communicating with each other at very high-speed rates. Intercommunication requirements of MPSoCs made of hundreds of cores will not be feasible using a single shared bus or a hierarchy of buses due to their poor scalability with system size, their shared bandwidth between all the attached cores and the energy efficiency requirements of final products.

To overcome these problems of scalability and complexity, Networks-On-Chip (NoCs) have been proposed as a promising replacement to eliminate many of the overheads of buses and MPSoCs connected by means of general-purpose communication architectures. However, the development of application-specific NoCs for MPSoCs is a complex engineering process that involves the definition of suitable protocols and topologies of switches, and which demands adequate design flows to minimize design time and effort. In fact, the development of suitable high-level design and synthesis tools for NoC-based interconnects is a key element to benefit from NoC-based interconnect design in nanometer-scale CMOS technologies.

In this article we overview the benefits of state-of-the-art NoCs using a complete NoC synthesis flow, and a detailed scalability analysis of different NoC implementations for the latest nanometer-scale technology nodes. We present NoC-based solutions for the on-chip interconnects of MPSoCs that illustrate the benefits of competitive application-specific NoCs with respect to more regular NoC topologies regarding performance, area and power. Moreover, we show that it is currently feasible to synthesize in an automatic way a complete custom NoC interconnect from a high-level specification in few hours. Finally, we summarize future research challenges in the area of NoC interconnect design automation.  相似文献   


5.
周小锋  刘露  朱樟明  周端 《半导体学报》2016,37(11):115003-7
The design of a router in a network-on-chip (NoC) system has an important impact on some performance criteria. In this paper, we propose a low overhead load balancing router (LOLBR) for 2D mesh NoC to enhance routing performance criteria with low hardware overhead. The proposed LOLBR employs a balance toggle identifier to control the initial routing direction of X or Y for flit injection. The simplified demultiplexers and multiplexers are used to handle output ports allocation and contention, which provide a guarantee of deadlock avoidance. Simulation results show that the proposed LOLBR yields an improvement of routing performance over the reported routing schemes in average packet latency by 26.5%. The layout area and power consumption of the network compared with the reported routing schemes are 15.3% and 11.6% less respectively.  相似文献   

6.
Network-on-chip (NoC) is one of critical communication architectures for the scaling of future many-core processors. The challenge for on-chip network is reducing design complexity to save both area and power while providing high performance such as low latency and high throughput. Especially, with increase of network size, both design complexity and power consumption have become the bottlenecks preventing proper network scaling. Moreover, as technology continuously scales down, leakage power takes up a larger fraction of total NoC power. It is increasingly important for a power-efficient NoC design to reduce the increasing leakage power. Power-gating, as a representative low-power technique, can be applied to an on-chip network for mitigating leakage power. In this paper, we propose a low-cost and low-power router architecture for the unidirectional torus network, and adopt an improved corner buffer structure for the inoffensive power-gating, which has minimal impact on network performance. Besides, an explicit starvation avoidance mechanism is introduced to guarantee injection fairness while decreasing its negative impact on network throughput. Simulation results with synthetic traffic show that our design can improve network throughput by 11.3% on average and achieve significant power-saving in low- and medium-load regions. In the SPLASH-2 workload simulation, our design can save on average 27.2% of total power compared to the baseline, and decrease 42.8% average latency compared to the baseline with power-gating.  相似文献   

7.
This paper presents a novel high performance Network-on-Chip (NoC) router architecture design using a bi-directional link with double data rate (BiLink). Ideally, it can provide as high as 2 times speed-up compared with the conventional NoC router. BiLink utilizes an extra link stage between routers and transmits two flits in one link per cycle using phase pipelining if both routers require to use the current link. To further increase the effective bandwidth, the direction of each link can be configured in every clock cycle to cater for different traffic loads from each side. Therefore, the data rate can be as high as 4 times compared with conventional NoC routers under uneven traffic. Centralized mode control scheme is implemented using a finite state machine (FSM) approach. Cycle-accurate simulations are carried out on both synthetic traffic patterns as well as real application benchmarks. Simulation results show that BiLink can provide as high as 90% and 250% speedup compared with conventional NoC routers for even and uneven traffic, respectively. 2X and 3X gains in throughput are obtained under even and uneven traffic, respectively, when compared with the conventional NoC router for the virtual channel flow control. The BiLink router architecture is synthesized using TSMC 65 nm process technology and it is shown that an area overhead of 28% over state-of-the-art bi-directional NoC is introduced while the critical path is about 9% higher than that of the conventional routers. Despite the overhead in critical path and power consumption, a 47.45% improvement of Energy-Delay-Product (EDP) is achieved by BiLink under high injection rate traffic.  相似文献   

8.
Multicast on-chip communication is encountered in various cache-coherence protocols targeting multi-core processors, and its pervasiveness is increasing due to the proliferation of machine learning accelerators. In-network handling of multicast traffic imposes additional switching-level restrictions to guarantee deadlock freedom, while it stresses the allocation efficiency of Network-on-Chip (NoC) routers. In this work, we propose a novel partitioned NoC router microarchitecture, called SmartFork, which employs a versatile and cost-efficient multicast packet replication scheme that allows the design of high-throughput and low-cost NoCs. The design is adapted to the average branch splitting observed in real-world multicast routing algorithms. Compared to state-of-the-art NoC multicast approaches, SmartFork is demonstrated to yield high performance in terms of latency and throughput, while still offering a cost-effective implementation.  相似文献   

9.
对芯片面积、能耗上的严格限制是片上网络与宏观网络的最大不同。片上网络路由器中的缓存占用了大量的芯片面积和功耗,因此无缓存片上网络得到广泛关注。它完全去掉路由器内的缓存,通过偏转未获得有效端口的微片,处理微片对输出端口的竞争。本文对无缓存偏转网络的原理及关键技术进行了研究,包括拓扑结构、仲裁策略等。最后,通过与有缓存网络的对比,对无缓存网络的优势与劣势进行了总结。  相似文献   

10.
Network on chip (NoC) has been proposed as an appropriate solution for today’s on-chip communication challenges. Power dissipation has become a key factor in the NoCs because of their shrinking sizes. In this paper, we propose a new encoding approach aimed at power reduction by decreasing the number of switching activities on the buses. This approach assigns the symbols to data word in such a way that the more frequent words are sent by less power consumption. This algorithm dedicates the symbols with less ones to high probability data and uses transition signaling to transmit data. The proposed method, unlike the existing low power encoding, does not rely on spatial redundancy and keeps the width of the bus constant. Experimental evaluations show that our approach reduces the power dissipation up to 46 % with 2.70, 0.51, and 15.43 % power, critical path and area overhead in the NoCs, respectively.  相似文献   

11.
随着半导体集成工艺的发展,单个芯片上集成的IP核数目急剧增加,片上网络(NoC)成为未来取代总线设计的新模式。调度算法作为NoC研究的关键问题之一,对整个网络的传输性能起着重要的作用。本文对片上网络虚信道路由器调度算法的相关研究进展进行了总结,首先从路由器结构特点出发,介绍了几种典型的NoC仲裁器和调度器实例,总结相应的算法设计思想。再对NoC常用路由器调度算法进行了分类介绍,详细分析了各种调度算法的相关特性。最后,探讨了NoC路由器调度算法的研究方向。  相似文献   

12.
Network-on-Chip (NoC) -based communication architecture is promising in addressing the communication bottlenecks in current and future multicore processors. In this work, we consider the application-specific mapping problem and electromigration (EM) -induced through-silicon via (TSV) reliability issue in tile-based three-dimension (3D) NoC architectures. In 3D NoCs, network contention may result in unacceptable communication delay among the processing cores and thus has significant effect on the system performance. So we propose a new latency model for the routers which characterizes the network contentions among different traffic flows from sharing of network resources. Then we solve the core mapping problem by a fast while efficient stochastic algorithm called Simulated Allocation (SAL), which integrates our new latency model and also aims to optimize the communication power, latency in the mapping procedure. After that, we use an incremental method to optimize the reliability of the TSVs. Experimental results show that, contention-aware model (CAM) has 25% larger network latency than our latency model; compared with particle swarm optimization (PSO), our SAL algorithm can achieve 7% less power with about 7.5 × run-time speedup; as for reliability, our method can achieve better results (up to 10 × increase in terms of the void nucleation time (VNT)) with 7.64% increased latency.  相似文献   

13.
为了降低片上网络(NoC)由于虫孔缓冲结构排头(HoL)阻塞导致的性能损失,同时消除虚通道缓冲结构对可变长度报文表现出的缓冲区低利用率现象,本文采用虚拟通道技术提出一种动态分配输入队列(DAIQ)的片上虫孔路由器结构.该结构采用一种令牌表的方式支持虚拟队列深度与数量的动态分配,同时为了支持同一报文微片能够连续调度,本文还提出一种新颖的开关分配机制——SRRM,该机制在高负载下进一步改善了开关的延迟与吞吐率.仿真与综合的结果表明,相比传统虚通道流控的片上路由器结构,DAIQ路由器以50%的缓冲面积获得类似的性能,在0.13微米CMOS工艺下节约了30.18%的标准单元面积与384%的功耗.  相似文献   

14.
基于提前分配路径的低时延片上路由器结构   总被引:1,自引:0,他引:1  
该文针对片上网络提出一种基于提前分配路径的低时延片上路由器结构(PAPR)。新路由器采用提前路由计算和提前分配路径来缩短路由器流水线深度。提前路由计算为虚信道提前分配提供了可靠保障,即使在虚信道路径提前分配失败的情况下,也不影响分组在网络中的传输时延。该文提出基于缓存状态的仲裁算法BSTS(Buffer Status)综合考虑当前节点缓存信息和下游节点缓存信息,不但降低了分组等待时延,而且降低了缓存空闲的概率。仿真结果表明,新路由器能明显改善网络的时延和吞吐性能,相比采用滑动迭代轮询仲裁iSLIP(iterative Round-Robin Matching with SLIP(Serial Line Interface Protocal))算法的经典虚信道路由器,网络平均端到端时延降低了24.5%,吞吐率提高了27.5%;与采用轮询迭代 RRM(Round-Robin Matching) 算法的经典虚信道路由器相比,平均端到端时延降低了39.2%,吞吐率提高了47.2%。路由器硬件开销和平均功耗分别增加仅为8.9%,5.9%。  相似文献   

15.
周小锋  朱樟明  周端 《半导体学报》2016,37(7):075002-8
The bufferless router emerges as an interesting option for cost-efficient in network-on-chip (NoC) design. However, the bufferless router only works well under low network load because deflection more easily occurs as the injection rate increases. In this paper, we propose a load balancing bufferless deflection router (LBBDR) for NoC that relieves the effect of deflection in bufferless NoC. The proposed LBBDR employs a balance toggle identifier in the source router to control the initial routing direction of X or Y for a flit in the network. Based on this mechanism, the flit is routed according to XY or YX routing in the network afterward. When two or more flits contend the same one desired output port a priority policy called nearer-first is used to address output ports allocation contention. Simulation results show that the proposed LBBDR yields an improvement of routing performance over the reported bufferless routing in the flit deflection rate, average packet latency and throughput by up to 13%, 10% and 6% respectively. The layout area and power consumption compared with the reported schemes are 12% and 7% less respectively.  相似文献   

16.
In the deep sub-micron regime, the performance of network-on-chip (NoC) architectures is bound by the limited power and area budget. Proposed is a low-power low-area NoC architecture using a novel power-efficient control circuit that enables repeaters along the inter-router links to function as adaptive link buffers, thereby reducing the number of buffers required in the router. Simulation results in the 90 nm technology show power savings of nearly 45% and area savings of 50% for the proposed technique.  相似文献   

17.
An asynchronous router for quality-of-service Networks on Chip (QNoC) is presented. It combines multiple service levels (SL) with multiple equal-priority virtual channels (VC) within each SL. VCs are assigned dynamically per packet in each router. The router employs fast arbitration schemes to minimize latency. Analytical expressions for a generic NoC router performance, area and power are derived, showing linear dependence on the number of buffers and flit width. The analytical results agree with QNoC router simulation results. The QNoC router architecture and specific asynchronous circuits are presented. When simulated on a 0.18 μm process, the router throughput ranges from 1.8 to 20 Gbps for flits 8-128 bits wide.  相似文献   

18.
The emergence of three-dimensional (3D) network-on-chip (NoC) has revolutionized the design of high-performance and energy efficient manycore chips. However, in general, 3D NoC architectures still suffer from high power density and the resultant thermal hotspots leading to functionality and reliability concerns over time. The power consumption and thermal profiles of 3D NoCs can be improved by incorporating a Voltage Frequency Island (VFI)-based power management strategy and Reciprocal Design Symmetry (RDS)-based floor planning. In this paper, we undertake a detailed design space exploration for 3D NoC by considering power-thermal-performance (PTP) trade-offs. We specifically consider a small-world network-enabled 3D NoC (3D SWNoC) in this performance evaluation due to its superior performance and energy-efficiency compared to any other existing 3D NoC architectures. We demonstrate that the VFI-enabled 3D SWNoC lowers the energy-delay-product (EDP) by 57.3% on an average compared to a 2D MESH without VFI. Moreover, by incorporating VFI, we reduce the maximum temperature of 3D SWNoC by 15.2% on an average compared to the non-VFI counterpart. By complementing the VFI-based power management with RDS-based floor planning, the 3D SWNoC reduces the maximum temperature by 25.1% on an average compared to the non-VFI counterpart.  相似文献   

19.

The aggressively scaled CMOS technology is increasingly threatening the dependability of network-on-chips (NoCs) architecture. In a mesh-based NoC, a faulty router or broken link may isolate a well functional processing element (PE). Also, a set of faulty routers may form isolated regions, which can degrade the design. In this paper, we propose a router-level redundancy (RLR) fault-tolerant scheme that differs from the traditional microarchitecture-level redundancy (MLR) approach to relieve the problem of isolated PE and isolated region. By simply adding one spare router within each router set in a mesh, RLR can be created and connection paths between adjacent routers can be diversified. To exploit this extra resource, two reconfiguration algorithms are demonstrated to detour observed faulty routers/links. The proposed RLR fault-tolerant scheme can tolerate at most one faulty router within a router set. After the reconfiguration, the original mesh topology is maintained. As a result, the proposed architecture does not need any support from the network layer routing algorithms. The scheme has been evaluated based on the three fault-tolerant metrics: reliability, mean time to failure (MTTF), and yield. The experimental results show that the performance RLR increases as the size of NoC grows; however, the relative connection cost decreases at the same time. This characteristic makes our architecture suitable for large-scale NoC designs.

  相似文献   

20.
The network-on-chip (NoC) design problem requires the generation of a power and resource efficient interconnection architecture that can support the communication requirements for the SoC with the desired performance. This paper presents a genetic algorithm-based automated design technique that synthesizes an application specific NoC topology and routes the communication traces on the interconnection network. The technique operates on the system-level floorplan of the system on chip (SoC) and accounts for the power consumption in the physical links and the routers. The design technique solves a multi-objective problem of minimizing the power consumption and the router resources. It generates a Pareto curve of the solution set, such that each point in the curve represents a tradeoff between power consumption and associated number of NoC routers. The performance and quality of solutions produced by the technique are evaluated by experimentation with benchmark applications and comparisons with existing approaches.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号