Similar literature
1.
We present a single-cycle output-buffered router based on layered switching for networks on chips (NoCs). The router differs from state-of-the-art NoC routers in three respects: (1) it employs layered switching, which implements wormhole on top of virtual cut-through (VCT) switching; (2) it adopts an output-buffered rather than an input-buffered architecture; (3) it is single cycle, meaning that the router pipeline takes only one cycle for all flits. Experimental results show that the router achieves up to 80% of ideal network throughput under a uniform random traffic pattern. Compared with wormhole switching, layered switching achieves up to 36.9% latency reduction for 12-flit packets under uniform random traffic with an injection rate of 0.5 flit/cycle/node. Synthesis results in 65 nm technology show that its critical path has only 20 logic gates and that it reduces area by 11% compared with the input virtual-channel router with the same buffer capacity.

2.
The design of a new adaptive virtual cut-through router for torus networks is presented in this paper. With much lower VLSI costs than adaptive wormhole routers, the adaptive Bubble router is even faster than deterministic wormhole routers based on virtual channels. This has been achieved by combining a low-cost deadlock avoidance mechanism for virtual cut-through networks, called Bubble flow control, with a suitable design of the router's arbiter. A thorough methodology has been employed to quantify the impact that this router design has at all levels, from its hardware cost to the system performance when running parallel applications. At the VLSI level, our proposal is the adaptive router with the shortest clock cycle and node delay when compared with other state-of-the-art alternatives. This translates into the lowest latency and highest throughput under standard synthetic loads. At the system level, these gains reduce the execution time of the benchmarks considered. Compared with current adaptive wormhole routers, the execution time is reduced by up to 27%. Furthermore, this is the only router that improves system performance when compared with simpler static designs.
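The abstract names Bubble flow control without stating its rule. As a rough illustration, the sketch below encodes the condition as it is commonly described for virtual cut-through rings: a packet already travelling in a ring needs one free packet slot downstream, while a packet entering the ring needs two, so at least one free slot (a "bubble") always remains. The names `RingBuffer` and `may_advance` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class RingBuffer:
    """Input queue of one router on a unidirectional ring, sized in whole packets."""
    capacity: int      # packet slots in the queue (assumed unit: one full packet)
    occupied: int = 0  # packet slots currently in use

    @property
    def free(self) -> int:
        return self.capacity - self.occupied


def may_advance(next_buf: RingBuffer, entering_ring: bool) -> bool:
    """Return True if a packet may move into next_buf.

    Packets already in the ring need one free slot. Packets entering the ring
    (injection or a change of dimension) need two, so a bubble is left behind.
    """
    required = 2 if entering_ring else 1
    return next_buf.free >= required
```

For example, with a two-slot queue holding one packet, `may_advance(RingBuffer(2, 1), entering_ring=False)` is true and the same call with `entering_ring=True` is false.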

3.
Asynchronous quasi-delay-insensitive (QDI) NoCs have several advantages over their clocked counterparts. Virtual channel (VC) flow control is the most widely used method in asynchronous routers, but spatial division multiplexing (SDM) achieves better throughput for best-effort traffic than VC. A novel asynchronous SDM router architecture is presented. Area and latency models are provided to analyse the network performance of all router architectures, including wormhole, virtual channel and SDM. Performance comparisons have been made with different configurations of payload size, communication distance, buffer size, port bandwidth, network size and number of VCs/virtual circuits. Compared with VC, SDM achieves higher throughput with lower area overhead.

4.
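Dynamic virtual channels with a congestion mitigation strategy and their VLSI implementation   Total citations: 1, self-citations: 0, citations by others: 1
Virtual channel techniques improve network-on-chip performance but incur large area and power overheads. By analyzing the shortcomings of static virtual channels, a dynamic virtual channel (DVC) structure based on a congestion mitigation strategy is proposed. It organizes buffers as linked lists and automatically adjusts the channel structure to suit various traffic loads. Under low traffic, the structure extends the depth of the channel queues, reducing packet transfer latency. Under high traffic, it increases the number of virtual channels, which eliminates head-of-line blocking and blocking caused by a shortage of channels, alleviates congestion, reduces the number of flow-control feedback events, and raises network throughput. The VLSI design of the DVC router was completed in a 90 nm CMOS process; compared with a conventional router, it not only improves packet latency and throughput but also effectively reduces area and power overheads.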
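The linked-list buffer organization mentioned in this abstract can be illustrated with a minimal sketch. It assumes one shared pool of flit slots with a next-pointer per slot, so a single virtual channel can grow deep under light load while many channels share the pool under heavy load. The class `DynamicVCBuffer` and its methods are hypothetical names, not the paper's design, and the congestion mitigation logic itself is not modeled.

```python
class DynamicVCBuffer:
    """Shared flit buffer whose slots are chained into per-VC linked lists."""

    def __init__(self, num_slots: int, max_vcs: int):
        self.data = [None] * num_slots
        # Initially every slot is on the free list: 0 -> 1 -> ... -> -1.
        self.next = list(range(1, num_slots)) + [-1]
        self.free_head = 0 if num_slots else -1
        self.head = [-1] * max_vcs  # first slot of each VC queue, -1 if empty
        self.tail = [-1] * max_vcs  # last slot of each VC queue, -1 if empty

    def enqueue(self, vc: int, flit) -> bool:
        """Append a flit to a VC queue; return False if the pool is full."""
        slot = self.free_head
        if slot == -1:
            return False
        self.free_head = self.next[slot]
        self.data[slot] = flit
        self.next[slot] = -1
        if self.head[vc] == -1:
            self.head[vc] = slot
        else:
            self.next[self.tail[vc]] = slot
        self.tail[vc] = slot
        return True

    def dequeue(self, vc: int):
        """Remove and return the oldest flit of a VC, or None if it is empty."""
        slot = self.head[vc]
        if slot == -1:
            return None
        flit = self.data[slot]
        self.head[vc] = self.next[slot]
        if self.head[vc] == -1:
            self.tail[vc] = -1
        # Return the slot to the free list so any VC can reuse it.
        self.data[slot] = None
        self.next[slot] = self.free_head
        self.free_head = slot
        return flit
```

Because slots are not statically partitioned, the same four-slot pool can serve one VC of depth four or several shallower VCs, which is the flexibility the abstract attributes to the linked-list organization.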

5.
The evaluation of advanced routing features must be based on both costs and benefits. To date, adaptive routers have generally been evaluated on the basis of the achieved network throughput (channel utilization), ignoring the effects of implementation complexity. In this paper, we describe a parameterized cost model for router performance, characterized by two numbers: router delay and flow control time. Grounding the cost model in a 0.8 micron gate array technology, we use it to compare a number of proposed routing algorithms. From these design studies, several insights into the implementation complexity of adaptive routers are clear. First, header update and selection are expensive in adaptive routers, suggesting that absolute addressing should be reconsidered. Second, virtual channels are expensive in terms of latency and cycle time, so decisions to include them to support adaptivity or even virtual lanes should not be taken lightly. Third, requirements of larger crossbars and more complex arbitration cause some increase in the complexity of adaptive routers, but the rate of increase is small. Last, the complexity of adaptive routers significantly increases their setup delay and flow control cycle times, implying that claims of performance advantages in channel utilization and low-load latency must be carefully balanced against losses in achievable implementation speed.

6.
Several recent studies have shown that adaptive routing algorithms based on deadlock recovery have performance characteristics superior to those based on deadlock avoidance. Most of these studies, however, have relied on software simulation due to the lack of analytical modelling tools. In an effort towards filling this gap, this paper presents a new analytical model of compressionless routing in wormhole-routed hypercubes. This routing algorithm exploits the tight coupling between wormhole routers for flow control to detect and recover from potential deadlock situations. The advantages of compressionless routing include deadlock-free adaptive routing with no extra virtual channels, simple router design, and order-preserving message transmission. The proposed analytical model computes message latency by determining the message transmission time, blocking delay at each router, multiplexing delay at each network channel, and waiting time in the source before entering the network. The validity of the model is demonstrated by comparing analytical results with those obtained through simulation experiments.

7.
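Adaptive routing can effectively improve network-on-chip performance, but it causes out-of-order data delivery in the network. A two-stage pipelined virtual-channel wormhole-switched router is designed. By modifying the flag bits of packets and the routing computation unit, the router supports both deterministic and adaptive routing algorithms, which simplifies the out-of-order delivery problem. In addition, the data flows passing through the router are divided into two parts, east-west and north-south. On this basis, starting from classical partially adaptive routing, virtual channels are added to permit turns that were originally prohibited, which achieves deadlock-free adaptive routing and reduces network latency and hardware overhead.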

8.
Most multicomputer interconnection networks use wormhole switching, leading to fast and compact routers. Current routers incorporate virtual channels and even fully adaptive routing. Networks of workstations (NOWs) inherited multicomputer technology. Most commercial routers designed for NOWs implement wormhole switching. However, wormhole switching is not well suited for NOWs. The long wires required in this environment lead to large buffers to prevent buffer overflow during flow control signaling. Moreover, wire length is limited by buffer size. Virtual cut-through (VCT) achieves a higher throughput than wormhole switching. However, buffer requirements and packetizing overhead prevented its widespread use in multicomputers. Nevertheless, wormhole and VCT switching require similar buffer capacity in NOWs. Moreover, some messaging layers such as Illinois Fast Messages (FM) and BIP split messages into packets for increased performance. Therefore, the traditional disadvantages of VCT switching disappear in NOWs. In this paper, we show that VCT routers can be simpler than wormhole routers, while still achieving the advantages of using virtual channels and adaptive routing. We also propose a fully adaptive routing algorithm for VCT switching in a NOW environment. Moreover, we show that VCT routers outperform wormhole routers in a NOW environment at a lower cost. Also, VCT routers require buffer capacity independent of wire length, making them suitable for networks of workstations.

9.
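An on-chip router with a dynamically allocated virtual output queue structure   Total citations: 1, self-citations: 0, citations by others: 1
On-chip routers that use conventional virtual-channel flow control add virtual channels to alleviate the link throughput loss and network congestion caused by head-of-line blocking, but they suffer from low buffer utilization and high arbitration overhead. On-chip routers with dynamic virtual-channel flow control can raise buffer utilization and link throughput by managing buffer units dynamically, but they cannot avoid rapid growth in the complexity and overhead of the flow-control and arbitration logic. To improve link throughput and buffer utilization and obtain a better trade-off between performance and overhead, an on-chip router with a dynamically allocated virtual output queue structure, DAVOQ, is proposed. The structure organizes virtual output queues dynamically through fast linked lists and uses look-ahead routing to simplify the arbitration logic and optimize the pipeline. Simulation and synthesis results show that, compared with a conventional virtual-channel router, the DAVOQ router improves packet latency and throughput while saving 15.1% of standard-cell area and 12.9% of leakage power in a 0.13 μm CMOS process. Compared with a dynamic virtual-channel router, the DAVOQ router obtains a considerable latency improvement at a small throughput loss, while saving 15.6% of standard-cell area and 20.5% of leakage power.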

10.
Eun Jung, Ki Hwan, Chita R. Performance Evaluation, 2005, 60(1-4): 275-302
The growing use of clusters in diverse applications, many of which have real-time constraints, requires quality-of-service (QoS) support from the underlying cluster interconnect. All prior studies on QoS-aware cluster routers/networks have used simulation for performance evaluation. In this paper, we present an analytical model for a wormhole-switched router with QoS provisioning. In particular, the model captures message blocking due to wormhole switching in a pipelined router, and bandwidth sharing due to a rate-based scheduling mechanism, called VirtualClock. Then we extend the model to a hypercube-style cluster network. Average message latency for different traffic classes and deadline missing probability for real-time applications are computed using the model.

We evaluate a 16-port router and hypercubes of different dimensions with a mixed workload of real-time and best-effort (BE) traffic. Comparison with the simulation results shows that the single router and the network models are quite accurate in providing the performance estimates, and thus can be used as efficient design tools.
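The rate-based VirtualClock mechanism that the model captures can be sketched as follows. This is a simplified illustration that assumes one virtual clock per flow and service in order of increasing stamp; it is not the analytical model from the paper. The class name, the rate units, and the tie-break by arrival order are assumptions.

```python
import heapq


class VirtualClockScheduler:
    """Simplified VirtualClock: stamp each packet, serve the smallest stamp first."""

    def __init__(self, rates):
        # rates: flow id -> reserved rate in packets per time unit (assumed units)
        self.vtick = {flow: 1.0 / rate for flow, rate in rates.items()}
        self.clock = {flow: 0.0 for flow in rates}
        self.queue = []  # heap of (stamp, arrival sequence, flow, packet)
        self.seq = 0     # arrival counter, used only to break stamp ties

    def arrive(self, flow, packet, now: float) -> None:
        """Advance the flow's virtual clock by one tick and stamp the packet."""
        self.clock[flow] = max(now, self.clock[flow]) + self.vtick[flow]
        heapq.heappush(self.queue, (self.clock[flow], self.seq, flow, packet))
        self.seq += 1

    def next_packet(self):
        """Return (flow, packet) with the smallest stamp, or None if idle."""
        if not self.queue:
            return None
        _, _, flow, packet = heapq.heappop(self.queue)
        return flow, packet
```

A flow with a higher reserved rate has a smaller tick, so its packets receive earlier stamps and take a proportionally larger share of the link when the flows are backlogged.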


11.
The decrease in Integrated Circuit (IC) feature sizes increases susceptibility to transient and permanent errors. The growing rate of such errors in ICs intensifies the need for a wide range of solutions addressing reliability at various levels of abstraction. The Network on Chip (NoC) architecture has been introduced to address the increasing demand for communication bandwidth among processing cores. The structural redundancy inherent in NoC-based systems can be leveraged to improve reliability and compensate for the effects of failures. In this paper, we propose a fault-tolerant NoC router, NISHA, which stands for No-deadlock Interconnection of Subnets in Hierarchical Architectures. Armed with a new flow control mechanism, as well as an enhanced Virtual Channel (VC) regulator, the proposed router can mitigate the effects of both transient and permanent errors. NISHA supports dynamic/static virtual channel allocation with respect to local and global traffic; it thereby maintains a deadlock-free state in the presence of router or link failures in hierarchical topologies. Experimental results show enhanced operation of NoC applications as well as a decrease in average latency and energy consumption.

12.
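Design and implementation of an elastic buffer for a wormhole routing chip   Total citations: 1, self-citations: 0, citations by others: 1
Wormhole routing is widely used in the routing chips of cluster switching networks because of its short latency and its small demand on routing-chip buffer capacity. However, a blocked message occupies the entire transmission path, which lowers network throughput; in addition, the braking problem limits the data transfer rate. This paper proposes using an elastic buffer technique to solve the braking problem in wormhole routing chips, and the method has been implemented in the DawningUX8 routing chip. Results in use show that it both solves the braking problem well and improves network performance.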

13.
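Bubble flow control is an effective and practical deadlock avoidance technique in torus networks. The critical bubble scheme uses virtual cut-through switching and needs only one packet buffer to avoid intra-ring deadlock in torus networks, but blocking may occur. A pseudo-packet protocol is first proposed, and a moving bubble flow control strategy is then designed on top of it, overcoming the blocking that arises when the critical bubble cannot move. The pseudo-packet protocol is based on simple request-reply, while moving bubble flow control uses conventional credit-based transmission. With this mechanism, a router needs a minimum of only two virtual channels, each with a minimum of one packet of buffer space, to achieve deadlock-free fully adaptive routing. A way to implement moving bubble flow control through modest modifications to a classical router is given. The performance of various bubble flow control schemes is compared using a simulator. The results show that moving bubble flow control outperforms the conventional bubble mechanism, and that performance with adaptivity added is clearly higher than that of the other, non-adaptive methods: latency is reduced and throughput rises by more than 20%, with a maximum gain of as much as 100%.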

14.
With an increasing number of cores, the communication latency of Network-on-Chip becomes a dominant problem due to complex operations per node. In this paper, we try to reduce communication latency by proposing a single-cycle router architecture with a wing channel, which forwards incoming packets to free ports immediately by inspecting switch allocation results. Also, incoming packets granted the wing channel can fill the time slots of the crossbar switch and reduce contention with subsequent ones, thereby improving throughput effectively. We design the proposed router using a 65 nm CMOS process, and the results show that it supports different routing schemes and outperforms express virtual channel, prediction and Kumar’s single-cycle routers in terms of latency and throughput. When compared to the speculative router, it provides 45.7% latency reduction and 14.0% throughput improvement. Moreover, we show that the proposed design incurs a modest area overhead of 8.1%, but power consumption is reduced by 7.8% due to less arbitration activity.

15.
Adaptive virtual cut-through is considered as a viable alternative to wormhole switching for fast and hardware-efficient interprocessor communication in multicomputers. Computer simulations are used to show that our implementation of a minimal-path fully-adaptive virtual cut-through algorithm outperforms both deterministic and adaptive wormhole switching methods under both uniform random message distributions and clustered distributions such as the matrix transpose. A hardware-efficient implementation of adaptive virtual cut-through has been implemented using a semi-custom-designed router chip that requires only 2.3% more area than a comparable deterministic wormhole router chip. A network interface controller chip, which is crucial to our adaptive virtual cut-through method, has also been designed and is under fabrication.

16.
Networks-on-chips (NoCs) have been studied to connect a number of modules in a chip by introducing a network structure which is similar to that in parallel computers. Since embedded streaming applications usually generate predictable small-sized data traffic, the network structure can be customized to the target traffic. Accordingly, we develop a data transfer technique for simplifying routers for predictable small-sized communication in simple tile-based architectures. A data structure is split into single-flit packets, and a label is attached to each of them in order to route them independently. A label is transferred on dedicated wires beside the data lines in a channel, taking advantage of the relaxed pin count limitations of a channel. To reduce the wiring area for the label, the label is locally assigned according to a preanalysis of the required communication pairs of nodes. Analysis results show that a 3-bit local label is sufficient to route all data of the evaluated streaming applications in the case of a 16-node 2D torus. The required amount of hardware for a router is reduced by 37 percent compared with that for a wormhole packet router with the same number of routing table entries.

17.
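Zhou Duan, Peng Jing, Zhang Jianxian, Zhang Han. Journal of Computer Applications (《计算机应用》), 2011, 31(10): 2621-2624
To address the power consumption of network-on-chip routers, the key factors that affect router power, namely the number of virtual channels, the buffer depth, and the flit width, are studied at the system level. A power reduction method is proposed that combines several key power factors with virtual channels sharing the crossbar input ports, and a low-energy NoC router is designed and implemented. Experimental results show that the designed router consumes less power than the Alpha 21364 router and the IBM InfiniBand router.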

18.
The current fault-tolerant routing methods require extensive changes to practical routers such as the Cray T3D's dimension-order router to handle faults. In this paper, we propose methods to handle faults in multicomputers with dimension-order routers with simple changes to router structure and logic. Our techniques can be applied to current implementations in which the router is partitioned into multiple modules and no centralized crossbar is used. We consider arbitrarily located faulty blocks and assume only local knowledge of faults. We apply our techniques to torus networks and show that, with as few as four virtual channels per physical channel, deadlock- and livelock-free routing can be provided even with multiple faults and multimodule implementation of routers. Our simulations of the proposed technique for 2D tori and meshes indicate that the performance degradation is similar to that seen in the case of the crossbar-based designs previously proposed.

19.
This paper identifies performance degradation in wormhole-routed k-ary n-cube networks due to the limited number of router-to-processor consumption channels at each node. Much recent research in wormhole routing has advocated the advantages of adaptive routing and virtual channel flow control schemes to deliver better network performance. This paper indicates that the advantages associated with these schemes cannot be realized with limited consumption capacity. To alleviate such performance bottlenecks, a new network interface design using multiple consumption channels is proposed. To match virtual multiplexing on network channels, we also propose that each consumption channel support multiple virtual consumption channels. The impact of message arrival rate at a node on the required number of consumption channels is studied analytically. It is shown that wormhole networks with higher routing adaptivity, dimensionality, degree of hot-spot traffic, and number of virtual lanes have to take advantage of multiple consumption channels to deliver better performance. The interplay between system topology, routing algorithm, number of virtual lanes, messaging overheads, and communication traffic is studied through simulation to derive the effective number of consumption channels required in a system. Using the ongoing technological trend, it is shown that wormhole-routed systems can use two to four consumption channels per node to deliver better system performance.

20.
Real-time applications, when mapped to distributed memory multiprocessors, produce periodic messages with an associated deadline and priority. Real-time messages may have hard or soft deadlines. Real-time extensions to wormhole routing (WR) with multiple virtual channels (VCs) and priority-based physical link arbitration and VC allocation have been proposed in the literature. With a fixed number of VCs/link, a message can face an unbounded priority inversion, rendering the global priority ineffective. In this paper, we propose a new flow control mechanism called Preemptive Pipelined Circuit Switching for Real-Time messages (PPCS-RT) to reduce the priority inversion problem. For the proposed model, with some architectural support, we present an off-line approach to compute delivery guarantees of hard deadline real-time messages. We also perform a comparison of real-time WR and PPCS-RT in terms of performance with soft deadline traffic. The overall miss ratio percentage is over 30 percent higher for WR than PPCS-RT with one VC/link at high traffic loads. Finally, we compare the architectural complexity of a PPCS-RT router and other real-time routers.
