18 similar documents found (search time: 170 ms)
1.
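On-chip silicon area and power are severely constrained, so packet buffer capacity is severely constrained as well; using packet buffers efficiently is therefore one of the key problems in NoC design. Dynamically partitioning virtual-channel buffers is one effective way to use packet buffers efficiently, but it increases congestion and can even lead to unbounded congestion. An on-chip dynamic virtual channel (DAVC) router based on a two-step flow control method is proposed; the method splits each packet into a header and a body and applies a flow control algorithm to each part separately. Experimental results show that, compared with a static virtual channel (SAVC) on-chip router with equal buffer capacity, the DAVC router improves throughput by 23.2% and reduces transmission latency by 27.2%; with its buffer capacity halved, DAVC achieves comparable performance while saving 28.3% of area and 23.8% of leakage power.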
2.
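Research on Dynamic Virtual Channels with a Congestion Mitigation Strategy and Its VLSI Implementation (total citations: 1; self-citations: 0; citations by others: 1)
Virtual channel techniques improve network-on-chip performance but incur large area and power overheads. By analyzing the shortcomings of static virtual channels, a dynamic virtual channel (DVC) structure based on a congestion mitigation strategy is proposed. It organizes buffers as linked lists and can automatically adjust the channel structure to suit various traffic loads: under lighter traffic, it extends the depth of the channel queues, reducing packet latency; under heavier traffic, it increases the number of virtual channels, eliminating head-of-line blocking and blocking caused by insufficient channels, mitigating congestion, reducing the number of flow-control feedback events, and improving network throughput. The VLSI design of the DVC router was completed in a 90 nm CMOS process; compared with a conventional router, it not only improves packet latency and throughput but also effectively reduces area and power overheads.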
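The linked-list buffer organization mentioned in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `SharedVCBuffer`, its methods, and the free-list layout are hypothetical and are not taken from the paper, whose hardware design is not described in the abstract. The model shows how one shared pool of slots can serve either a few deep queues or many shallow ones.

```python
class SharedVCBuffer:
    """Hypothetical model: a shared flit pool chained into per-VC linked lists."""

    def __init__(self, num_slots):
        if num_slots < 1:
            raise ValueError("num_slots must be at least 1")
        self.data = [None] * num_slots
        # next_slot[i] is the slot that follows slot i in its list (-1 = end).
        # Initially every slot is on the free list: 0 -> 1 -> ... -> end.
        self.next_slot = list(range(1, num_slots)) + [-1]
        self.free_head = 0
        self.head = {}  # virtual channel id -> first occupied slot
        self.tail = {}  # virtual channel id -> last occupied slot

    def enqueue(self, vc, flit):
        """Append a flit to the queue of `vc`; return False if the pool is full."""
        if self.free_head == -1:
            return False
        slot = self.free_head
        self.free_head = self.next_slot[slot]
        self.data[slot] = flit
        self.next_slot[slot] = -1
        if vc in self.head:
            self.next_slot[self.tail[vc]] = slot  # extend an existing queue
        else:
            self.head[vc] = slot                  # open a new virtual channel
        self.tail[vc] = slot
        return True

    def dequeue(self, vc):
        """Remove and return the oldest flit of `vc`, or None if it is empty."""
        if vc not in self.head:
            return None
        slot = self.head[vc]
        flit = self.data[slot]
        following = self.next_slot[slot]
        if following == -1:
            del self.head[vc]
            del self.tail[vc]
        else:
            self.head[vc] = following
        # Return the slot to the free list.
        self.data[slot] = None
        self.next_slot[slot] = self.free_head
        self.free_head = slot
        return flit
```

With four slots, a single channel may occupy all four (a deep queue), or four channels may hold one flit each (many shallow queues), which is the adaptation the abstract attributes to the dynamic structure.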
3.
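Bubble flow control is an effective and practical deadlock-avoidance technique in torus networks. The critical bubble scheme uses virtual cut-through and needs only one packet buffer to avoid intra-ring deadlock in a torus network, but it may suffer from blocking. A pseudo-packet protocol is first proposed, and a moving-bubble flow control strategy is then designed on top of it, overcoming the blocking caused when the critical bubble cannot move. The pseudo-packet protocol is based on a simple request-acknowledge exchange, while moving-bubble flow control uses conventional credit transfer. With this mechanism, a router needs a minimum of only two virtual channels, each with at least one packet of buffer space, to achieve deadlock-free fully adaptive routing. A method for implementing moving-bubble flow control through suitable modifications to a classic router is given. A simulator was used to compare the performance of various bubble flow control schemes. The results show that moving-bubble flow control outperforms the conventional bubble mechanism, and that with adaptivity added it performs markedly better than other non-adaptive methods: latency is reduced and throughput improves by more than 20%, with a maximum gain of up to 100%.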
4.
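As CMOS technology enters the nanometer era, continued scaling of feature sizes increases the susceptibility of integrated circuits to transient and permanent faults. Providing fault-tolerance support in the network-on-chip is essential for improving the reliability of on-chip data transfer in chip multiprocessors. To handle links with transient and permanent faults in networks-on-chip, a fault-tolerant deflection router with configurable bidirectional links, BiFTDR, is proposed. Adjacent BiFTDR routers are connected by a pair of bidirectional links whose directions are configurable; the directions are configured dynamically according to the fault status of the links and the information on packets arriving at the router. When a unidirectional link fails, fault tolerance is achieved without detour routing, and no routing table is needed, which reduces the router's hardware implementation cost. Simulation results show that, under synthetic traffic patterns with 5 and 15 permanently faulty links in the network, the average packet latency of the BiFTDR router is 10% and 19% lower, respectively, than that of a reinforcement-learning-based fault-tolerant deflection router; under traffic from real application traces, the performance loss of the BiFTDR router relative to the average packet latency of a fault-free network is less than 1%. For transient faults, the performance degradation of the BiFTDR router remains small even at high fault rates. Synthesized in a 65 nm process, the BiFTDR router reaches a clock frequency of 500 MHz with small area and power overheads.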
5.
6.
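Building on dynamic buffer allocation based on flit grouping, a flit-group-based crossbar switch scheduling mechanism for networks-on-chip is proposed. Unlike the approach of statically and independently partitioned buffers, the mechanism first manages the input-port buffers in a unified way, groups flits according to their output direction, and dynamically allocates buffer space to each "group"; it then introduces a probabilistic arbitration algorithm based on "group" size and carries out scheduling through "group" allocation and switch allocation. To reduce overhead further, a strategy in which the "groups" share arbitration is also proposed on top of this mechanism. Both theoretical analysis and experimental results show that, compared with conventional and dynamic virtual channel mechanisms, the proposed mechanism saves more than 25% of hardware overhead and achieves better network latency and throughput; the shared-arbitration strategy reduces hardware overhead further, at the cost of some loss in network performance.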
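The abstract does not state the probability function used by its size-based arbitration. As an illustration only, the sketch below assumes the simplest choice, a win probability proportional to the number of flits a group holds. The function name `pick_group` and the dictionary layout are hypothetical.

```python
import random


def pick_group(group_sizes, rng=random):
    """Pick a group id with probability proportional to its flit count.

    Assumption: proportional weighting; the paper's actual function may differ.
    `group_sizes` maps a group id (e.g. an output direction) to its flit count.
    Returns None when every group is empty.
    """
    candidates = [(gid, n) for gid, n in group_sizes.items() if n > 0]
    if not candidates:
        return None
    ids = [gid for gid, _ in candidates]
    weights = [n for _, n in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]


# Usage: a group holding nine flits should win far more often than one holding one.
rng = random.Random(0)
wins = {"east": 0, "west": 0}
for _ in range(1000):
    wins[pick_group({"east": 9, "west": 1}, rng)] += 1
```

Weighting by occupancy tends to drain the fullest group first, which fits the abstract's goal of arbitrating by group size, while still giving small groups a nonzero chance of being served.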
7.
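《软件工程师》 (Software Engineer), 2017, (1)
A good routing algorithm should simultaneously provide: minimal hop count, to reduce transmission latency and preserve communication locality; maximal average-case and worst-case throughput; and a simple router structure. Randomized oblivious routing algorithms are widely used in low-power parallel-computer interconnection networks and in networks-on-chip. To address the drawback that existing oblivious routing algorithms for torus networks require many virtual channels, a randomized oblivious routing algorithm, WRD, is proposed; it achieves deadlock freedom using only two virtual channels. The performance of the proposed algorithm was verified by simulation. The results show that WRD improves network throughput under all traffic patterns compared with the O1TURN routing algorithm, which also uses two virtual channels. Compared with the RLB algorithm, which uses four virtual channels, WRD comes close in performance and even achieves better network throughput under several traffic patterns, while using only two virtual channels, which reduces network system cost and power consumption.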
8.
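On-chip interconnection networks provide efficient communication support for multicore architectures. Current networks-on-chip typically use unidirectional links, whose link resources are poorly utilized. To allocate, and thus use, link bandwidth efficiently, a new bidirectional-link scheduling algorithm is proposed, together with a bidirectional-link router that supports it. The new router structure provides a bypass data path for fast data transfer without affecting the router's original data path. Experimental results show that the bidirectional-link router improves the saturation throughput and average link utilization of a mesh network by up to 83.3% and 24.53%, respectively.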
9.
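In high-performance interconnection network design, reducing communication latency has always been one of the primary design goals. Virtual cut-through switching is an effective means of lowering latency, but with limited input buffers, achieving efficient and reliable transmission at the link layer is challenging. This paper proposes a reliable, low-latency link-layer design method that effectively supports virtual cut-through and reduces the delay packets incur at intermediate routers. The method combines packet format design, sender-side management, and receiver-side management. Adding an extra check code to the packet header effectively protects the header information and improves the link's fault tolerance; link-level retransmission reduces the time and protocol overhead caused by end-to-end retransmission; and an effective implementation of the receive logic, in particular receive-buffer management, avoids possible buffer overflow and flow-control failure.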
10.
11.
Sivaram R., Stunkel C.B., Panda D.K. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(3):275-289
Switch-based interconnects are used in a number of application domains, including parallel system interconnects, local area networks, and wide area networks. However, very few switches have been designed that are suitable for more than one of these application domains. Such a switch must offer both extremely low latency and very high throughput for a variety of different message sizes. While some architectures with output queuing have been shown to perform extremely well in terms of throughput, their performance can suffer when used in systems where a significant portion of the packets are extremely small. On the other hand, architectures with input queuing offer limited throughput or require fairly complex and centralized arbitration that increases latency. In this paper, we present a new input queue-based switch architecture called HIPIQS (HIgh-Performance Input-Queued Switch). It offers low latency for a range of message sizes and provides throughput comparable to that of output queuing approaches. Furthermore, it allows simple and distributed arbitration. HIPIQS uses a dynamically allocated multiqueue organization, pipelined access to multibank input buffers, and small cross-point buffers to deliver high performance. Our simulation results show that HIPIQS can deliver performance close to that of output queuing approaches over a range of message sizes, system sizes, and traffic. The switch architecture can therefore be used to build high performance switches that are useful for both parallel system interconnects and for building computer networks.
12.
Mingche Lai, Lei Gao, Sheng Ma, Xiao Nong, Zhiying Wang 《Microprocessors and Microsystems》2011,35(2):98-109
With the increasing number of cores, the communication latency of Network-on-Chip becomes a dominant problem due to complex operations per node. In this paper, we try to reduce communication latency by proposing a single-cycle router architecture with a wing channel, which forwards incoming packets to free ports immediately by inspecting the switch allocation results. Also, incoming packets granted the wing channel can fill the time slots of the crossbar switch and reduce contention with subsequent ones, thereby pushing throughput effectively. We design the proposed router using a 65 nm CMOS process, and the results show that it supports different routing schemes and outperforms express virtual channel, prediction, and Kumar’s single-cycle ones in terms of latency and throughput. When compared to the speculative router, it provides a 45.7% latency reduction and a 14.0% throughput improvement. Moreover, we show that the proposed design incurs a modest area overhead of 8.1%, but power consumption is reduced by 7.8% due to less arbitration activity.
13.
14.
Kenji Yoshigoe 《Computer Communications》2009,32(4):740-749
A combined input and crosspoint queued (CICQ) switch is receiving significant attention as the next-generation high-speed packet switch because of its scalability; however, a multi-cabinet implementation of a CICQ switch unavoidably introduces a large round-trip time (RTT) latency between the line cards and the switch fabric, resulting in a large crosspoint (CP) buffer requirement. In this paper, virtual crosspoint queues (VCQs) that significantly reduce the CP buffer requirement of the CICQ switch are investigated. The VCQ unit resides inside the switch fabric, is dynamically shared among the virtual output queues (VOQs) from the same source port, and is operated at the line rate, making the implementation practical. A threshold-based exhaustive round-robin (T-ERR) arbitration is employed to reduce buffer hogging at the VCQ. The T-ERR at the VCQ and CP arbiters serves packets residing in a longer queue more frequently than packets residing in a shorter queue. Consequently, T-ERR drastically increases the throughput of the CICQ switch with small CP buffers. A multi-cabinet implementation of a CICQ switch does not support multicast traffic well, since the combination of a small CP buffer in the switch fabric and a large RTT latency between the line cards and the switch fabric results in non-work conservation of the intra-switch link. Deployment of a multicast FIFO buffer between the input buffer and the CP buffer shows promise. With its ability to achieve high throughput independent of RTT and switch port size, the integration of the VCQ architecture and the T-ERR scheduler into the CICQ switch is ideal for supporting ever-increasing Internet traffic that requires higher data rates, larger switch sizes, and efficient multicasting.
15.
16.
17.
High-radix switches are desirable building blocks for large computer interconnection networks, because they are more suitable to convert chip I/O bandwidth into low latency and low cost than low-radix switches [J. Kim, W.J. Dally, B. Towles, A.K. Gupta, Microarchitecture of a high-radix router, in: Proc. ISCA 2005, Madison, WI, 2005]. Unfortunately, most existing switch architectures do not scale well to a large number of ports; for example, the complexity of the buffered crossbar architecture scales quadratically with the number of ports. Compounded with support for long round-trip times and many virtual channels, the overall buffer requirements limit the feasibility of such switches to modest port counts. Compromising on the buffer sizing leads to a drastic increase in latency and reduction in throughput, as long as traditional credit flow control is employed at the link level. We propose a novel link-level flow control protocol that enables high-performance scalable switches based on the increasingly popular buffered crossbar architecture to scale to higher port counts without sacrificing performance. By combining credited and speculative transmission, this scheme achieves reliable delivery, low latency, and high throughput, even with crosspoint buffers that are significantly smaller than the round-trip time. The proposed scheme substantially reduces message latency and improves throughput of partially buffered crossbar switches loaded with synthetic uniform and non-uniform bursty traffic. Moreover, simulations replaying traces of several typical MPI applications demonstrate communication speedup factors of 2 to 10 times.
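The throughput penalty that this abstract attributes to traditional credit flow control with undersized buffers can be made concrete with a short model. The sketch covers only the conventional credit baseline, not the proposed credited-plus-speculative scheme, whose details the abstract does not give. The names `CreditLink` and `max_link_utilization` are hypothetical, and the bound assumes one packet per credit and a fixed round-trip time measured in packet slots.

```python
class CreditLink:
    """Hypothetical sender-side credit counter for one downstream buffer."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # one credit per free downstream slot

    def can_send(self):
        return self.credits > 0

    def send(self):
        """Consume a credit when a packet is launched onto the link."""
        if self.credits == 0:
            raise RuntimeError("no credit available: downstream buffer may be full")
        self.credits -= 1

    def credit_return(self):
        """Called when the receiver frees a slot and the credit arrives back."""
        self.credits += 1


def max_link_utilization(buffer_slots, rtt_slots):
    """Upper bound on sustained utilization under pure credit flow control.

    Each credit can be reused at most once per round trip, so at most
    `buffer_slots` packets can be in flight per `rtt_slots` packet times.
    """
    return min(1.0, buffer_slots / rtt_slots)
```

For example, a crosspoint buffer of 4 packets on a link with a 16-slot round trip caps utilization at 25% under this model, which is why buffers smaller than the round-trip time hurt throughput when only credits are used.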
18.
Yixuan Zhang, Randy Morris Jr., Avinash K. Kodi 《Microprocessors and Microsystems》2011,35(2):110-118
The input buffers of the current packet-switched Network-on-Chip (NoC) architectures consume a significant portion of the total power of the interconnection network. Reducing the size of input buffers would result in degraded performance, while eliminating all buffers would result in increased power at high network load. In this article, we propose DXbar: an innovative dual-crossbar design. By combining the advantages of buffered and bufferless networks, we achieve at least 20% performance improvement in terms of throughput and latency, and at least 20% power saving over buffered networks with virtual channels. Furthermore, DXbar can outperform current bufferless networks with deflecting and dropping protocols while consuming at most half of the power.