首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 345 毫秒
1.
片上硅面积和功耗受到严重限制,报文缓冲区容量也受到严重限制,如何高效使用报文缓冲区是NoC设计的关键问题之一.动态划分虚通道缓冲区是高效使用报文缓冲区的有效方法之一,但会增加拥塞程度,甚至出现无限拥塞的情况.提出一种基于二步流控方法的片上动态虚通道(DAVC)路由器,该二步流控方法将报文分成报文头和报文体两部分分别运用流控算法.实验结果表明:与静态虚通道(SAVC)片上路由器相比,在缓存容量相等的情况下,DAVC路由器能提高23.2%的吞吐率,传输延迟降低27.2%;在DAVC缓存容量减半的情况下可获得相近的性能,节省28.3%的面积与23.8%的漏电流功耗.  相似文献   

2.
具有拥塞缓解策略的动态虚拟通道研究及其VLSI实现   总被引:1,自引:0,他引:1  
虚拟通道技术改善了片上网络性能,却带来了巨大的面积与功耗开销.通过分析静态虚拟通道的不足,提出了基于拥塞缓解策略的动态虚拟通道结构.它采用链表方式组织缓冲,可以自动调整通道结构来适应各种流量负载:在较低流量下,该结构扩展通道队列深度,减小了报文传输延迟;在较高流量下,它增加虚拟通道数量,消除队列头阻塞与通道不足阻塞,并缓解拥塞现象发生,减少流反馈次数,提高了网络吞吐率.在90nm CMOS工艺下完成了DVC路由器的VLSI设计,与传统路由器相比,不仅改善了报文传输延迟与吞吐率,而且有效降低了面积与功耗开销.  相似文献   

3.
在torus网络中气泡流控是一种有效、实用的死锁避免技术.关键气泡机制使用虚跨步技术,只需要使用一个报文缓冲区就可以避免torus网络中的环内死锁,但是可能存在阻塞.首先提出了伪报文协议,然后结合伪报文协议设计了移动气泡流控策略,克服了关键气泡不能移动时引起的阻塞.伪报文协议基于简单的请求-应答,移动气泡流控则使用传统的信用传输方法.采用该机制,路由器只需要最少两条虚通道,每条虚通道最少一个报文空间就可以实现无死锁完全自适应路由.通过对经典路由器进行适当修改,给出了实现移动气泡流控的方法.采用模拟器比较了各种气泡流控的性能,结果表明,移动气泡流控性能超出传统的气泡机制,而加入自适应机制后的性能明显高于其他非自适应方法,不仅降低了延迟,吞吐率也提高20%以上,最大幅度甚至达100%.  相似文献   

4.
周端  彭景  张剑贤  张晗 《计算机应用》2011,31(10):2621-2624
针对片上网络路由器功耗问题,在系统级层次上对影响路由器功耗的虚通道数目、缓存深度和数据微片位数等关键因素进行了研究。提出了综合多种功耗关键因素以及虚拟通道共享交叉开关输入端口的功耗降低方法,设计实现了一种低能耗的NoC路由器。实验结果表明,与Alpha 21364路由器和IBM InfiniBand路由器相比,所设计的路由器具有较低的功耗。  相似文献   

5.
随着CMOS工艺进入纳米时代,工艺尺寸的不断缩小增加了集成电路对瞬态故障与永久故障的敏感性.在片上网络中提供容错支持对于提高单芯片多处理器片上数据传输的可靠性至关重要.为了处理片上网络中的瞬态故障与永久故障链路,提出一种可配置双向链路的容错偏转路由器BiFTDR.相邻BiFTDR路由器之间采用一对可配置方向的双向链路互连,根据链路的故障状态和路由器的到达包信息对双向链路的方向进行动态配置,在单向链路故障的情况下不需要绕道路由即可实现容错,并且不需要路由表从而降低了路由器的硬件实现开销.模拟结果表明,在合成通信模式下,网络中包含5条和15条永久故障链路的情况下,BiFTDR路由器的包平均延迟比一种基于强化学习的容错偏转路由器分别少10%和19%;在真实应用运行踪迹通信模式下,与无故障网络的包平均延迟相比,BiFTDR路由器的性能损失不到1%.对于瞬态故障,即使在高故障率下BiFTDR路由器的性能下降程度也较小.在65nm工艺下对BiFTDR路由器进行综合,能达到500MHz的时钟频率,并且具有较小的面积和功耗开销.  相似文献   

6.
在基于微片(flit)分组的动态缓存分配基础上,提出一种基于微片分组的片上网络交叉开关调度机制.该机制与静态独立分割缓存的思想不同,首先对输入端缓存进行统一管理,对微片根据其流向进行分组,并为所分各“组”动态分配缓存,然后引入一种基于“组”规模的概率仲裁算法,通过“组”分配和开关分配实现调度过程.为进一步降低开销,还在该机制基础上提出一种各“组”共享仲裁的策略.理论分析与实验结果均表明:所提出的机制相对于传统和动态虚通道机制,可节约25%以上的硬件开销并可获得更优的网络延迟与吞吐性能;共享仲裁策略可在所提机制基础上进一步降低硬件开销,但其代价是网络性能有所下降.  相似文献   

7.
一个好的路由算法应同时满足:最小的路由跳数以减小传输延时,保持通讯的局域性;最大的平均情况和最坏情况吞吐率;简单的路由器结构。随机Oblivious路由算法在低功耗并行计算机互联网络以及片上网络中得到广泛应用。针对Torus网络下已提出的Oblivious路由算法所需虚通道数目多的缺点,提出了随机Oblivious路由算法WRD,该算法仅使用两条虚拟通道即可实现算法的无死锁性。通过仿真对所提算法的性能进行了验证,结果表明,该算法与使用两条虚拟通道的O1TURN路由算法相比,WRD路由算法在所有通讯模式下的网络吞吐率均有所提升。与使用四条虚拟通道的RLB算法相比,新提出的WRD路由算法性能接近于RLB算法,甚至在多个通讯模式下的网络吞吐率要好于RLB算法,而且WRD路由算法仅使用两条虚拟通道,降低了网络系统成本和功耗。  相似文献   

8.
片上互连网络为多核体系结构提供了高效的通信支持。目前的片上网络通常采用单向传输链路,链路资源利用率较低。为了实现链路带宽资源高效分配、进而高效利用链路带宽资源,提出了一种新的双向链路调度算法,并设计了一种支持此算法的双向链路路由器。这种新型的路由器结构能够在不影响路由原有数据通道条件下,提供一条旁路数据通道来快速传输数据。实验结果表明,应用该双向链路路由器可使Mesh网络饱和吞吐率和链路平均利用率分别得到最大83.3%和24.53%的提升。  相似文献   

9.
在高性能互连网络设计中,缩短通信延迟一直是设计的首要目标之一。虚跨步交换技术是一种降低延迟的有效手段,但是在有限的输入缓冲区条件下,在链路层上实现高效可靠传输具有一定的挑战性。本文提出了一种可靠的低延迟链路层设计方法,可实现对虚跨步的有效支持,减少了报文在中间路由器上的延迟。该方法结合了报文格式设计、发送方管理和接收方管理。通过在报文头中加入额外的校验码,有效地保护了报文头中的信息,提高了链路的容错能力;通过链路级重传,减少了端到端重传引起的时间、协议开销;通过对接收处理逻辑,尤其是接收缓冲区管理的有效实现,避免了可能出现的缓冲区溢出以及流控失效问题。  相似文献   

10.
何军  朱英 《计算机工程》2012,38(16):253-254
针对国产多核处理器的64位整数乘法器面积和功耗开销大的问题,提出一种新的Booth编码方式,对其Booth编码方式进行优化,通过多种方法验证设计优化的正确性,采用标准单元库进行逻辑综合评估。结果表明,工作频率可达1.0 GHz以上,面积减少9.64%,动态功耗和漏电功耗分别减少6.34%和11.98%,能有效减少乘法器的面积和功耗,达到预期目标。  相似文献   

11.
With increasing number of cores, the communication latency of Network-on-Chip becomes a dominant problem due to complex operations per node. In this paper, we try to reduce communication latency by proposing single-cycle router architecture with wing channel, which forwards the incoming packets to free ports immediately with the inspection of switch allocation results. Also, the incoming packets granted with wing channel can fill in the time-slots of crossbar switch and reduce the contentions with subsequent ones, thereby pushing throughput effectively. We design the proposed router using 65 nm CMOS process, and the results show that it supports different routing schemes and outperforms express virtual channel, prediction and Kumar’s single-cycle ones in terms of latency and throughput. When compared to the speculative router, it provides 45.7% latency reduction and 14.0% throughput improvement. Moreover, we show that the proposed design incurs a modest area overhead of 8.1% but the power consumption is saved by 7.8% due to less arbitration activities.  相似文献   

12.
Switch-based interconnects are used in a number of application domains, including parallel system interconnects, local area networks, and wide area networks. However, very few switches have been designed that are suitable for more than one of these application domains. Such a switch must offer both extremely low latency and very high throughput for a variety of different message sizes. While some architectures with output queuing have been shown to perform extremely well in terms of throughput, their performance can suffer when used in systems where a significant portion of the packets are extremely small. On the other hand, architectures with input queuing offer limited throughput or require fairly complex and centralized arbitration that increases latency. In this paper, we present a new input queue-based switch architecture called HIPIQS (HIgh-Performance Input-Queued Switch). It offers low latency for a range of message sizes and provides throughput comparable to that of output queuing approaches. Furthermore, it allows simple and distributed arbitration. HIPIQS uses a dynamically allocated multiqueue organization, pipelined access to multibank input buffers, and small cross-point buffers to deliver high performance. Our simulation results show that HIPIQS can deliver performance close to that of output queuing approaches over a range of message sizes, system sizes, and traffic. The switch architecture can therefore be used to build high performance switches that are useful for both parallel system interconnects and for building computer networks  相似文献   

13.
传统用于总线系统或互联网的仲裁方法已不能很好地适应NoC应用环境。围绕NoC系统性能的关键影响因素——拥塞状态,提出了一种基于全局和本地拥塞预测的仲裁策略(GLCA),以改善NoC网络延迟。实验结果表明,相对于RR方法,新仲裁算法使得网络平均包延迟和平均吞吐量最大分别可改善20.5%和8%,并且在不同负载条件下都保持了其优势。综合结果显示, GLCA与RR方法相比,路由器仅在组合逻辑上有少许增加(25.7%)。  相似文献   

14.
A combined input and crosspoint queued (CICQ) switch is receiving significant attention to be the next generation high speed packet switch for its scalability; however, a multi-cabinet implementation of a combined input and crosspoint queued (CICQ) switch unavoidably introduces a large round-trip time (RTT) latency between the line cards and switch fabric, resulting a large crosspoint (CP) buffer requirement. In this paper, virtual crosspoint queues (VCQs) that significantly reduces the CP buffer requirement of the CICQ switch is investigated. The VCQs unit resides inside the switch fabric, is dynamically shared among virtual output queues (VOQ) from the same source port, and is operated at the line rate, making the implementation practical. A threshold-based exhaustive round-robin (T-ERR) arbitration is employed to reduce buffer hogging at VCQ. The T-ERR at VCQ and CP arbiters serves packets residing in a longer queue more frequently than packet residing in a shorter queue. Consequently, the T-ERR, drastically increases the throughput of the CICQ switch with small CP buffers. A multi-cabinet implementation of CICQ switch do not support multicasting traffic well since a combination of small CP buffer in the switch fabric and a large RTT latency between the line cards and switch fabric results in non-work conservation of the intra-switch link. Deployment of multicast FIFO buffer between the input buffer and CP buffer shows a promise. With its ability to achieve high throughput independent of RTT and switch port size, the integration of the VCQ architecture and T-ERR scheduler to the CICQ switch is ideal for supporting ever-increasing Internet traffic that requires higher data rate, larger switch size, and efficient multicasting.  相似文献   

15.
片上网络中低延时可扩展的路由器结构设计   总被引:1,自引:0,他引:1  
为了满足片上网络中路由器能同时支持多个IP核的要求,并同时具有较好的延时性能,设计了一种分布式路由和仲裁的路由器结构。其中的仲裁模块根据当前路由器各输入端口的请求状态和下一路由器相应输入端口缓冲器的状态进行仲裁,此仲裁方法提高了数据包传输的成功率,从而降低了传输延时,使路由器具有良好的延时性能,同时仿真结果表明:该路由器在面积开销方面具有良好的可扩展性。  相似文献   

16.
为了有效利用缓存资源,提出一种动态分配片上网络路由器端口缓存的方法,根据传输方向将输入端口接收到的数据分成不同的组,每个组对应一个输出端口,并将数据以组的形式进行存储,控制部件根据各个组数据规模为其动态分配缓存资源。与基于虚通道的动态缓存分配方式相比,该方法降低了控制和仲裁的复杂度。仿真结果表明,获得同等性能的条件下,该方法可以有效降低缓存的需求。  相似文献   

17.
High-radix switches are desirable building blocks for large computer interconnection networks, because they are more suitable to convert chip I/O bandwidth into low latency and low cost than low-radix switches [J. Kim, W.J. Dally, B. Towles, A.K. Gupta, Microarchitecture of a high-radix router, in: Proc. ISCA 2005, Madison, WI, 2005]. Unfortunately, most existing switch architectures do not scale well to a large number of ports, for example, the complexity of the buffered crossbar architecture scales quadratically with the number of ports. Compounded with support for long round-trip times and many virtual channels, the overall buffer requirements limit the feasibility of such switches to modest port counts. Compromising on the buffer sizing leads to a drastic increase in latency and reduction in throughput, as long as traditional credit flow control is employed at the link level. We propose a novel link-level flow control protocol that enables high-performance scalable switches that are based on the increasingly popular buffered crossbar architecture, to scale to higher port counts without sacrificing performance. By combining credited and speculative transmission, this scheme achieves reliable delivery, low latency, and high throughput, even with crosspoint buffers that are significantly smaller than the round-trip time. The proposed scheme substantially reduces message latency and improves throughput of partially buffered crossbar switches loaded with synthetic uniform and non-uniform bursty traffic. Moreover, simulations replaying traces of several typical MPI applications demonstrate communication speedup factors of 2 to 10 times.  相似文献   

18.
In many-core architectures different distributed applications are executed in parallel. The applications may need hard guarantees for communication with respect to latency and throughput to cope with their constraints. Networks on Chip (NoC) are the most promising approach to handle these requirements in architectures with a large number of cores. Dynamic reservation of communication resources in virtual channel NoCs is used to enable quality of service for concurrent communication. This paper presents a router design supporting best effort and connection-oriented guaranteed service communication. The communication resources are shared dynamically between the two communication schemes. The key contribution is a concept for virtual channel reservation supporting different bandwidth and latency guarantees for simultaneous guaranteed service communication flows. Different to state-of-the-art, the used scheduling approach allows to give hard guarantees regarding throughput and latency. The concept enables to adjust the bandwidth and latency requirements of connections at run-time to cope with dynamically changing application requirements. Due to its distributed reservation process and resource allocation it offers good scalability for many-core architectures. The implementation of a router and the required extension of a network interface to support the proposed concept are presented. The software perspective is discussed. An algorithm is presented that is used to establish guaranteed service connections according to the applications bandwidth requirements. Simulation results are compared to state-of-the-art arbitration schemes and show significant improvements of latency and throughput, e.g. for an MPEG4 application. Synthesis results expose the low area overhead and impact on energy consumption which makes the concepts highly attractive for QoS-constraint many-core architectures.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号