
1.

Peng Yuanxi, Zhu Honglei, Chen Haiyan 《Journal of Computer Research and Development》 2011, 48(1)

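On-chip silicon area and power are severely constrained, and so is packet buffer capacity, which makes efficient use of packet buffers one of the key problems in NoC design. Dynamically partitioning virtual-channel buffers is an effective way to use packet buffers efficiently, but it can aggravate congestion and may even lead to unbounded congestion. This paper proposes an on-chip dynamic virtual-channel (DAVC) router based on a two-step flow control method. The method divides each packet into a header and a body and applies a separate flow control algorithm to each. Experimental results show the following gains over an on-chip static virtual-channel (SAVC) router. With equal buffer capacity, the DAVC router improves throughput by 23.2% and reduces transmission latency by 27.2%. With half the buffer capacity, the DAVC router achieves similar performance while saving 28.3% of area and 23.8% of leakage power.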

2.

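Research on Dynamic Virtual Channels with a Congestion Alleviation Strategy and Their VLSI Implementation (total citations: 1; self-citations: 0; citations by others: 1)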

Lai Mingche, Wang Zhiying, Guo Jianjun, Dai Kui 《Chinese Journal of Computers》 2008, 31(11)

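Virtual channel technology improves network-on-chip performance but incurs large area and power overheads. After analyzing the shortcomings of static virtual channels, this paper proposes a dynamic virtual channel (DVC) structure based on a congestion alleviation strategy. The structure organizes its buffers as linked lists and automatically adjusts the channel layout to suit different traffic loads. Under low traffic, it extends the depth of the channel queues, reducing packet transmission latency. Under high traffic, it increases the number of virtual channels. This eliminates head-of-line blocking and blocking caused by too few channels, alleviates congestion, reduces the number of flow-control feedback events and improves network throughput. The VLSI design of the DVC router was completed in a 90 nm CMOS process. Compared with a conventional router, it improves packet latency and throughput and also effectively reduces area and power overheads.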
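The linked-list buffer organization described above can be sketched as follows. This is a minimal illustration, not the authors' DVC design. The class name `LinkedListBuffer`, the choice of one queue (virtual channel) per packet and the rule that a virtual channel is released as soon as its queue drains are all assumptions made for brevity.

```python
class LinkedListBuffer:
    """Shared input buffer whose slots are chained into per-VC linked lists.

    Hypothetical sketch: one queue (virtual channel) is opened per packet,
    and at most `max_vcs` queues may exist at once.
    """

    def __init__(self, num_slots, max_vcs):
        assert num_slots >= 1 and max_vcs >= 1
        self.flits = [None] * num_slots
        # Free list: slot i initially links to slot i + 1; -1 ends a list.
        self.next = list(range(1, num_slots)) + [-1]
        self.free_head = 0
        self.free_count = num_slots
        self.max_vcs = max_vcs
        self.queues = {}  # packet id -> [head slot, tail slot]

    def can_accept(self, packet_id):
        """A flit needs a free slot, plus a free VC if its packet is new."""
        if self.free_count == 0:
            return False
        return packet_id in self.queues or len(self.queues) < self.max_vcs

    def push(self, packet_id, flit):
        if not self.can_accept(packet_id):
            raise RuntimeError("no free slot or virtual channel")
        slot = self.free_head              # take a slot from the free list
        self.free_head = self.next[slot]
        self.free_count -= 1
        self.flits[slot] = flit
        self.next[slot] = -1
        if packet_id in self.queues:       # append to the existing queue
            queue = self.queues[packet_id]
            self.next[queue[1]] = slot
            queue[1] = slot
        else:                              # open a new virtual channel
            self.queues[packet_id] = [slot, slot]

    def pop(self, packet_id):
        queue = self.queues[packet_id]
        slot = queue[0]
        flit = self.flits[slot]
        if slot == queue[1]:
            del self.queues[packet_id]     # queue drained: release the VC
        else:
            queue[0] = self.next[slot]
        self.flits[slot] = None            # return the slot to the free list
        self.next[slot] = self.free_head
        self.free_head = slot
        self.free_count += 1
        return flit
```

Under light load a single packet can occupy every slot, giving one deep queue. Under heavy load the same slots are spread over up to `max_vcs` shorter queues.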

3.

Moving Bubble Flow Control in Torus Networks and Its Adaptive Routing Implementation

Wang Yongqing, Xie Lunguo, Fu Qingchao 《Journal of Computer Research and Development》 2014, 51(8): 1854-1862

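In torus networks, bubble flow control is an effective and practical deadlock-avoidance technique. With virtual cut-through switching, the critical bubble scheme needs only one packet buffer to avoid in-ring deadlock in a torus network, but it can suffer from blocking. This paper first proposes a pseudo-packet protocol. Building on that protocol, it then designs a moving bubble flow control strategy that overcomes the blocking caused when the critical bubble cannot move. The pseudo-packet protocol is based on a simple request-reply exchange, while moving bubble flow control uses conventional credit transfer. With this mechanism, deadlock-free fully adaptive routing needs as few as two virtual channels per router, each holding as little as one packet. The paper also shows how to implement moving bubble flow control by suitably modifying a classic router. A simulator was used to compare the various bubble flow control schemes. Moving bubble flow control outperforms conventional bubble mechanisms, and with the adaptive mechanism added, performance is clearly higher than that of other non-adaptive methods. Latency drops, and throughput improves by more than 20%, and by up to 100%.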
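The moving-bubble scheme improves on the conventional bubble rule, which reduces to a single admission check. This sketch shows the classic localized bubble condition, not the paper's critical-bubble or moving-bubble mechanism:

- A packet entering a ring (injected, or turning in from another dimension) needs two packet-sized buffers downstream.
- A packet continuing along the ring needs one.

The function name and the 4-flit packet size are assumptions.

```python
def bubble_admissible(free_flits, entering_ring, packet_flits=4):
    """Classic localized bubble flow control check for a torus ring queue.

    free_flits: free flit slots in the downstream ring queue.
    entering_ring: True when the packet is injected into this ring or
        turns into it from another dimension; False when it continues
        along the same ring.
    packet_flits: packet length in flits (illustrative default).
    """
    # Continuing packets need room for themselves; entering packets must
    # also leave one extra packet-sized bubble so the ring never fills up.
    needed = packet_flits * (2 if entering_ring else 1)
    return free_flits >= needed
```

Because a packet may only enter a ring while a spare bubble remains, at least one packet-sized hole always survives in every ring. Packets already in the ring can therefore always advance. The critical bubble and moving bubble variants aim to get the same guarantee with less buffering.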

4.

A Fault-Tolerant Deflection Router with Configurable Bidirectional Links for Networks-on-Chip

Feng Chaochao, Zhang Minxuan, Li Jinwen, Dai Yi 《Journal of Computer Research and Development》 2014, 51(2): 454-463

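As CMOS technology enters the nanometer era, shrinking feature sizes make integrated circuits more susceptible to transient and permanent faults. Fault tolerance in the network-on-chip is therefore essential for reliable on-chip data transfer in chip multiprocessors. To handle transient faults and permanently faulty links, this paper proposes BiFTDR, a fault-tolerant deflection router with configurable bidirectional links. Adjacent BiFTDR routers are connected by a pair of bidirectional links whose directions are configured dynamically. The configuration depends on the links' fault status and on the packets arriving at the router. When a unidirectional link fails, the router tolerates the fault without detour routing. It also needs no routing table, which lowers its hardware cost. Simulation results under synthetic traffic with 5 and 15 permanently faulty links show that BiFTDR's average packet latency is 10% and 19% lower, respectively, than that of a fault-tolerant deflection router based on reinforcement learning. Under real application traces, BiFTDR loses less than 1% performance relative to the average packet latency of a fault-free network. For transient faults, BiFTDR's performance degrades only slightly, even at high fault rates. Synthesized in a 65 nm process, the BiFTDR router reaches a clock frequency of 500 MHz with small area and power overheads.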
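The abstract does not spell out BiFTDR's configuration policy, so the following is an illustration only. The hypothetical `configure_links` assigns each healthy link of the pair to whichever side currently has more waiting packets. It shows how a single faulty link can be tolerated without detouring: the surviving link simply carries traffic in whichever direction needs it.

```python
def configure_links(waiting_a, waiting_b, healthy):
    """Choose a direction for each link of a bidirectional link pair.

    waiting_a / waiting_b: packets at router A / B waiting to cross.
    healthy: (bool, bool), fault status of link 0 and link 1.
    Returns [dir0, dir1], each "A->B", "B->A", or None (faulty or idle).
    """
    directions = [None, None]
    demand = {"A->B": waiting_a, "B->A": waiting_b}
    for link in (0, 1):
        if not healthy[link]:
            continue  # a faulty link is never configured
        # Give this link to the side with more waiting packets
        # (ties go to A->B), then count one of those packets as served.
        direction = max(demand, key=demand.get)
        if demand[direction] == 0:
            break  # nothing left to send
        directions[link] = direction
        demand[direction] -= 1
    return directions
```

With both links healthy and traffic on both sides, the links are split one per direction. With traffic on only one side, both links point the same way, doubling that direction's bandwidth.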

5.

Design of a Low-Power Network-on-Chip Router

Zhou Duan, Peng Jing, Zhang Jianxian, Zhang Han 《Journal of Computer Applications》 2011, 31(10): 2621-2624

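This paper addresses router power consumption in networks-on-chip. It studies, at the system level, the key factors that affect router power: the number of virtual channels, buffer depth and flit width. It proposes a power-reduction method that combines several of these factors with virtual channels sharing crossbar input ports. A low-energy NoC router was designed and implemented with this method. Experimental results show that it consumes less power than the Alpha 21364 router and the IBM InfiniBand router.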

6.

A Flit-Grouping-Based Crossbar Scheduling Mechanism for Network-on-Chip Micro-Nodes

Liu Liangliang, Han Guodong, Song Ke, Zhang Fan 《Journal of Chinese Computer Systems》 2013, 34(5)

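Building on flit-grouping-based dynamic buffer allocation, this paper proposes a flit-grouping-based crossbar scheduling mechanism for networks-on-chip. Unlike schemes that partition buffers statically and independently, the mechanism first manages the input-port buffer as a single pool. It groups flits by output direction and allocates buffer space to each group dynamically. It then introduces a probabilistic arbitration algorithm based on group size and performs scheduling through group allocation followed by switch allocation. To reduce overhead further, the paper also proposes a strategy in which the groups share arbiters. Theoretical analysis and experimental results both show that the mechanism saves more than 25% of hardware overhead compared with conventional and dynamic virtual-channel mechanisms. It also achieves better network latency and throughput. The shared-arbitration strategy reduces hardware overhead further, at the cost of some network performance.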
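A probabilistic arbiter that favors larger groups can be sketched as a lottery in which each group holds one ticket per buffered flit. This weighting is an assumption for illustration. The abstract says only that the arbitration probability depends on group size.

```python
import random

def group_lottery(group_sizes, rng=random):
    """Grant one output-direction group with probability proportional
    to the number of flits it currently holds.

    group_sizes: dict mapping group (output port) -> buffered flit count.
    Returns the granted group, or None when every group is empty.
    """
    total = sum(group_sizes.values())
    if total == 0:
        return None
    ticket = rng.randrange(total)  # uniform in [0, total)
    for group, size in group_sizes.items():
        if ticket < size:
            return group
        ticket -= size
```

Passing a seeded `random.Random` instance as `rng` makes the arbitration sequence reproducible in simulation.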

7.

An Efficient Randomized Oblivious Routing Algorithm for Torus Networks

《Software Engineer》 2017, (1)

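A good routing algorithm should meet three goals at once:

1. A minimal hop count, to reduce transmission latency and preserve communication locality.
2. Maximal average-case and worst-case throughput.
3. A simple router structure.

Randomized oblivious routing algorithms are widely used in low-power parallel computer interconnection networks and in networks-on-chip. Previously proposed oblivious routing algorithms for torus networks require many virtual channels. To address this, the paper proposes WRD, a randomized oblivious routing algorithm that achieves deadlock freedom with only two virtual channels. Simulation results show that WRD improves network throughput under all traffic patterns compared with O1TURN using two virtual channels. Compared with RLB using four virtual channels, WRD performs close to RLB and achieves higher throughput under several traffic patterns. Because WRD uses only two virtual channels, it also lowers system cost and power consumption.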
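The abstract does not detail WRD. For orientation, the sketch below shows O1TURN, the simpler randomized scheme WRD is compared against. O1TURN picks XY or YX dimension order uniformly at random for each packet. Here it is adapted to a k x k torus by taking the shorter way around each ring. Virtual-channel assignment for deadlock freedom is deliberately omitted, and the function names are illustrative.

```python
import random

def ring_offset(src, dst, k):
    """Signed shortest offset from src to dst on a ring of k nodes
    (ties, which occur for even k, are broken toward the + direction)."""
    d = (dst - src) % k
    return d if d <= k // 2 else d - k

def o1turn_route(src, dst, k, rng=random):
    """Nodes visited from src to dst on a k x k torus, using a dimension
    order (XY or YX) chosen uniformly at random for this packet."""
    order = (0, 1) if rng.random() < 0.5 else (1, 0)
    cur = list(src)
    path = [tuple(cur)]
    for dim in order:
        off = ring_offset(cur[dim], dst[dim], k)
        step = 1 if off > 0 else -1
        for _ in range(abs(off)):
            cur[dim] = (cur[dim] + step) % k  # wrap-around link
            path.append(tuple(cur))
    return path
```

Both dimension orders give the same minimal hop count. The random choice only spreads load over the two L-shaped paths, which is the source of O1TURN's throughput advantage over fixed dimension-order routing.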

8.

CbRouter: A Bidirectional-Link Network-on-Chip Router Using Crossbar Bypass

Fang Lei, Dong Dezun, Wu Ji, Xia Jun, Wang Kefei 《Computer Engineering & Science》 2015, 37(2): 199-206

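On-chip interconnection networks provide efficient communication support for multicore architectures. Current networks-on-chip typically use unidirectional links, which leaves link resources poorly utilized. To allocate link bandwidth efficiently and thereby use it efficiently, this paper proposes a new bidirectional link scheduling algorithm and a bidirectional-link router that supports it. The new router adds a bypass data path for fast data transfer without affecting its original data path. Experimental results show that the bidirectional-link router improves the saturation throughput of a mesh network by up to 83.3% and its average link utilization by up to 24.53%.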

9.

Reliable Link-Layer Support for Virtual Cut-Through Switching

Wang Yongqing, Zhang Minxuan 《Computer Engineering & Science》 2012, 34(8): 119-124

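In high-performance interconnection network design, reducing communication latency has always been a primary goal. Virtual cut-through switching is an effective way to reduce latency, but efficient and reliable transmission at the link layer is challenging when input buffers are limited. This paper proposes a reliable, low-latency link-layer design that supports virtual cut-through effectively and reduces packet latency at intermediate routers. The method combines three parts: packet format design, sender-side management and receiver-side management.

- An extra check code in the packet header protects the header information and improves the link's fault tolerance.
- Link-level retransmission reduces the time and protocol overhead of end-to-end retransmission.
- An efficient implementation of the receive logic, especially receive-buffer management, avoids potential buffer overflow and flow-control failure.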
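The two link-layer mechanisms named above, a header check code and link-level retransmission, can be sketched as follows. CRC-32 (via Python's `zlib.crc32`) stands in for whatever check code the paper uses. The go-back-N replay buffer in the hypothetical `LinkSender` class is one common way to organize link-level retransmission, not necessarily the authors'.

```python
import zlib

def make_header(fields: bytes) -> bytes:
    """Append a 4-byte CRC-32 over the header fields."""
    return fields + zlib.crc32(fields).to_bytes(4, "big")

def header_ok(header: bytes) -> bool:
    """Receiver check: recompute the CRC and compare."""
    fields, crc = header[:-4], header[-4:]
    return zlib.crc32(fields).to_bytes(4, "big") == crc


class LinkSender:
    """Go-back-N replay buffer for link-level retransmission."""

    def __init__(self):
        self.next_seq = 0
        self.replay = {}  # seq -> packet, kept until acknowledged

    def send(self, packet):
        seq = self.next_seq
        self.next_seq += 1
        self.replay[seq] = packet
        return seq, packet

    def on_ack(self, seq):
        # Cumulative ACK: every packet up to and including seq arrived.
        for s in [s for s in self.replay if s <= seq]:
            del self.replay[s]

    def on_nack(self, seq):
        # Resend seq and everything sent after it, in order.
        return [(s, self.replay[s]) for s in sorted(self.replay) if s >= seq]
```

Retransmitting over a single hop from this replay buffer avoids the end-to-end round trip. The buffer only needs to cover one link's round-trip time.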

10.

Design and Optimization of a 64-bit Booth Multiplier

He Jun, Zhu Ying 《Computer Engineering》 2012, 38(16): 253-254

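The 64-bit integer multiplier in a domestically developed (Chinese) multicore processor has large area and power overheads. To address this, the paper proposes a new Booth encoding scheme that optimizes the multiplier's Booth encoding. Several methods are used to verify the correctness of the optimized design, and logic synthesis with a standard cell library is used to evaluate it. The results show an operating frequency above 1.0 GHz and a 9.64% reduction in area. Dynamic power falls by 6.34% and leakage power by 11.98%. The design thus effectively reduces the multiplier's area and power and meets its targets.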
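The abstract does not describe the paper's new encoding. The sketch below shows standard radix-4 (modified) Booth recoding, the usual baseline for this kind of optimization. Each digit in {-2, -1, 0, 1, 2} is read from an overlapping triple of multiplier bits, so a 64-bit operand yields 32 partial products instead of 64. The function names are illustrative. Python's unbounded integers stand in for the hardware adder tree.

```python
def booth_radix4_digits(b, width=64):
    """Recode a signed width-bit integer into radix-4 Booth digits.

    Digit i = -2*b[2i+1] + b[2i] + b[2i-1], with b[-1] = 0, so every
    digit lies in {-2, -1, 0, 1, 2} and sum(d_i * 4**i) == b.
    """
    assert width % 2 == 0 and -(1 << (width - 1)) <= b < (1 << (width - 1))
    bits = b & ((1 << width) - 1)  # two's-complement bit pattern
    digits, prev = [], 0           # prev holds the implicit bit b[-1]
    for i in range(0, width, 2):
        lo = (bits >> i) & 1
        hi = (bits >> (i + 1)) & 1
        digits.append(-2 * hi + lo + prev)
        prev = hi
    return digits

def booth_multiply(a, b, width=64):
    """Sum the shifted partial products d_i * a * 4**i."""
    product = 0
    for i, d in enumerate(booth_radix4_digits(b, width)):
        # Each partial product is 0, +/-a, or +/-2a, shifted by 2i bits.
        product += (d * a) << (2 * i)
    return product
```

In hardware, each digit selects 0, the multiplicand, its double (a 1-bit shift) or their negations. Optimizations of the encoder therefore target the logic that produces and applies these digit selections.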

11.

HIPIQS: a high-performance switch architecture using input queuing

Sivaram R., Stunkel C.B., Panda D.K. 《IEEE Transactions on Parallel and Distributed Systems》 2002, 13(3): 275-289

Switch-based interconnects are used in a number of application domains, including parallel system interconnects, local area networks, and wide area networks. However, very few switches have been designed that are suitable for more than one of these application domains. Such a switch must offer both extremely low latency and very high throughput for a variety of message sizes. Some architectures with output queuing have been shown to perform extremely well in terms of throughput, but their performance can suffer in systems where a significant portion of the packets are extremely small. Architectures with input queuing, on the other hand, offer limited throughput or require fairly complex, centralized arbitration that increases latency. In this paper, we present a new input-queue-based switch architecture called HIPIQS (HIgh-Performance Input-Queued Switch). It offers low latency for a range of message sizes and provides throughput comparable to that of output queuing approaches. Furthermore, it allows simple, distributed arbitration. HIPIQS uses a dynamically allocated multiqueue organization, pipelined access to multibank input buffers, and small cross-point buffers to deliver high performance. Our simulation results show that HIPIQS can deliver performance close to that of output queuing approaches over a range of message sizes, system sizes, and traffic patterns. The architecture can therefore be used to build high-performance switches suitable both for parallel system interconnects and for computer networks.

12.

A practical low-latency router architecture with wing channel for on-chip network

Mingche Lai, Lei Gao, Sheng Ma, Xiao Nong, Zhiying Wang 《Microprocessors and Microsystems》 2011, 35(2): 98-109

As the number of cores grows, the communication latency of a Network-on-Chip becomes a dominant problem because of the complex operations performed at each node. In this paper, we reduce communication latency by proposing a single-cycle router architecture with a wing channel. The wing channel forwards incoming packets to free ports immediately, based on the switch allocation results. Incoming packets granted the wing channel can also fill idle time slots of the crossbar switch and reduce contention with subsequent packets, which effectively raises throughput. We designed the proposed router in a 65 nm CMOS process. It supports different routing schemes and outperforms the express virtual channel, prediction-based, and Kumar's single-cycle routers in both latency and throughput. Compared with the speculative router, it provides a 45.7% latency reduction and a 14.0% throughput improvement. The proposed design incurs a modest area overhead of 8.1%, while power consumption falls by 7.8% due to fewer arbitration activities.

13.

A Congestion-Prediction-Based Adaptive Arbitration Method for NoC

Yang Shengguang, Li Li, Xu Yi, Zhang Yuang, Lou Xiaoxiang, Gao Minglun 《Application Research of Computers》 2009, 26(2): 652-654

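Arbitration methods traditionally used in bus systems or the Internet no longer suit the NoC environment well. This paper focuses on congestion status, a key factor in NoC performance, and proposes GLCA, an arbitration strategy based on global and local congestion prediction, to improve NoC latency. Experimental results show that, compared with the round-robin (RR) method, the new algorithm improves average packet latency by up to 20.5% and average throughput by up to 8%. It keeps this advantage under different loads. Synthesis results show that, compared with RR, GLCA adds only a modest amount of combinational logic to the router (25.7%).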
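The sketch below shows the round-robin (RR) baseline and one simple way to bias it with congestion information. Two parts are assumptions for illustration and are not the GLCA predictor: the congestion `score` (for example, a predicted downstream occupancy) and the rule of restricting the grant to the least-congested requesters.

```python
class RoundRobinArbiter:
    """Round-robin arbiter with an optional congestion-aware filter."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # requester 0 has highest priority at start

    def grant(self, requests, score=None):
        """Return the index granted this cycle, or None.

        requests: list of n booleans.
        score: optional list of n congestion estimates (lower is better);
            if given, only the least-congested requesters are eligible
            and round-robin order breaks ties among them.
        """
        eligible = {i for i in range(self.n) if requests[i]}
        if not eligible:
            return None
        if score is not None:
            best = min(score[i] for i in eligible)
            eligible = {i for i in eligible if score[i] == best}
        for k in range(1, self.n + 1):
            i = (self.last + k) % self.n
            if i in eligible:
                self.last = i
                return i
        return None  # unreachable: eligible is non-empty
```

Plain RR ignores congestion entirely. The filtered mode steers grants away from congested paths but keeps RR's fairness among equally congested requesters.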

14.

Trends in highly scalable crossbar-based packet switch architecture

Kenji Yoshigoe 《Computer Communications》2009,32(4):740-749

The combined input and crosspoint queued (CICQ) switch is receiving significant attention as a candidate for the next-generation high-speed packet switch because of its scalability. However, a multi-cabinet implementation of a CICQ switch unavoidably introduces a large round-trip time (RTT) latency between the line cards and the switch fabric, resulting in a large crosspoint (CP) buffer requirement. This paper investigates virtual crosspoint queues (VCQs), which significantly reduce the CP buffer requirement of the CICQ switch. The VCQ unit resides inside the switch fabric, is dynamically shared among virtual output queues (VOQs) from the same source port, and operates at the line rate, making the implementation practical. A threshold-based exhaustive round-robin (T-ERR) arbitration is employed to reduce buffer hogging at the VCQ. T-ERR at the VCQ and CP arbiters serves packets residing in a longer queue more frequently than packets residing in a shorter queue. Consequently, T-ERR drastically increases the throughput of the CICQ switch with small CP buffers. A multi-cabinet CICQ switch does not support multicast traffic well: the combination of small CP buffers in the switch fabric and a large RTT latency between the line cards and the switch fabric makes the intra-switch link non-work-conserving. Deploying a multicast FIFO buffer between the input buffer and the CP buffer shows promise. Because it achieves high throughput independent of RTT and switch port size, integrating the VCQ architecture and the T-ERR scheduler into the CICQ switch is ideal for supporting ever-increasing Internet traffic that requires higher data rates, larger switch sizes, and efficient multicasting.
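The abstract does not define T-ERR precisely, so the scheduler below follows one plausible reading and is an assumption. When the round-robin pointer reaches a queue that holds at least `threshold` packets, that queue keeps the grant until it is empty (exhaustive service). Shorter queues get one packet per visit. Longer queues are thus served more often, as the abstract describes.

```python
from collections import deque

def t_err_schedule(queues, threshold, slots):
    """Serve up to `slots` packets; return the queue index served each slot.

    queues: list of deques (consumed in place).
    A queue holding at least `threshold` packets when the round-robin
    pointer reaches it keeps the grant until it is empty (exhaustive
    service); shorter queues are served one packet per visit.
    """
    n = len(queues)
    ptr, holding, served = 0, False, []
    for _ in range(slots):
        if not holding:
            for k in range(n):  # next non-empty queue in round-robin order
                if queues[(ptr + k) % n]:
                    ptr = (ptr + k) % n
                    break
            else:
                break  # every queue is empty
            holding = len(queues[ptr]) >= threshold
        queues[ptr].popleft()
        served.append(ptr)
        if not holding or not queues[ptr]:
            holding = False
            ptr = (ptr + 1) % n  # release the grant, move past this queue
    return served

# The long queue 0 is drained in one visit before queue 1 is served.
print(t_err_schedule([deque("aaa"), deque("b")], threshold=2, slots=10))
# prints [0, 0, 0, 1]
```

With a very large threshold, the same function degenerates to plain one-packet-per-visit round-robin, which makes the two policies easy to compare.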

15.

A Low-Latency, Scalable Router Architecture for Networks-on-Chip (total citations: 1; self-citations: 0; citations by others: 1)

Zhang Yuanyuan, Sun Guang, Su Li, Jin Depeng, Zeng Lieguang 《Transducer and Microsystem Technologies》 2012, 31(8): 134-136

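A network-on-chip router may need to support multiple IP cores simultaneously while still delivering good latency. To meet both requirements, this paper designs a router architecture with distributed routing and arbitration. The arbitration module decides based on the request status of the current router's input ports and the state of the buffers at the corresponding input ports of the next router. This raises the success rate of packet transmission, which lowers transmission latency and gives the router good latency performance. Simulation results also show that the router's area overhead scales well.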
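The arbitration rule above considers both local input-port requests and free space in the next router's input buffer. It can be sketched as a per-output round-robin that skips outputs whose downstream buffer is full. The data layout (`requests`, `downstream_free`, `pointers`) and the round-robin tie-breaking are assumptions, and the paper's distributed arbiter may differ.

```python
def arbitrate(requests, downstream_free, pointers, num_inputs):
    """Grant at most one input port per output port.

    requests: dict input port -> requested output port.
    downstream_free: dict output port -> free flit slots in the
        corresponding input buffer of the next router.
    pointers: dict output port -> round-robin pointer (updated in place).
    Returns a dict output port -> granted input port.
    """
    grants = {}
    for out, free in downstream_free.items():
        if free == 0:
            continue  # next router cannot accept a flit: grant nothing
        start = pointers.get(out, 0)
        for k in range(num_inputs):
            inp = (start + k) % num_inputs
            if requests.get(inp) == out:
                grants[out] = inp
                pointers[out] = (inp + 1) % num_inputs  # rotate priority
                break
    return grants
```

Withholding grants toward a full downstream buffer means a granted flit is never blocked after crossing the switch. This is the "higher transmission success rate" effect the abstract attributes to the arbiter.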

16.

A Flow-Direction-Based Dynamic Buffer Allocation Mechanism for Networks-on-Chip

Xie Tongfei, Han Guodong, Xiao Qinghui 《Application Research of Computers》 2011, 28(11): 4251-4255

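To use buffer resources efficiently, this paper proposes a method for dynamically allocating the port buffers of NoC routers. Data received at an input port is divided into groups by transmission direction, with each group corresponding to one output port. The data is stored by group, and the control unit allocates buffer space to each group dynamically according to how much data the group holds. Compared with dynamic buffer allocation based on virtual channels, this method reduces control and arbitration complexity. Simulation results show that, for the same performance, the method effectively reduces buffer requirements.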
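A sketch of demand-proportional allocation follows. The abstract states only that each group receives buffer space according to the amount of data it holds. The minimum per-group reservation and the largest-remainder rounding used here are assumptions made for illustration.

```python
def allocate_slots(total_slots, demand, min_per_group=1):
    """Split an input port's buffer slots among output-direction groups.

    demand: dict mapping group (output port) -> flits waiting for it.
    Each group is first given min_per_group slots so that no direction
    is starved; the remaining slots are shared in proportion to demand,
    with leftovers going to the largest remainders.
    """
    groups = list(demand)
    spare = total_slots - min_per_group * len(groups)
    if spare < 0:
        raise ValueError("too few slots for the minimum reservation")
    # With no demand at all, share the spare slots equally.
    weights = demand if sum(demand.values()) > 0 else {g: 1 for g in groups}
    total_weight = sum(weights.values())
    alloc, remainders = {}, {}
    for g in groups:
        share, rem = divmod(spare * weights[g], total_weight)
        alloc[g] = min_per_group + share
        remainders[g] = rem
    leftover = total_slots - sum(alloc.values())
    for g in sorted(groups, key=lambda name: remainders[name], reverse=True)[:leftover]:
        alloc[g] += 1
    return alloc
```

For example, 16 slots shared by five output directions with demand N=6 and S=2 (others idle) split as N=9, S=4 and one slot each for E, W and L. Integer arithmetic keeps the total exactly equal to `total_slots`.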

17.

Design and performance of speculative flow control for high-radix datacenter interconnect switches

Cyriel Minkenberg, Mitchell Gusat 《Journal of Parallel and Distributed Computing》 2009

High-radix switches are desirable building blocks for large computer interconnection networks. They are better suited than low-radix switches to converting chip I/O bandwidth into low latency and low cost [J. Kim, W.J. Dally, B. Towles, A.K. Gupta, Microarchitecture of a high-radix router, in: Proc. ISCA 2005, Madison, WI, 2005]. Unfortunately, most existing switch architectures do not scale well to a large number of ports. For example, the complexity of the buffered crossbar architecture scales quadratically with the number of ports. Support for long round-trip times and many virtual channels compounds the problem, and the overall buffer requirements limit such switches to modest port counts. As long as traditional credit flow control is used at the link level, compromising on buffer sizing leads to a drastic increase in latency and reduction in throughput. We propose a novel link-level flow control protocol that lets high-performance switches based on the increasingly popular buffered crossbar architecture scale to higher port counts without sacrificing performance. By combining credited and speculative transmission, this scheme achieves reliable delivery, low latency, and high throughput, even with crosspoint buffers significantly smaller than the round-trip time. The proposed scheme substantially reduces message latency and improves the throughput of partially buffered crossbar switches loaded with synthetic uniform and non-uniform bursty traffic. Moreover, simulations replaying traces of several typical MPI applications demonstrate communication speedups of 2 to 10 times.
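The sender side of a combined credited/speculative scheme works as follows:

- When a credit is available, the packet is sent with it.
- Otherwise the packet is sent speculatively, and a copy is kept until the receiver acknowledges or rejects it.

The class and method names, and the choice to hand a rejected (NACKed) packet back to the caller for resending, are assumptions. The paper's protocol details are not reproduced here.

```python
class SpeculativeSender:
    """Link-level sender mixing credited and speculative transmission."""

    def __init__(self, credits):
        self.credits = credits   # guaranteed slots in the receiver buffer
        self.pending = {}        # seq -> speculative packet awaiting ACK/NACK
        self.next_seq = 0

    def send(self, packet):
        seq = self.next_seq
        self.next_seq += 1
        if self.credits > 0:
            # A reserved slot exists downstream: delivery is guaranteed.
            self.credits -= 1
            return "credited", seq, packet
        # No credit: transmit anyway, but keep a copy in case it is dropped.
        self.pending[seq] = packet
        return "speculative", seq, packet

    def on_credit(self):
        self.credits += 1

    def on_ack(self, seq):
        # Speculative packet was accepted; the copy can be released.
        del self.pending[seq]

    def on_nack(self, seq):
        # Speculative packet was dropped; return it so it can be resent.
        return self.pending.pop(seq)
```

Only speculative packets occupy the retransmission store, so the receiver's crosspoint buffer can be smaller than one round-trip's worth of data. The cost is an occasional retransmission when a speculative packet is dropped.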

18.

Design of a performance enhanced and power reduced dual-crossbar Network-on-Chip (NoC) architecture

Yixuan Zhang, Randy Morris Jr., Avinash K. Kodi 《Microprocessors and Microsystems》 2011, 35(2): 110-118

The input buffers of current packet-switched Network-on-Chip (NoC) architectures consume a significant portion of the total power of the interconnection network. Reducing the size of the input buffers degrades performance, while eliminating all buffers increases power at high network load. In this article, we propose DXbar, an innovative dual-crossbar design. By combining the advantages of buffered and bufferless networks, DXbar achieves at least 20% better throughput and latency and at least 20% power savings over buffered networks with virtual channels. Furthermore, DXbar can outperform current bufferless networks with deflecting and dropping protocols while consuming at most half of their power.