Similar literature
 19 similar articles found
1.
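An on-chip router with dynamically allocated virtual output queues   Total citations: 1, self-citations: 0, citations by others: 1
On-chip routers that use conventional virtual-channel flow control add virtual channels to relieve the link-throughput loss and network congestion caused by head-of-line blocking, but they suffer from low buffer utilization and high arbitration overhead. Routers with dynamic virtual-channel flow control improve buffer utilization and link throughput by managing buffer slots dynamically, yet the complexity and cost of their flow-control and arbitration logic grow quickly. To raise link throughput and buffer utilization while striking a better balance between performance and cost, this paper proposes DAVOQ, an on-chip router with dynamically allocated virtual output queues. DAVOQ organizes the virtual output queues dynamically with fast linked lists and uses look-ahead routing to simplify the arbitration logic and optimize the pipeline. Simulation and synthesis results show that, compared with a conventional virtual-channel router, DAVOQ improves packet latency and throughput while saving 15.1% of standard-cell area and 12.9% of leakage power in a 0.13 μm CMOS process; compared with a dynamic virtual-channel router, DAVOQ obtains a considerable latency improvement at a small throughput loss while saving 15.6% of standard-cell area and 20.5% of leakage power.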
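The abstract does not describe DAVOQ's hardware in detail, so the following is only a minimal software sketch of the general idea of organizing per-output queues as linked lists over one shared slot pool. The class name `SharedVOQBuffer`, its methods, and the free-list layout are assumptions made for illustration, not the authors' design.

```python
class SharedVOQBuffer:
    """Shared slot pool organised into one linked-list queue per output port.

    Hypothetical illustration: names and layout are not taken from the paper.
    """

    def __init__(self, num_slots, num_outputs):
        self.data = [None] * num_slots
        # next_slot[i] is the slot that follows slot i in its list (-1 = end).
        self.next_slot = [i + 1 for i in range(num_slots)]
        if num_slots:
            self.next_slot[-1] = -1
        # All slots start on the free list.
        self.free_head = 0 if num_slots else -1
        # Head and tail slot of each output's queue (-1 = empty queue).
        self.head = [-1] * num_outputs
        self.tail = [-1] * num_outputs

    def enqueue(self, output, flit):
        """Append a flit to the queue of `output`; return False if no slot is free."""
        if self.free_head == -1:
            return False
        slot = self.free_head
        self.free_head = self.next_slot[slot]
        self.data[slot] = flit
        self.next_slot[slot] = -1
        if self.head[output] == -1:
            self.head[output] = slot
        else:
            self.next_slot[self.tail[output]] = slot
        self.tail[output] = slot
        return True

    def dequeue(self, output):
        """Remove and return the oldest flit queued for `output`, or None if empty."""
        slot = self.head[output]
        if slot == -1:
            return None
        flit = self.data[slot]
        self.head[output] = self.next_slot[slot]
        if self.head[output] == -1:
            self.tail[output] = -1
        # Return the slot to the free list.
        self.data[slot] = None
        self.next_slot[slot] = self.free_head
        self.free_head = slot
        return flit
```

Because any free slot can join any output's list, a busy output can take most of the pool while idle outputs hold none, which is the utilization benefit the abstract attributes to dynamic allocation.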

2.
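On-chip silicon area and power are tightly constrained, so packet buffer capacity is tightly constrained as well, and using packet buffers efficiently is one of the key problems in NoC design. Dynamically partitioning the virtual-channel buffers is an effective way to use them efficiently, but it increases congestion and can even lead to unbounded congestion. This paper proposes an on-chip dynamic virtual-channel (DAVC) router based on a two-step flow-control method, which splits each packet into a header and a body and applies a flow-control algorithm to each part separately. Experimental results show that, compared with a static virtual-channel (SAVC) on-chip router with equal buffer capacity, the DAVC router raises throughput by 23.2% and lowers transmission latency by 27.2%; with half the buffer capacity, the DAVC router achieves similar performance while saving 28.3% of area and 23.8% of leakage power.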

3.
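Conventional router congestion-control algorithms decide which packets to drop mainly from the congestion state of the local queue resources, which gives rise to the problem of bandwidth wasted by congested data flows (BW-CDF). This paper analyzes the cause of the BW-CDF problem theoretically and, to solve it, proposes a new router congestion-control algorithm, CC-AMR, which takes the congestion state of resources at multiple levels into account to make more reasonable packet-drop decisions. It also describes how the algorithm is implemented on a core router based on network processors. Test results on a real system show that the algorithm alleviates the BW-CDF problem and thereby substantially raises the router's total throughput under congestion.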

4.
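EasiCC: a congestion-control mechanism for sensor networks that guarantees bandwidth fairness   Total citations: 1, self-citations: 0, citations by others: 1
A practical congestion-control scheme for sensor networks must not only meet several network performance targets but also keep its control overhead very small. This paper proposes EasiCC (EasiNet congestion control mechanism), a congestion-control mechanism that meets these requirements. In EasiCC, the source node of each data flow distributes its data packets proportionally across the priority levels, and every network node adjusts its packet-filtering criterion dynamically and synchronously according to the degree of network congestion. Network traffic is regulated by combining the filtering criterion with packet priority, which ensures fair allocation of wireless channel bandwidth; admission control is combined with queue dropping to regulate traffic, which safeguards overall network performance. EasiCC has very low control overhead and has been implemented on a real sensor-network platform. Simulation and experimental results show that EasiCC allocates sending rate and network bandwidth fairly among data flows and performs well on metrics such as packet delivery ratio and transmission latency.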
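The abstract gives no concrete rules for EasiCC, so the sketch below is only one possible reading of "priority levels plus a shared filtering criterion". It assumes that each source spreads its packets evenly over the levels by sequence number and that every node forwards only packets at or above a common filter level. The function names, the even split, and the queue-length thresholds `high` and `low` are all hypothetical.

```python
def assign_priority(seq, num_levels):
    """Spread a flow's packets evenly over levels 0..num_levels-1 (assumed policy)."""
    return seq % num_levels


def admit(priority, filter_level):
    """Forward a packet only if its level is at or above the current filter level."""
    return priority >= filter_level


def adjust_filter(filter_level, queue_len, high, low, num_levels):
    """Raise the filter level when the queue is long, lower it when the queue is short."""
    if queue_len > high:
        return min(filter_level + 1, num_levels)
    if queue_len < low:
        return max(filter_level - 1, 0)
    return filter_level


def admitted_count(num_packets, num_levels, filter_level):
    """Count how many of a flow's packets pass a given filter level."""
    return sum(
        admit(assign_priority(seq, num_levels), filter_level)
        for seq in range(num_packets)
    )
```

Under these assumptions, a filter level of k out of L levels passes the same fraction (L - k) / L of every flow's packets, which is one simple way a shared filtering criterion could throttle all flows equally.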

5.
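To improve network performance and security and to design more effective queue congestion-control algorithms, a study of existing active queue congestion-control algorithms finds that most of them are based on the queue length or the average queue length, which limits how far they can improve overall network performance. Building on existing network queue congestion-control algorithms, this paper brings the transmission state of ACK acknowledgment packets into the study of queue congestion control. Simulation experiments show that the transmission state of ACK packets strongly affects network throughput, packet transmission delay, and other metrics.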

6.
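薛建生, 谷羽, 王光兴. 《计算机工程》 (Computer Engineering), 2006, 32(16): 105-106
A congestion-control strategy based on the OSPF routing protocol is proposed. Spare bits in the OSPF link-state update packets (LSA) are used to describe a router's congestion state and traffic state, so that the router's congestion condition is announced to other routers as the LSA packets propagate. The fast convergence of OSPF lets routers learn the network's congestion status promptly and perform early congestion avoidance. Simulation shows that the scheme can control network congestion, reduce delay, and balance network load.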

7.
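A good routing algorithm should at once offer a minimal hop count, to reduce transmission delay and preserve communication locality; maximal average-case and worst-case throughput; and a simple router structure. Randomized oblivious routing algorithms are widely used in the interconnection networks of low-power parallel computers and in networks-on-chip. Existing oblivious routing algorithms for torus networks require many virtual channels; to address this drawback, this paper proposes WRD, a randomized oblivious routing algorithm that needs only two virtual channels to be deadlock-free. Simulation verifies its performance: compared with the O1TURN routing algorithm, which also uses two virtual channels, WRD improves network throughput under all communication patterns. Compared with the RLB algorithm, which uses four virtual channels, WRD performs close to RLB and even achieves higher throughput under several communication patterns, while its use of only two virtual channels lowers system cost and power consumption.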

8.
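As integrated-circuit processes scale down, interconnect delay grows relative to gate delay, so transferring a packet between network-on-chip routers takes multiple clock cycles. Under credit-based flow control, however, the registers on the physical link cannot buffer packets when congestion occurs. This paper therefore proposes an adaptive channel double-buffer (CDB) structure that can buffer packets during congestion. Through gate-level circuit design and analysis, a delay model of the CDB is built with the method of logical effort. The accuracy of the delay model is verified with the Synopsys timing-analysis tool PrimeTime using a TSMC 65 nm process library; the two differ by no more than one τ4. The results show that in a 32 nm process, a 1 mm semi-global interconnect requires the same number of stages with the channel double buffer (CDB) as with a simple pipeline (SPLS).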

9.
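An early deterministic congestion indication algorithm   Total citations: 1, self-citations: 0, citations by others: 1
This paper analyzes the shortcomings of the RED algorithm in delivering congestion indications and proposes an early deterministic congestion indication algorithm that lets congestion indications reach the TCP source as quickly as possible, so that the source responds effectively to early congestion at the router. Simulation experiments with a modified NS show that, while maintaining network throughput, the algorithm lowers the packet loss rate at the router more effectively and improves network utilization.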
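For context, the RED baseline that this abstract analyzes can be sketched as follows. This is the textbook RED calculation in simplified form: an exponentially weighted average queue length and a drop/mark probability that rises linearly between two thresholds, without RED's count-based correction. It is not the proposed early deterministic algorithm, which the abstract does not specify. The function names and the default weight are illustrative assumptions.

```python
def ewma_queue(avg, sample, weight=0.002):
    """Update the exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * sample


def red_mark_probability(avg, min_th, max_th, max_p):
    """Basic RED probability: 0 below min_th, 1 at or above max_th, linear between."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)
```

Because the decision is probabilistic and driven by a slowly moving average, the congestion signal can reach a given TCP source late, which is the kind of shortcoming in indication delivery that the abstract refers to.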

10.
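The characteristics of a DTN (delay-tolerant network) and its custody-transfer mechanism tend to exhaust constrained network resources such as buffer space and bandwidth, causing congestion and degrading network performance. Conventional TCP congestion-control mechanisms do not suit DTNs. This paper proposes a new congestion-avoidance and congestion-relief scheme for DTNs. Congestion avoidance relies on the fact that the transmission delay and capacity of a DTN link are deterministic within a sufficiently short time interval: it builds a directed multipath graph of the DTN and partitions and constrains link load in terms of sending rate, receiving rate, and bandwidth usage, so as to raise the utilization of network resources as far as possible. Congestion relief builds on a partitioning of node storage and combines storage conversion within a node with packet migration between nodes to clear congestion in the DTN. Simulation results show that, compared with other DTN congestion-control mechanisms, the proposed scheme achieves good packet delivery ratio, network overhead, and other performance metrics.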

11.
A delay model for router microarchitectures   Total citations: 1, self-citations: 0, citations by others: 1
This article introduces a router delay model that takes into account the pipelined nature of contemporary routers and proposes pipelines matched to the specific flow control method employed. Given the type of flow control and router parameters, the model returns router latency in technology-independent units and the number of pipeline stages as a function of cycle time. We apply this model to derive realistic pipelines for wormhole and virtual-channel routers and compare their performance. Contrary to the conclusions of previous models, our results show that the latency of a virtual-channel router doesn't increase as we scale the number of virtual channels up to 8 per physical channel. Our simulation results also show that a virtual-channel router gains throughput of up to 40% over a wormhole router.

12.
Asynchronous quasi-delay-insensitive (QDI) NoCs have several advantages over their clocked counterparts. Virtual channel (VC) is the most utilized flow control method in asynchronous routers but spatial division multiplexing (SDM) achieves better throughput performance for best-effort traffic than VC. A novel asynchronous SDM router architecture is presented. Area and latency models are provided to analyse the network performance of all router architectures including wormhole, virtual channel and SDM. Performance comparisons have been made with different configurations of payload size, communication distance, buffer size, port bandwidth, network size and number of VCs/virtual circuits. Compared with VC, SDM achieves higher throughput with lower area overhead.

13.
This work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which leads to fully adaptive routing by employing only three virtual channels. In addition, we selectively use output buffers for implementing the most utilized virtual paths in order to reduce head-of-line blocking. The careful implementation of these features has resulted in a good trade-off between the network performance and hardware cost. The outcome of this research is a high-performance adaptive router (HPAR), which adequately balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation process in which HPAR is compared with other adaptive routers using FIFO input buffering, with or without additional virtual channels to reduce head-of-line blocking. This evaluation considers both the VLSI costs of each router and their performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among all the routers tested. Moreover, the observed packet latencies were comparable to those exhibited by simpler routers. Therefore, HPAR can be considered a suitable candidate to implement packet interchange in the next generations of CC-NUMA multiprocessors.

14.
The evaluation of advanced routing features must be based on both costs and benefits. To date, adaptive routers have generally been evaluated on the basis of the achieved network throughput (channel utilization), ignoring the effects of implementation complexity. In this paper, we describe a parameterized cost model for router performance, characterized by two numbers: router delay and flow control time. Grounding the cost model in a 0.8 micron gate array technology, we use it to compare a number of proposed routing algorithms. From these design studies, several insights into the implementation complexity of adaptive routers are clear. First, header update and selection is expensive in adaptive routers, suggesting that absolute addressing should be reconsidered. Second, virtual channels are expensive in terms of latency and cycle time, so decisions to include them to support adaptivity or even virtual lanes should not be taken lightly. Third, requirements of larger crossbars and more complex arbitration cause some increase in the complexity of adaptive routers, but the rate of increase is small. Last, the complexity of adaptive routers significantly increases their setup delay and flow control cycle times, implying that claims of performance advantages in channel utilization and low load latency must be carefully balanced against losses in achievable implementation speed.

15.
With an increasing number of cores, the communication latency of Network-on-Chip becomes a dominant problem due to complex operations per node. In this paper, we try to reduce communication latency by proposing a single-cycle router architecture with a wing channel, which forwards the incoming packets to free ports immediately by inspecting the switch allocation results. Also, the incoming packets granted the wing channel can fill in the time-slots of the crossbar switch and reduce contention with subsequent ones, thereby improving throughput effectively. We design the proposed router using a 65 nm CMOS process, and the results show that it supports different routing schemes and outperforms express virtual channel, prediction and Kumar’s single-cycle routers in terms of latency and throughput. When compared to the speculative router, it provides 45.7% latency reduction and 14.0% throughput improvement. Moreover, we show that the proposed design incurs a modest area overhead of 8.1% but reduces power consumption by 7.8% due to less arbitration activity.

16.
We present a single-cycle output buffered router based on layered switching for networks on chips (NoCs). Different from state-of-the-art NoC routers, the router has three important characteristics: (1) It employs layered switching, which implements wormhole on top of virtual cut-through (VCT) switching; (2) In contrast to input buffered architectures, it adopts an output buffered architecture; (3) It is single cycle, meaning that the router pipeline takes only one cycle for all flits. Experimental results show that the router achieves up to 80% of ideal network throughput under a uniform random traffic pattern. Compared with wormhole switching, layered switching achieves up to 36.9% latency reduction for 12-flit packets under uniform random traffic with an injection rate of 0.5 flit/cycle/node. Synthesis results under 65 nm technology show that its critical path has only 20 logic gates, and that it reduces area by 11% compared to the input virtual-channel router with the same buffer capacity.

17.
This research paper proposes a bio-inspired self-aware fault-tolerant routing protocol for network-on-chip architecture using particle swarm optimization (PSO), which considers synchronous, asynchronous, and self-organizing communication mechanisms to intelligently load-balance the traffic on the entire network in the presence of faulty components. By way of experimentation and simulation, this study demonstrates that the proposed scheme can converge to a globally optimal, minimal routing path in real time, in the presence of network congestion and faulty routers and links. The basic PSO algorithm was improved to implement the proposed routing scheme, named bio-inspired self-aware fault-tolerant routing protocol (BISFTRP). This scheme uses the synchronous, asynchronous, and self-organizing features of PSO to create a global routing table and intelligent adaptation, which gives rise to scalable, real-time, and dynamic routing decisions with high throughput, low latency, and minimum power consumption. A cycle-accurate simulation system is used to demonstrate the flexibility and efficiency of the proposed scheme. Comparison results with state-of-the-art fault-tolerant routing algorithms show that the BISFTRP routing protocol achieves high routing performance without routing oscillations and throughput degradation. Furthermore, the hardware implementation results show that the BISFTRP router achieves efficient area and power utilization, compared with state-of-the-art routers.

18.
Adaptive routing and virtual channels are used to increase routing adaptivity in wormhole-routed two-dimensional meshes. But increasing channel buffer utilization without considering even distribution of the traffic loads tends to cause congestion in the most adaptive routing area. To avoid such traffic congestion, a concept of the restricted area is proposed. The proposed restricted area, defined to be a part of the network where message transmission concentrates, can be located following the region of adaptivity. By properly guiding message routing inside and outside the area, we are able to achieve more balanced buffer utilization and to reduce traffic congestion accordingly. The performance of several routing algorithms with or without using the restricted area is simulated and evaluated under various traffic loads and distribution patterns. The results indicate that routing algorithms with the restricted areas yield consistently higher throughput and lower latency than routing algorithms without using the concept.

19.
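Typical network-on-chip routers suffer from low buffer utilization and constraints on large buffer designs. To address these problems without adding buffers or virtual channels, this paper proposes a new router design aimed at buffer-resource contention in networks-on-chip. In this router, when an input port is busy and resource contention occurs, blocked packets are assigned to other input ports that have free buffer space, which resolves the contention for buffer resources and improves overall network performance. SystemC simulation results show that, relative to a baseline router, the router attains higher network saturation rate and throughput under both hotspot and uniform traffic patterns, raising the saturation rate by about 11.4% under the hotspot pattern in particular. FPGA implementation results show that the router's area overhead is small and that it meets the application requirements of networks-on-chip well.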
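The redirection step described above can be illustrated with a small sketch. The abstract does not say how the router chooses among ports with free buffer space, so picking the least-occupied port is an assumption made here for illustration. The function `place_packet` and its parameters are likewise hypothetical.

```python
def place_packet(buffers, capacity, port, packet):
    """Store a packet in its own input buffer, or borrow space from another port.

    buffers  : list of per-input-port packet lists
    capacity : maximum packets per port buffer
    Returns the index of the port that accepted the packet, or None if all are full.
    """
    if len(buffers[port]) < capacity:
        buffers[port].append(packet)
        return port
    # Home buffer is full: look for other input ports with free space.
    candidates = [
        p for p in range(len(buffers))
        if p != port and len(buffers[p]) < capacity
    ]
    if not candidates:
        return None
    # Assumed policy: borrow from the least-occupied port.
    target = min(candidates, key=lambda p: len(buffers[p]))
    buffers[target].append(packet)
    return target
```

In this model the total buffer capacity is unchanged and only the placement of blocked packets differs, matching the abstract's constraint of adding neither buffers nor virtual channels.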
