Found 20 similar documents; search took 46 ms
1.
《Computer Networks》2002,38(3):277-293
We recognize two trends in router design: increasing pressure to extend the set of services provided by the router and increasing diversity in the hardware components used to construct the router. The consequence of these two trends is that it is becoming increasingly difficult to map the services onto the underlying hardware. Our response to this situation is to define a virtual router architecture, called VERA, that hides the hardware details from the forwarding functions. This paper presents the details of VERA and reports our preliminary experiences implementing various aspects of the architecture.
2.
The Clara prototype architecture collocates routing and computational functionality within a network, providing a scalable, high-performance computing switch router for computational services. Multiple off-the-shelf PCs provide Clara with computational power to, for example, perform real-time transcoding of video with minimal overhead.
3.
The growing use of clusters in diverse applications, many of which have real-time constraints, requires quality-of-service (QoS) support from the underlying cluster interconnect. All prior studies on QoS-aware cluster routers/networks have used simulation for performance evaluation. In this paper, we present an analytical model for a wormhole-switched router with QoS provisioning. In particular, the model captures message blocking due to wormhole switching in a pipelined router, and bandwidth sharing due to a rate-based scheduling mechanism, called VirtualClock. Then we extend the model to a hypercube-style cluster network. Average message latency for different traffic classes and deadline missing probability for real-time applications are computed using the model.
We evaluate a 16-port router and hypercubes of different dimensions with a mixed workload of real-time and best-effort (BE) traffic. Comparison with the simulation results shows that the single router and the network models are quite accurate in providing the performance estimates, and thus can be used as efficient design tools.
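The abstract above names VirtualClock as its rate-based scheduling mechanism but does not describe it. As a rough illustration only, the sketch below follows the commonly described form of VirtualClock: each flow keeps a virtual clock that advances by packet length divided by reserved rate, and packets are sent in increasing stamp order. The class name, flow names, rates, and packet sizes are all assumptions for this example, and the sketch does not model wormhole switching or the paper's analytical model.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class VirtualClockScheduler:
    """Toy VirtualClock scheduler (hypothetical helper, not from the paper)."""
    rates: dict                                   # flow id -> reserved rate (bits per time unit)
    clocks: dict = field(default_factory=dict)    # flow id -> current virtual clock
    queue: list = field(default_factory=list)     # heap of (stamp, seq, flow, length)
    seq: int = 0                                  # tie-breaker keeps equal stamps in arrival order

    def enqueue(self, flow, length_bits, now):
        # Advance the flow's virtual clock by the time this packet would need
        # at the reserved rate; the clock never lags behind real time.
        stamp = max(now, self.clocks.get(flow, 0.0)) + length_bits / self.rates[flow]
        self.clocks[flow] = stamp
        heapq.heappush(self.queue, (stamp, self.seq, flow, length_bits))
        self.seq += 1
        return stamp

    def dequeue(self):
        # Transmit the queued packet with the smallest stamp.
        stamp, _, flow, length_bits = heapq.heappop(self.queue)
        return flow, length_bits, stamp


# Assumed workload: a real-time flow with 4x the reserved rate of a best-effort flow.
sched = VirtualClockScheduler(rates={"rt": 4.0, "be": 1.0})
for _ in range(2):
    sched.enqueue("rt", 8, now=0.0)
    sched.enqueue("be", 8, now=0.0)
order = [sched.dequeue()[0] for _ in range(4)]
print(order)
```

With these assumed rates, both real-time packets receive smaller stamps than either best-effort packet, which is how the reserved rate translates into bandwidth sharing.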
4.
《Journal of Systems Architecture》2004,50(1):35-60
This paper provides an in-depth analysis using six basic router functional requirements, a primary switch fabrics (SFs) selection criterion, and a semi-quantitative compliance scoring scheme for 10 SFs. The goal is to select candidates that can serve a hardware (HW)-wise scalable and bi-directionally reconfigurable Internet Protocol (IP) router. HW scalability and bi-directional HW reconfigurability for an IP router denote respectively its ability to (1) expand according to network traffic capacity growth; and (2) be functionally converted to perform in two conceptual directions on-demand: “downward” as “edge”, or “upward” as “hub” or “backbone” router according to the layer of the internet services provider’s network hierarchy it is targeted to serve at the moment. The overall result points to Hypercube, Multistage Interconnection Network (MIN), and 3-Dimensional Torus Mesh as potential candidates.
5.
A new-generation architecture for IP routers called massive parallel forwarding and switching (MPFS) is proposed, which is totally different from modern routers. The basic idea of MPFS is to map the complicated forwarding process onto a multilevel scalable switch fabric so as to implement packet forwarding in a pipelined and distributed way. This processing mechanism is named forwarding in switching (FIS). By interconnecting multi-stage, lower-speed components, called forwarding and switching nodes (FSN), MPFS achieves better scalability in forwarding and switching performance, just like MPP. We put emphasis on the IPv6 lookup problem in MPFS and propose a method for partitioning the IPv6 FIB and mapping the partitions to the switch fabric. Simulation and computation results suggest that MPFS routers can support line-speed forwarding with a million IPv6 prefixes at 40 Gbps. Finally, we also propose an implementation of a 160 Tbps core router based on the MPFS architecture.
6.
Asit K. Mishra Aditya Yanamandra 《Journal of Parallel and Distributed Computing》2011,71(5):625-640
With an increasing number of cores being integrated on a single die, Network-on-Chips (NoCs) have become the de-facto standard in providing scalable communication backbones for these multi-core chips. NoCs have a significant impact on the system’s performance, power and reliability. However, NoCs can be plagued by higher power consumption and degraded throughput if the network and router are not designed properly. Towards this end, this paper proposes a novel router architecture, where we tune the frequency of a router in response to network load to manage both performance and power. We propose three dynamic frequency tuning techniques, FreqBoost, FreqThrtl and FreqTune, targeted at congestion and power management in NoCs. We also propose and evaluate a novel fine-grained frequency tuning scheme where we vary the number of virtual-channels in a router dynamically. As a further optimization to these schemes, we propose a frequency tuning scheme where we tune the frequency of the four ports of a mesh router separately from the local port. As enablers for these techniques, we exploit Dynamic Voltage and Frequency Scaling (DVFS) and the imbalance in a generic router pipeline through time stealing. We also evaluate and analyze the proposed schemes from the point of view of reliability against soft error vulnerability and provide guidelines in choosing the appropriate scheme when reliability is the prime design constraint. Experiments using synthetic workloads on an 8 × 8 wormhole-switched mesh interconnect show that FreqBoost is a better choice for reducing average latency (maximum 40%), while FreqThrtl provides the maximum benefits in terms of power saving and energy delay product (EDP). The FreqTune scheme is a better candidate for optimizing both performance and power, achieving on average a 36% reduction in latency, 13% savings in power (up to 24% at high load), and 40% savings (up to 70% at high load) in EDP.
With application benchmarks, we observe IPC improvement of up to 23% using our design. Our analysis shows FreqBoost to be the most robust of the three schemes when reliability is a concern.
7.
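As high-performance networks grow in scale, the design of high-radix router architectures has become a key focus and an active research topic in high-performance computing. With high-radix routers, a network can achieve lower packet latency, lower construction cost, and lower power consumption, and the use of high-radix routers can also improve network reliability. The past decade has seen the fastest development of high-radix routers. This paper surveys recent research on high-radix routers and predicts future trends, mainly introducing the classic structured designs represented by YARC as well as new design methods that have emerged in recent years, such as "network within a network". Future research will focus on solving the problems encountered in high-radix router architecture design, such as buffering and arbitration, and on using technologies such as optical interconnects to design better-performing architectures.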
8.
Caminero B. Carrion C. Quiles F.J. Duato J. Yalamanchili S. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(11):1009-1021
Quality of service (QoS) support in local and cluster area environments has become an issue of great interest in recent years. Most current high-performance interconnection solutions for these environments have been designed to enhance conventional best-effort traffic performance, but are not well-suited to the special requirements of the new multimedia applications. The multimedia router (MMR) aims at offering hardware-based QoS support within a compact interconnection component. One of the key elements in the MMR architecture is the algorithms used in traffic scheduling. These algorithms are responsible for the order in which information is forwarded through the internal switch. Thus, they are closely related to the QoS-provisioning mechanisms. In this paper, several traffic scheduling algorithms developed for the MMR architecture are described. Their general organization is motivated by chances for parallelization and pipelining, while providing the necessary support both to multimedia flows and to best-effort traffic. Performance evaluation results show that the QoS requirements of different connections are met, in spite of the presence of best-effort traffic, while achieving high link utilizations.
9.
Sojoodi Amir Hossein Salimi Beni Majid Khunjush Farshad 《The Journal of supercomputing》2021,77(3):3165-3192
During recent years, big data explosion and the increase in main memory capacity, on the one hand, and the need for faster data processing, on the other hand, have...
10.
This paper proposes a novel QoS-aware and congestion-aware Network-on-Chip architecture that not only enables quality-oriented network transmission and maintains a feasible implementation cost but also balances the traffic load inside the network to enhance overall throughput. By differentiating application traffic into different service classes, bandwidth allocation is managed accordingly to fulfill QoS requirements. Through a congestion control scheme that consists of dynamic arbitration and adaptive routing path selection, high-priority traffic is directed to less congested areas and is given preference for available resources. Simulation results show that the average latency of high-priority and overall traffic is improved dramatically for various traffic patterns. Cost evaluation results also show that the proposed router architecture requires negligible cost overhead but provides better performance for both advanced mesh NoC platforms.
11.
Daniel S.W. Shin K.G. Sang Kyun Yun 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(1):62-75
Modern parallel and distributed applications have a wide range of communication characteristics and performance requirements. These diverse characteristics affect the performance and suitability of particular routing and switching policies in multihop point-to-point networks. In this paper, we identify a core set of architectural features necessary for flexible selection and implementation of multiple routing and switching schemes. Using this, we present a flexible router whose routing and switching policies can be tailored to the application, allowing the network to meet these diverse needs. By dedicating a small programmable processor to each incoming link, we can implement wormhole, virtual cut-through, and packet switching, as well as hybrid switching schemes, each under a variety of unicast and multicast routing algorithms. In addition, a flexible router can support several applications or traffic types simultaneously, enabling better support of applications with multiple traffic classes. We have designed, implemented, and fabricated the Programmable Routing Controller (PRC). Cycle-level simulations of mesh-connected PRCs also demonstrate that flexible routing and switching can significantly enhance application performance.
12.
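Ethernet-oriented Physical Frame Timeslot Switching (EPFTS) is a key technology in the "single physical layer user-data transfer platform network" proposed by the Sichuan Provincial Key Laboratory of Network and Communication Technology. It is a high-speed switching technology that uses the "Ethernet-oriented frame" as its data transmission unit. This paper presents a switch fabric scheme proposed specifically for implementing EPFTS. Based on an analysis of commonly used switch fabrics and scheduling algorithms, and in light of the goals and technical characteristics of EPFTS, it proposes a switch fabric that can guarantee quality of service in physical-layer switching, called the bus-based, per-input-output independent, output-buffered switch fabric. It also proposes a queueing strategy based on logical queues and evaluates the fabric by software simulation. The simulation results show that, with a weighted fair scheduling algorithm, the proposed switch fabric can provide end-to-end QoS guarantees for real-time traffic and max-min fair service for non-real-time traffic.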
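The abstract above refers to max-min fair service for non-real-time traffic without defining it. As an illustration of the general concept only (not of the EPFTS scheduler, and ignoring the weights a weighted fair scheduler would apply), the sketch below computes an unweighted max-min fair allocation by progressive filling. The function name, flow names, capacity, and demands are assumptions for this example.

```python
def max_min_fair(capacity, demands):
    """Progressive filling: return an unweighted max-min fair split of `capacity`."""
    alloc = {flow: 0.0 for flow in demands}
    unsatisfied = set(demands)
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-12:
        share = remaining / len(unsatisfied)
        # Flows whose whole demand fits within an equal share are fully served.
        done = {f for f in unsatisfied if demands[f] <= share}
        if not done:
            # No remaining flow can be fully served: split what is left equally.
            for f in unsatisfied:
                alloc[f] = share
            break
        for f in done:
            alloc[f] = float(demands[f])
            remaining -= demands[f]
        unsatisfied -= done
    return alloc


# Assumed example: link capacity 10, three flows with demands 2, 4 and 10.
result = max_min_fair(10, {"a": 2, "b": 4, "c": 10})
print(result)
```

In this example the two small flows get everything they ask for, and the largest flow receives only what is left, which is the defining property of a max-min fair allocation.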
13.
Mingche Lai Lei Gao Sheng Ma Xiao Nong Zhiying Wang 《Microprocessors and Microsystems》2011,35(2):98-109
With an increasing number of cores, the communication latency of Network-on-Chip becomes a dominant problem due to complex operations per node. In this paper, we try to reduce communication latency by proposing a single-cycle router architecture with a wing channel, which forwards incoming packets to free ports immediately by inspecting switch allocation results. Also, incoming packets granted the wing channel can fill in the time slots of the crossbar switch and reduce contention with subsequent ones, thereby effectively improving throughput. We design the proposed router using a 65 nm CMOS process, and the results show that it supports different routing schemes and outperforms express virtual channel, prediction and Kumar’s single-cycle routers in terms of latency and throughput. When compared to the speculative router, it provides a 45.7% latency reduction and a 14.0% throughput improvement. Moreover, we show that the proposed design incurs a modest area overhead of 8.1% but reduces power consumption by 7.8% owing to less arbitration activity.
14.
We have developed a novel algorithm to adjust the link bandwidths of a given multicast tree, which sends a message of size r from a source to multiple destinations, taking pipelined routers into consideration. The algorithm tries to minimize the end-to-end delay time and the resources, such as bandwidth, of a multicast tree, and performs well on any given multicast tree. Our evaluation shows that the proposed algorithm dramatically reduces the end-to-end delay time and the resources reserved to satisfy the time constraints.
This work was supported in part by BK21 and the Ministry of Information and Communication, Korea. Corresponding author: H. Choo.
15.
《Computers & Operations Research》2005,32(9):2255-2269
Most routers on the Internet employ a first-in-first-out (FIFO) scheduling rule to determine the order of serving data packets. This scheduling rule does not provide quality of service (QoS) with regard to the differentiation of services for data packets with different service priorities and the enhancement of routing performance. We develop a scheduling rule called Weighted Shortest Processing Time–Adjusted (WSPT-A), which is derived from WSPT (a scheduling rule for production planning in the manufacturing domain), to enhance router QoS. We implement a QoS router model based on WSPT-A and run simulations to measure and compare the routing performance of our model with that of router models based on the FIFO and WSPT scheduling rules. The simulation results show superior QoS performance when using the router model with WSPT-A.
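To make the contrast between FIFO and WSPT concrete, the sketch below orders a small batch of packets both ways and compares the total weighted completion time. It shows only the classic WSPT rule (serve in decreasing order of weight divided by processing time); the adjustment that defines WSPT-A is not described in the abstract and is not modeled here. The packet names, weights, and processing times are made up, and all packets are assumed to be queued at time zero.

```python
def weighted_completion(order):
    """Sum of weight * completion time when packets are served back to back."""
    clock, total = 0.0, 0.0
    for _name, weight, proc_time in order:
        clock += proc_time          # this packet finishes at `clock`
        total += weight * clock
    return total


# (name, priority weight, processing time); illustrative values only
packets = [("p1", 1, 6.0), ("p2", 3, 2.0), ("p3", 2, 1.0)]

fifo = list(packets)                                              # arrival order
wspt = sorted(packets, key=lambda p: p[1] / p[2], reverse=True)   # highest weight/time first

print([p[0] for p in wspt])
print(weighted_completion(fifo), weighted_completion(wspt))
```

For this batch, WSPT serves the short, high-priority packets first and lowers the weighted completion time from 48 to 20 relative to FIFO.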
16.
《Computer Networks》2007,51(14):4189-4211
In the next generation Internet, the network will evolve from a plain communication medium into one that provides endless services to the users. These services will be composed of multiple cooperative distributed application elements. We name these services overlay applications. The cooperative application elements within an overlay application will build a dynamic communication mesh, namely an overlay association. The Quality of Service (QoS) perceived by the users of an overlay application greatly depends on the QoS experienced on the communication paths of the corresponding overlay association. In this paper, we present super-peer alternate path discovery (SPAD), a distributed middleware architecture that aims at providing enhanced QoS between end-points within an overlay association. To achieve this goal, SPAD provides a complete scheme to discover and utilize composite alternate end-to-end paths with better QoS than the path given by the default IP routing mechanisms.
17.
Joan Puiggali Boleslaw K. Szymanski Teo Jov Jose L. Marzo 《Concurrency and Computation》2013,25(7):932-960
This article describes a technique for path unfolding for conditional branches in parallel programs executed on clusters. Unfolding paths following control structures makes it possible to break the control dependencies existing in the code and consequently to obtain a high degree of parallelism through the use of idle CPUs. The main challenge of this technique is to deal with sequences of control statements. When a control statement appears in a path after a branch, a new conditional block needs to be opened, creating a new code split before the previous one is resolved. Such subsequent code splits increase the cost of speculation management, resulting in reduced profits. Several decision techniques have been developed for improving code splitting and speculation efficiency in single machine architecture. The main contribution of this paper is to apply such techniques to a cluster of single processor systems and evaluate them in such an environment. Our results demonstrate that code splitting in conjunction with branch speculation and the use of statistical information improves the performance measured by the number of processes executed in a time unit. This improvement is particularly significant when the parallelized programs contain iterative structures in which conditions are repeatedly executed. Copyright © 2012 John Wiley & Sons, Ltd.
18.
Communication demands have grown from separate data and voice to integrated multimedia, paving the way to converging fixed, mobile and IP networks. Supporting multimedia is a challenging task for wireless ad hoc network designers. Multimedia forms high data rate traffic with stringent Quality of Service (QoS) requirements. Wireless ad hoc networks are characterized by frequent topology changes, unreliable wireless channels, network congestion and resource contention. Providing scalable QoS is the most important challenge for multimedia delivery over ad hoc networks. We introduce here a provisioning and routing architecture for ad hoc networks which scales well while provisioning QoS. The proposed architecture is analysed using a mix of HTTP, voice and video streaming applications over 54 Mbps 802.11g-based ad hoc networks. The architecture is simulated and compared to well-known routing protocols using the OPNET Modeler. The results show that our architecture scales well as the network size increases, and outperforms well-known routing protocols.
19.
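Quality of service (QoS) in wireless multimedia sensor networks (WMSNs) has always been a core concern. However, current research on QoS guarantees for WMSNs mainly targets a single protocol layer or specific application scenarios, and lacks a systematic study of QoS architectural frameworks. Taking into account the characteristics of wireless sensor networks, this paper models the network using graph theory. On this basis, it proposes a three-layer computable QoS metric system, classifies applications into four categories according to their differing QoS requirements, and designs a service-differentiation-based QoS architecture for wireless multimedia sensor networks (DQoSAW). DQoSAW is validated using MPEG video streaming as an example, and the experimental results show that DQoSAW can significantly improve the overall performance of WMSNs.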
20.
Packet-based networks-on-chip (NoC) are considered among the most viable candidates for the on-chip interconnection network of many-core chips. Unrelenting increases in the number of processing elements on a single chip die necessitate a scalable and efficient communication fabric. The resulting enlargement of the on-chip network size has been accompanied by an equivalent widening of the physical inter-router channels. However, the growing link bandwidth is not fully utilized, because the packet size is not always a multiple of the channel width. While slicing of the physical channel enhances link utilization, it incurs additional delay, because the number of flits per packet also increases. This paper proposes a novel router micro-architecture that employs fine-grained bandwidth “sharding” (i.e., partitioning) and stealing in order to mitigate the elevation in the zero-load latency caused by slicing. Consequently, the zero-load latency of the Sharded Router becomes identical with that of a conventional router, whereas its throughput is markedly improved by fully utilizing all available bandwidth. Detailed experiments using a full-system simulation framework indicate that the proposed router reduces the average network latency by up to 19% and the execution time of real multi-threaded workloads by up to 43%. Finally, hardware synthesis analysis verifies the modest area overhead of the Sharded Router over a conventional design.
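The trade-off described in this abstract can be checked with simple arithmetic: a packet occupies a whole number of flits, so a wide channel wastes bandwidth when the packet size is not a multiple of the channel width, while a narrower slice improves utilization at the cost of more flits per packet. The sketch below works through that calculation. The function name and the 640-bit packet and 256-bit/128-bit channel widths are assumptions chosen for illustration, and the sharding and stealing mechanism itself is not modeled.

```python
def flits_and_utilization(packet_bits, channel_bits):
    """Return (flits per packet, fraction of transferred link bits that carry payload)."""
    flits = (packet_bits + channel_bits - 1) // channel_bits   # integer ceiling division
    utilization = packet_bits / (flits * channel_bits)
    return flits, utilization


# Assumed sizes: a 640-bit packet on a 256-bit channel versus a 128-bit slice.
wide = flits_and_utilization(640, 256)
sliced = flits_and_utilization(640, 128)
print(wide, sliced)
```

On the wide channel the packet needs 3 flits and about 83% of the transferred bits are payload; on the 128-bit slice utilization reaches 100% but the packet now takes 5 flits, which is the extra serialization delay the abstract attributes to slicing.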