Similar References
20 similar references retrieved (search time: 46 ms).
1.
We have previously proposed an efficient switch architecture called the multiple input/output-queued (MIOQ) switch and showed that the MIOQ switch can match the performance of an output-queued switch statistically. In this paper, we prove theoretically that the MIOQ switch can match output queueing exactly, not statistically, with no speedup of any component. More specifically, we show that the MIOQ switch with two parallel switches (which we call a parallel MIOQ (PMIOQ) switch in this paper) can exactly emulate an output-queued switch under a broad class of service scheduling algorithms, including FIFO, weighted fair queueing (WFQ) and strict priority queueing, regardless of incoming traffic pattern and switch size. To do so, we first propose the stable strategic alliance (SSA) algorithm, which produces a stable many-to-many assignment, and prove its finiteness, stability and determinism. Next, we apply the SSA algorithm to the scheduling of a PMIOQ switch with two parallel switches, and show that the stability condition of the SSA algorithm guarantees that the PMIOQ switch exactly emulates an output-queued switch. To avoid conflicts within a parallel switch, each input-output pair matched by the SSA algorithm must be mapped to one of the two crossbar switches. For this mapping, we also propose a simple algorithm that requires at most 2N steps for all matched input-output pairs. In addition, to relieve the implementation burden of N input buffers being accessed simultaneously, we propose a buffering scheme called redundant buffering, which requires two memory devices instead of N physically separate memories. In conclusion, we demonstrate that the MIOQ switch requires two crossbar switches in parallel and two physical memories at each input and output to emulate an output-queued switch with no speedup of any component.
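The pair-to-crossbar mapping step lends itself to a small illustration. The sketch below is not the paper's algorithm; it only shows one standard way to assign matched (input, output) pairs to two parallel crossbars without conflicts, assuming the matching lets each input and each output appear in at most two pairs (so the conflict graph decomposes into paths and even cycles). All names are illustrative.

```python
from collections import defaultdict

def map_pairs_to_two_crossbars(pairs):
    """Assign each matched (input, output) pair to crossbar 0 or 1 so that no
    crossbar carries two pairs sharing an input or an output.  Assumes each
    input and each output occurs in at most two pairs, so the bipartite
    conflict graph has maximum degree 2 and splits into paths and even cycles
    that can be 2-edge-colored by alternating along each component."""
    adj = defaultdict(list)                      # node -> incident edge indices
    for e, (i, o) in enumerate(pairs):
        adj[('in', i)].append(e)
        adj[('out', o)].append(e)

    color = [None] * len(pairs)

    def walk(node, edge, c):
        # follow one path/cycle, alternating crossbar 0 / crossbar 1
        while edge is not None and color[edge] is None:
            color[edge] = c
            i, o = pairs[edge]
            node = ('out', o) if node == ('in', i) else ('in', i)
            nxt = [x for x in adj[node] if color[x] is None]
            edge, c = (nxt[0] if nxt else None), 1 - c

    for node, edges in adj.items():              # start at path endpoints first
        if len(edges) == 1 and color[edges[0]] is None:
            walk(node, edges[0], 0)
    for e, (i, _) in enumerate(pairs):           # remaining components are cycles
        if color[e] is None:
            walk(('in', i), e, 0)
    return color

# Example: two cells from input 0 (to outputs 0 and 1) and one from input 1 to output 0.
print(map_pairs_to_two_crossbars([(0, 0), (0, 1), (1, 0)]))
# -> [1, 0, 0]: crossbar 1 carries (0,0); crossbar 0 carries (0,1) and (1,0)
```

Each edge is visited once, so the work is linear in the number of matched pairs, consistent with an O(N)-style mapping step.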

2.
Matching output queueing with a combined input/output-queued switch   (Cited by 19: 0 self-citations, 19 by others)
The Internet is facing two problems simultaneously: there is a need for a faster switching/routing infrastructure and a need to introduce guaranteed qualities of service (QoS). Each problem can be solved independently: switches and routers can be made faster by using input-queued crossbars instead of shared-memory systems; QoS can be provided using weighted fair queueing (WFQ)-based packet scheduling. Until now, however, the two solutions have been mutually exclusive: all of the work on WFQ-based scheduling algorithms has required that switches/routers use output queueing or centralized shared memory. This paper demonstrates that a combined input/output-queued (CIOQ) switch running twice as fast as an input-queued switch can provide precise emulation of a broad class of packet-scheduling algorithms, including WFQ and strict priorities. More precisely, we show that for an N×N switch, a “speedup” of 2−1/N is necessary, and a speedup of two is sufficient, for this exact emulation. Perhaps most interestingly, this result holds for all traffic arrival patterns. On its own, the result is primarily a theoretical observation; it shows that it is possible to emulate purely OQ switches with CIOQ switches running at approximately twice the line rate. To make the result more practical, we introduce several scheduling algorithms that, with a speedup of two, can emulate an OQ switch. We focus our attention on the simplest of these algorithms, critical cells first (CCF), and consider its running time and implementation complexity. We conclude that additional techniques are required to make the scheduling algorithms implementable at high speed, and we propose two specific strategies.
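As a quick numeric companion to the speedup result quoted above (the formulas are those stated in the abstract; the evaluations below are sample arithmetic only):

```latex
% Speedup S = fabric rate / line rate needed by a CIOQ switch for exact OQ emulation.
\[
  S_{\text{necessary}} = 2 - \frac{1}{N}, \qquad S_{\text{sufficient}} = 2,
\]
\[
  N = 4 \Rightarrow 2 - \tfrac{1}{4} = 1.75, \qquad
  N = 16 \Rightarrow 2 - \tfrac{1}{16} = 1.9375, \qquad
  N \to \infty \Rightarrow S \to 2.
\]
```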

3.
A Novel Switching Fabric Capable of Providing QoS Guarantees   (Cited by 3: 1 self-citation, 2 by others)
伊鹏  汪斌强  郭云飞  李挥 《电子学报》2007,35(7):1257-1263
Using a buffered crossbar as the core switching element, this paper constructs a space-division-multiplexing-expanded combined input/crosspoint/output-queued (SDM-CICOQ) switching fabric. We prove theoretically that with an expansion factor of 2, the SDM-CICOQ fabric achieves 100% throughput and can exactly emulate an output-queued (OQ) switch, and can therefore provide quality-of-service (QoS) guarantees. The paper also presents a hierarchical priority scheduling (HPS) scheme as an engineering reference for the scheduling mechanism of the SDM-CICOQ fabric; simulation results show that the SDM-CICOQ fabric with HPS scheduling achieves good performance.

4.
A parallel packet switch (PPS) is a switch in which the memories run slower than the line rate. Arriving packets are load-balanced packet-by-packet over multiple lower-speed center-stage packet switches. It is known that, for unicast traffic, a PPS can precisely emulate a FCFS output-queued (OQ) switch with a speedup of two and an OQ switch with delay guarantees with a speedup of three. In this paper we ask: is it possible for a PPS to emulate the behavior of an OQ multicast switch? The main result is that for multicast traffic an N-port PPS can precisely emulate a FIFO OQ switch with a speedup of S > 2√N + 1, and a switch that provides delay guarantees with a speedup of S > 2√(2N) + 2.
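To give the quoted multicast speedups a sense of scale, here are the two bounds from the abstract evaluated at a sample port count (pure arithmetic; N = 16 is an illustrative choice):

```latex
% PPS speedups quoted above for multicast emulation of an OQ switch, N = 16 as an example.
\[
  S > 2\sqrt{N} + 1:\quad N = 16 \;\Rightarrow\; S > 9
  \qquad \text{(FIFO OQ emulation)},
\]
\[
  S > 2\sqrt{2N} + 2:\quad N = 16 \;\Rightarrow\; S > 2\sqrt{32} + 2 \approx 13.3
  \qquad \text{(delay guarantees)}.
\]
```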

5.
徐宁  余少华  汪学舜 《电子学报》2012,40(12):2360-2366
Combined input-crosspoint queued (CICQ) switches are limited by flow-control signaling delay and by the need for a 2× internal speedup to emulate output-queued (OQ) switching, while purely crosspoint-buffered (CQ) switches suffer from inadequate throughput under nonuniform traffic. To address these problems, this paper proposes a novel load-balanced crosspoint-buffered switching fabric. A fixed-pattern, time-slot round-robin matching performs load balancing: nonuniform traffic arriving at the input ports is converted into approximately uniform traffic and spread evenly over the crosspoint buffers associated with the same output port, so that output-queued scheduling can be emulated with small crosspoint buffers, simplifying scheduling and improving throughput. Theoretical analysis establishes the stability of the new fabric and its ability to emulate output-queued switching. Simulations further show that the fabric achieves performance comparable to an output-queued switch without internal speedup and effectively resolves the throughput deficiency of crosspoint-buffered queues under nonuniform traffic.
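A minimal sketch of the load-balancing idea summarized above: cells heading to one output are rotated across that output's N crosspoint buffers on a fixed per-slot pattern, so an unbalanced arrival stream is spread roughly evenly. The class and the rotation rule are illustrative assumptions, not the paper's exact scheme.

```python
class LoadBalancedCrosspointColumn:
    """Toy model of one output's column of crosspoint buffers with
    fixed-pattern load balancing.  Illustrative only."""

    def __init__(self, n_ports):
        self.n = n_ports
        self.buffers = [[] for _ in range(n_ports)]   # one FIFO per crosspoint row

    def place(self, cell, input_port, time_slot):
        # fixed rotation: the row an input feeds shifts by one position each slot,
        # spreading a bursty input's cells over all rows of the column
        row = (input_port + time_slot) % self.n
        self.buffers[row].append(cell)

    def serve(self, time_slot):
        # the output visits rows round-robin and sends at most one cell per slot
        for k in range(self.n):
            row = (time_slot + k) % self.n
            if self.buffers[row]:
                return self.buffers[row].pop(0)
        return None
```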

6.
The paper studies input-queued packet switches loaded with both unicast and multicast traffic. The packet switch architecture is assumed to comprise a switching fabric with multicast (and broadcast) capabilities, operating in a synchronous slotted fashion. Fixed-size data units, called cells, are transferred from each switch input to any set of outputs in one time slot, according to the decisions of the switch scheduler, which identifies at each time slot a set of nonconflicting cells, i.e., cells neither coming from the same input nor directed to the same output. First, multicast traffic admissibility conditions are discussed, and a simple counterexample is presented, showing intrinsic performance losses of input-queued with respect to output-queued switch architectures. Second, the optimal scheduling discipline to transfer multicast packets from inputs to outputs is defined. This discipline is rather complex, requires a queueing architecture that is probably not implementable, and does not guarantee in-sequence delivery of data. However, from the definition of the optimal multicast scheduling discipline, the formal characterization of the sustainable multicast traffic region follows naturally. Then, several theorems showing intrinsic performance losses of input-queued with respect to output-queued switch architectures are proved. In particular, we prove that, when using per-multicast-flow FIFO queueing architectures, the internal speedup that guarantees 100% throughput under admissible traffic grows with the number of switch ports.

7.
We study a practical approach to matching the performance of an output-queued switch statistically. For this purpose, we propose a novel switching architecture called the multiple input/output-queued (MIOQ) switch, which requires no speedup to provide sufficient switching bandwidth. To operate an MIOQ switch in a practical manner, we also propose a multitoken-based arbiter, which schedules the switch at a high operation rate, and a virtual first-in first-out queueing scheme, which guarantees the departure order of cells belonging to the same traffic flow at the output. Additionally, we show that the proposed switch can naturally provide asymmetric bandwidth for inputs and outputs, which may be important in dealing with links with different bandwidth demands. Finally, we compare the performance of an MIOQ switch with that of an output-queued switch and discuss the design criteria for matching the performance of an output-queued switch.

8.
A Recursive Knockout Switching Network with Crossbar-Input Interconnection and Input Buffering   (Cited by 1: 0 self-citations, 1 by others)
This paper proposes a recursive Knockout switching network with crossbar-input interconnection and input buffering (CIBRKS). The crossbar-input interconnection reduces the number of internal small switching elements and keeps the cell delivery order independent of the number of group output ports. Placing a buffer at each input reduces the number of stages in the overall switching network while keeping the loss-rate performance unchanged, which in turn reduces the cell transmission delay through the group network. In addition, by moving the cell address-filtering function out of each small switching element and into each input port, the functionality required of the small switching elements is reduced further. Based on our comparison, we conclude that, as a large-scale ATM switching fabric, the CIBRKS structure offers better performance/complexity characteristics than the conventional RKS structure.

9.
Switches in the Internet face the dual challenges of high-speed switching and QoS guarantees: the former requires switch buffers to operate at line rate, while the latter requires the switch to fully emulate an output-queued switch. Existing schemes in which a crosspoint-buffered switch emulates an output-queued switch require an internal speedup of 2, which places heavy demands on hardware implementation. Using dual-port memory techniques, this paper proposes a novel crosspoint-buffered switch architecture. Theoretical analysis shows that this variable-length packet switch can emulate an output-queued switch without internal speedup, and that the crosspoint buffer requirement has a provable lower bound, indicating that the architecture is suitable for high-speed switching.

10.
We propose an innovative agile crossbar switch architecture called the contention-tolerant crossbar, denoted CTC(N). Unlike the conventional crossbar and the crossbar with crosspoint buffers, which require complex hardware arbiters to grant one out of multiple output requests, CTC(N) tolerates output contention through a pipelining mechanism, with pipeline stages implemented as buffers in the input ports. These buffers decouple the scheduling task into N independent parts, so that N schedulers located in the N input ports operate independently and in parallel. Without arbiters and/or crosspoint buffers that require additional chip area, the CTC(N) switch is more scalable than existing crossbars. We analyze the throughput of the CTC(N) switch and find a throughput bottleneck of 63%. To achieve 100% throughput, we consider two approaches: internal speedup, and space multiplexing without internal speedup. We prove mathematically that 100% throughput can be achieved with an internal speedup of 2 or with two layers of CTC(N) fabric. Our simulation results validate our theoretical analysis. Copyright © 2010 John Wiley & Sons, Ltd.

11.
A new ATM switch architecture is presented. Our proposed Multinet switch is a self-routing multistage switch with partially shared internal buffers, capable of achieving 100% throughput under uniform traffic. Although it provides incoming ATM cells with multiple paths, the cell sequence is maintained throughout the switch fabric, thus eliminating the out-of-order cell delivery problem. Cells contending for the same output addresses are buffered internally according to a partially shared queueing discipline. In a partially shared queueing scheme, buffers are partially shared to accommodate bursty traffic and to limit the performance degradation that may occur in a completely shared system, where a small number of calls may unfairly hog the entire buffer space. Although the hardware complexity in terms of the number of crosspoints is similar to that of input-queueing switches, the Multinet switch has throughput and delay performance similar to output-queueing switches.

12.
Implementing Distributed WFQ-Class Weighted Fair Scheduling Algorithms on a CICQ Switching Fabric   (Cited by 1: 0 self-citations, 1 by others)
Traditional crossbar-based input-queued switching fabrics fall well short of providing good QoS. In contrast, the CICQ (Combined Input and Crosspoint buffered Queuing) fabric not only delivers throughput close to output queueing under a variety of input traffic patterns but also offers good QoS support. Based on the CICQ structure, this paper proposes a scheme for implementing per-flow, distributed WFQ-class packet fair scheduling algorithms under input queueing, and validates the effectiveness of the scheme through simulation.
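For context on what a "WFQ-class" scheduler computes, here is a minimal fair-queueing sketch in the self-clocked style (per-flow virtual finish tags). It is a generic illustration of the family of algorithms being distributed over the CICQ fabric, not the paper's specific scheme; all names are illustrative.

```python
import heapq

class WFQClassScheduler:
    """Minimal self-clocked fair-queueing sketch: packets get virtual finish
    tags proportional to length/weight, and the smallest tag is served next."""

    def __init__(self, weights):
        self.weights = weights                       # flow_id -> weight
        self.virtual_time = 0.0
        self.last_finish = {f: 0.0 for f in weights}
        self.heap = []                               # (finish_tag, seq, flow, packet)
        self.seq = 0

    def enqueue(self, flow_id, packet, length):
        start = max(self.virtual_time, self.last_finish[flow_id])
        finish = start + length / self.weights[flow_id]
        self.last_finish[flow_id] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow_id, packet))
        self.seq += 1

    def dequeue(self):
        if not self.heap:
            return None
        finish, _, flow_id, packet = heapq.heappop(self.heap)
        # self-clocked rule: virtual time follows the tag of the packet in service
        self.virtual_time = max(self.virtual_time, finish)
        return flow_id, packet
```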

13.
Input-queued (IQ) switch architectures with virtual output queues (VOQ) scale up to very high speeds and have been a subject of intense research in the past decade. VOQ IQ switches require switch matrix scheduling algorithms to match input ports to output ports. In this tutorial article, we present an overview of switch matrix scheduling for VOQ IQ switches with crossbar switch fabrics. We then describe what we believe will be the next generation of high-speed crossbar switches: the evolution of IQ switches to combined input and crossbar queued (CICQ) switches. With the continued increase in VLSI density, buffering for one cell or packet at each crossbar crosspoint has become feasible to implement. We show how CICQ switches have simple schedulers and achieve lower delay than IQ switches. Both IQ and CICQ switches have unstable regions. We show how a threshold-and-bursting technique can feasibly achieve stability. We also show how CICQ switches are better suited than IQ switches to switching variable-length packets such as IP packets. Many challenges remain in IQ and CICQ switches. In particular, the inclusion of QoS scheduling methods that are currently only suitable for output-queued switches is a major open problem.
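The "simple schedulers" point is easy to see in miniature: with a one-cell buffer at every crosspoint, each input and each output can run its own round-robin independently. The toy model below is a sketch under those assumptions (single-cell crosspoint buffers, fixed-size cells, one transfer per port per slot), not production code.

```python
class CICQSwitch:
    """Toy CICQ model: independent round-robin input and output schedulers
    around single-cell crosspoint buffers.  Illustrative only."""

    def __init__(self, n):
        self.n = n
        self.voq = [[[] for _ in range(n)] for _ in range(n)]   # voq[i][j]
        self.xpoint = [[None] * n for _ in range(n)]            # xpoint[i][j]
        self.in_ptr = [0] * n
        self.out_ptr = [0] * n

    def arrive(self, i, j, cell):
        self.voq[i][j].append(cell)

    def slot(self):
        delivered = []
        # input schedulers: each input forwards one cell to a free crosspoint
        for i in range(self.n):
            for k in range(self.n):
                j = (self.in_ptr[i] + k) % self.n
                if self.voq[i][j] and self.xpoint[i][j] is None:
                    self.xpoint[i][j] = self.voq[i][j].pop(0)
                    self.in_ptr[i] = (j + 1) % self.n
                    break
        # output schedulers: each output drains one occupied crosspoint
        for j in range(self.n):
            for k in range(self.n):
                i = (self.out_ptr[j] + k) % self.n
                if self.xpoint[i][j] is not None:
                    delivered.append((j, self.xpoint[i][j]))
                    self.xpoint[i][j] = None
                    self.out_ptr[j] = (i + 1) % self.n
                    break
        return delivered
```

Note that no input scheduler needs to coordinate with any other input or output: the crosspoint buffer absorbs the coupling that an IQ switch would have to resolve with a centralized matching algorithm.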

14.
The iSLIP scheduling algorithm for input-queued switches   (Cited by 1: 0 self-citations, 1 by others)
An increasing number of high-performance internetworking protocol routers, LAN and asynchronous transfer mode (ATM) switches use a switched backplane based on a crossbar switch. Most often, these systems use input queues to hold packets waiting to traverse the switching fabric. It is well known that if simple first-in first-out (FIFO) input queues are used to hold packets then, even under benign conditions, head-of-line (HOL) blocking limits the achievable bandwidth to approximately 58.6% of the maximum. HOL blocking can be overcome by the use of virtual output queueing, which is described in this paper. A scheduling algorithm is used to configure the crossbar switch, deciding the order in which packets will be served. Previous results have shown that with a suitable scheduling algorithm, 100% throughput can be achieved. In this paper, we present a scheduling algorithm called iSLIP. An iterative, round-robin algorithm, iSLIP can achieve 100% throughput for uniform traffic, yet is simple to implement in hardware. Iterative and noniterative versions of the algorithm are presented, along with modified versions for prioritized traffic. Simulation results are presented to indicate the performance of iSLIP under benign and bursty traffic conditions. Prototype and commercial implementations of iSLIP exist in systems with aggregate bandwidths ranging from 50 to 500 Gb/s. When the traffic is nonuniform, iSLIP quickly adapts to a fair scheduling policy that is guaranteed never to starve an input queue. Finally, we describe the implementation complexity of iSLIP. Based on a two-dimensional (2-D) array of priority encoders, single-chip schedulers have been built supporting up to 32 ports and making approximately 100 million scheduling decisions per second.
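A compact sketch of one request-grant-accept iteration of iSLIP as described above: round-robin grant pointers at the outputs, round-robin accept pointers at the inputs, and pointers advanced only when a grant is accepted. (The 58.6% figure for FIFO input queueing is the classical 2−√2 HOL-blocking limit.) The code is illustrative; multi-iteration and prioritized variants are omitted.

```python
def islip_iteration(requests, grant_ptr, accept_ptr):
    """One request-grant-accept iteration of iSLIP for an N x N crossbar.
    requests[i][j] is True if input i has a cell queued for output j.
    grant_ptr[j] / accept_ptr[i] are round-robin pointers, updated in place.
    Returns the matching as a dict {input: output}."""
    n = len(requests)

    # Grant phase: each output picks the requesting input nearest its pointer.
    grants = {}                                   # output -> granted input
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if requests[i][j]:
                grants[j] = i
                break

    # Accept phase: each input picks the granting output nearest its pointer.
    offers = {}                                   # input -> outputs granting it
    for j, i in grants.items():
        offers.setdefault(i, []).append(j)

    match = {}
    for i, outs in offers.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outs:
                match[i] = j
                # pointers advance one position beyond the accepted pair,
                # only because this grant was actually accepted
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n
                break
    return match
```

Because pointers move only on accepted grants, persistently requesting input-output pairs are served in a round-robin order, which is what desynchronizes the pointers and yields 100% throughput under uniform traffic.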

15.
Benes switching fabrics with O(N)-complexity internal backpressure   (Cited by 5: 0 self-citations, 0 by others)
Multistage buffered switching fabrics are the most efficient method for scaling packet switches to very large numbers of ports. The Benes network is the lowest-cost switching fabric known to yield operation free of internal blocking. Backpressure inside a switching fabric can limit the use of expensive off-chip buffer memory to just virtual output queues in front of the input stage. This article extends the known credit-based flow control (backpressure) architectures to the Benes network. To achieve this, we had to successfully combine per-flow backpressure, multipath routing (inverse multiplexing), and cell resequencing. We present a flow-merging scheme that is needed to bring the cost of backpressure down to O(N) per switching element, and for which we have proved freedom from deadlock for a wide class of multipath cell distribution algorithms. Using a cell-time-accurate simulator, we verify operation free of internal blocking, evaluate various cell distribution and resequencing methods, compare performance to that of ideal output queueing, the iSLIP crossbar scheduling algorithm, and adaptive and randomized routing, and show that the delay of well-behaved flows remains unaffected by the presence of congested traffic to oversubscribed output ports.

16.
On the speedup required for work-conserving crossbar switches   (Cited by 5: 0 self-citations, 0 by others)
This paper describes the architecture of a work-conserving server using a combined input/output-buffered crossbar switch. The switch employs a novel algorithm based on output occupancy, the lowest occupancy output first algorithm (LOOFA), and a speedup of only two. A work-conserving switch provides the same throughput performance as an output-buffered switch. The work-conserving property of the switch is independent of the switch size and input traffic pattern. We also present a suite of algorithms that can be used in combination with LOOFA. These algorithms determine the fairness and delay properties of the switch. We also describe a mechanism that provides delay bounds for real-time traffic using LOOFA. These delay bounds are achievable without requiring output-buffered switch emulation.
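A greedy sketch of the lowest-occupancy-output-first selection rule: in each round, every unmatched input proposes to the least-occupied output for which it has queued cells, and each output keeps one proposer. Tie-breaking and the exact phase structure of LOOFA are simplified here; this illustrates the selection rule only, not the paper's algorithm.

```python
def loofa_match(voq_nonempty, output_occupancy):
    """Greedy lowest-occupancy-output-first matching sketch.
    voq_nonempty[i][j] is True if input i holds a cell for output j;
    output_occupancy[j] is the current output-buffer occupancy of output j."""
    n = len(voq_nonempty)
    free_in, free_out = set(range(n)), set(range(n))
    match = {}
    progress = True
    while progress:
        progress = False
        proposals = {}                              # output -> first proposing input
        for i in sorted(free_in):
            eligible = [j for j in free_out if voq_nonempty[i][j]]
            if eligible:
                j = min(eligible, key=lambda j: (output_occupancy[j], j))
                proposals.setdefault(j, i)
        for j, i in proposals.items():
            match[i] = j
            free_in.discard(i)
            free_out.discard(j)
            progress = True
    return match
```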

17.
On Guaranteed Smooth Switching for Buffered Crossbar Switches   (Cited by 2: 0 self-citations, 0 by others)
Scalability considerations drive the evolution of switch design from output queueing to input queueing and further to combined input and crosspoint queueing (CICQ). However, CICQ switches with credit-based flow control face new challenges of scalability and predictability. In this paper, we propose a novel approach of rate-based smoothed switching and design a CICQ switch called the smoothed buffered crossbar, or sBUX. First, the concept of smoothness is developed from two complementary perspectives, covering and spacing, which, commonly known as fairness and jitter, are unified in the same model. Second, a smoothed multiplexer sMUX is designed that allocates bandwidth among competing flows sharing a link and guarantees almost ideal smoothness for each flow. Third, the buffered crossbar sBUX is designed, which uses the scheduler sMUX at each input and output and a two-cell buffer at each crosspoint. It is proved that sBUX guarantees 100% throughput for real-time services and almost ideal smoothness for each flow. Fourth, an online bandwidth regulator is designed that periodically estimates bandwidth demand and generates admissible allocations, which enables sBUX to support best-effort services. Simulation shows almost 100% throughput and multi-microsecond average delay. In particular, neither credit-based flow control nor speedup is used, and arbitrary fabric-internal latency is allowed between line cards and the switch core, simplifying the switch implementation.

18.
Ghosh  D. Daly  J.C. 《Electronics letters》1992,28(10):902-903
A self-routing crossbar switch with multiple channels per input and output port has been designed in 2 μm CMOS. It has a pipelined architecture that permits high-speed path setup and arbitration. This crossbar is the building block for an asynchronous transfer mode (ATM) switch fabric using multiple-channel delta networks with shared output buffers.

19.
We consider a problem motivated by the desire to provide flexible, rate-based quality-of-service guarantees for packets sent over input-queued switches and switch networks. Our focus is solving a type of online traffic scheduling problem, whose input at each time step is a set of desired traffic rates through the switch network. These traffic rates in general cannot be exactly achieved, since they assume that arbitrarily small fractions of packets can be transmitted at each time step. The goal of the traffic scheduling problem is to closely approximate the given sequence of traffic rates by a sequence of transmissions in which only whole packets are sent. We prove worst-case bounds on the additional buffer use, which we call backlog, that results from using such an approximation. We first consider the N×N input-queued crossbar switch. Our main result is an online packet-scheduling algorithm using no speedup that guarantees backlog at most (N+1)²/4 packets at each input port and each output port. Upper bounds on worst-case backlog have been proved for the case of constant fluid schedules, such as the N²−2N+2 bound of Chang, Chen, and Huang (INFOCOM, 2000). Our main result for the crossbar switch is the first, to our knowledge, to bound backlog in terms of switch size N for arbitrary, time-varying fluid schedules, without using speedup. Our main result for Banyan networks is an exact characterization of the speedup required to maintain bounded backlog, in terms of polytopes derived from the network topology.
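For scale, the two backlog bounds quoted above evaluated at a sample switch size (the bounds are from the entry; N = 32 and the arithmetic are illustrative):

```latex
% Per-port backlog bounds for an N x N input-queued crossbar, N = 32 as an example.
\[
  \frac{(N+1)^2}{4} = \frac{33^2}{4} = 272.25
  \quad\Rightarrow\quad \text{at most } 272 \text{ packets (time-varying fluid schedules, no speedup)},
\]
\[
  N^2 - 2N + 2 = 1024 - 64 + 2 = 962
  \quad\Rightarrow\quad \text{the earlier bound for constant fluid schedules}.
\]
```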

20.
孙志刚  卢锡城 《电子学报》2000,28(Z1):133-134,137
Because of memory-bandwidth limitations, today's broadband routers generally use input-buffered crossbar switching fabrics. Switch scheduling algorithms that support bandwidth reservation are essential for guaranteeing a router's quality of service (QoS). This paper presents a crossbar switch scheduling algorithm supporting bandwidth reservation, CISP (Configurable Input Serial Polling). The algorithm not only supports guaranteed service but is also simple to implement in hardware.
