首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 1 毫秒
1.
An analytical model for the performance analysis of a multiple input queued asynchronous transfer mode (ATM) switch is presented. The interconnection network of the ATM switch is internally nonblocking and each input port maintains a separate queue of cells for each output port. The switch uses parallel iterative matching (PIM) to find the maximal matching between the input and output ports of the switch. A closed-form solution for the maximum throughput of the switch under saturated conditions is derived. It is found that the maximum throughput of the switch exceeds 99% with just four iterations of the PIM algorithm. Using the tagged input queue approach, an analytical model for evaluating the switch performance under an independent identically distributed Bernoulli traffic with the cell destinations uniformly distributed over all output ports is developed. The switch throughput, mean cell delay, and cell loss probability are computed from the analytical model. The accuracy of the analytical model is verified using simulation  相似文献   

2.
This letter proposes an innovative pipeline-based maximal-sized matching scheduling approach, called PMM, for input-buffered switches. It dramatically relaxes the timing constraint for arbitration with a maximal matching scheme. In the PMM approach, arbitration operates in a pipelined manner. Each subscheduler is allowed to take more than one time slot for its matching. Every time slot, one of them provides the matching result. The subscheduler can adopt a pre-existing efficient round-robin-based maximal matching algorithm. We show that PMM provides 100% throughput under uniform traffic since it preserves a desynchronization effect of the round-robin pointers as in the preexisting algorithm. In addition, PMM maintains fairness for best-effort traffic due to the round-robin-based arbitration  相似文献   

3.
Input-buffered switches have been widely considered for implementing feasible packet switches. However, their matching process may not be time-efficient for switches with high-speed ports. Buffered crossbars (BXs) are an alternative to relax timing for packet switches with high-speed ports and to provide high-performance switching. BX switches were originally considered expensive, as the memory amount required in the crosspoints (XPs) is proportional to the square of the number of ports (O(N/sup 2/)). This limitation is now less stringent with the advances on chip-fabrication techniques, and when considering small crosspoint (XP) buffer sizes. In this paper, we study a combined input-crosspoint buffered packet switch, named CIXB, with virtual output queues (VOQs) at the inputs, and arbitration based on round-robin selection. We show that the CIXB switch achieves 100% throughput under uniform traffic, and high performance under nonuniform traffic, using one-cell XP buffer size and no speedup.  相似文献   

4.
PetaStar: a petabit photonic packet switch   总被引:6,自引:0,他引:6  
This paper presents a new petabit photonic packet switch architecture, called PetaStar. Using a new multidimensional photonic multiplexing scheme that includes space, time, wavelength, and subcarrier domains, PetaStar is based on a three-stage Clos-network photonic switch fabric to provide scalable large-dimension switch interconnections with nanosecond reconfiguration speed. Packet buffering is implemented electronically at the input and output port controllers, allowing the central photonic switch fabric to transport high-speed optical signals without electrical-to-optical conversion. Optical time-division multiplexing technology further scales port speed beyond electronic speed up to 160 Gb/s to minimize the fiber connections. To solve output port contention and internal blocking in the three-stage Clos-network switch, we present a new matching scheme, called c-MAC, a concurrent matching algorithm for Clos-network switches. It is highly distributed such that the input-output matching and routing-path finding are concurrently performed by scheduling modules. One feasible architecture for the c-MAC scheme, where a crosspoint switch is used to provide the interconnections between the arbitration modules, is also proposed. With the c-MAC scheme, and an internal speedup of 1.5, PetaStar with a switch size of 6400 /spl times/ 6400 and total capacity of 1.024 petabit/s can be achieved at a throughput close to 100% under various traffic conditions.  相似文献   

5.
We establish some lower bounds on the speedup required to achieve throughput for some classes of switching algorithms in a input-queued switch with virtual output queues (VOQs). We use a weak notion of throughput, which will only strengthen the results, since an algorithm that cannot achieve weak throughput cannot achieve stronger notions of throughput. We focus on priority switching algorithms, i.e., algorithms that assign priorities to VOQs and forward packets of high priority first. We show a lower bound on the speedup for two fairly general classes of priority switching algorithms: input priority switching algorithms and output priority switching algorithms. An input priority scheme prioritizes the VOQs based on the state of the input queues, while an output priority scheme prioritizes the VOQs based on their output ports. We first show that, for output priority switching algorithms, a speedup S/spl ges/2 is required to achieve weak throughput. From this, we deduce that both maximal and maximum size matching switching algorithms do not imply weak throughput unless S/spl ges/2. The bound of S/spl ges/2 is tight in all cases above, based on a result in Dai et al. Finally, we show that a speedup S/spl ges/3/2 is required for the class of input priority switching algorithms to achieve weak throughput.  相似文献   

6.
As an alternative to input-buffered switches, combined input-crosspoint buffered switches relax arbitration timing and provide high-performance switching for packet switches with high-speed ports. It has been shown that these switches, with one-cell crosspoint buffer and round-robin (RR) arbitration at input and output ports, provide 100% throughput under uniform traffic. However, under admissible traffic patterns with nonuniform distributions, only weight-based selection schemes are reported to provide high throughput. We propose an RR based arbitration scheme for a combined input-crosspoint buffered packet switch that provides nearly 100% throughput for several admissible traffic patterns, including uniform and unbalanced traffic, using one-cell crosspoint buffers. The presented scheme uses adaptable-size frames, so that the frame size adapts to the traffic pattern.  相似文献   

7.
徐宁  余少华  汪学舜 《电子学报》2012,40(12):2360-2366
针对混合输入-交叉点队列(CICQ)交换结构受限于"流控通信延时"、"需要2倍内部加速仿真输出队列(OQ)交换"以及单纯交叉点缓冲(CQ)存在"非均衡流量模式下吞吐量性能不足"等问题,本文提出一种新型的"负载均衡交叉点缓冲交换结构".采用固定模式时隙轮转匹配进行负载均衡处理,将到达输入端口的非均衡流量转化为近似均衡流量并且平均分配到同一输出端口对应的交叉缓冲中,从而可以利用较小的交叉点缓冲来模拟输出队列调度,简化调度过程并且提高吞吐量.理论分析证明了这种新结构的稳定性以及模拟输出队列交换的能力.同时仿真表明,采用该交换结构可以在不需要内部加速的条件下获得相当于输出队列交换的性能,并且有效地解决了交叉点缓冲队列非均衡流量性能不足的问题.  相似文献   

8.
A general model is presented to study the performance of a family of space-domain packet switches, implementing both input and output queuing and varying degrees of speedup. Based on this model, the impact of the speedup factor on the switch performance is analyzed. In particular, the maximum switch throughput, and the average system delay for any given degree of speedup are obtained. The results demonstrate that the switch can achieve 99% throughput with a modest speedup factor of four. Packet blocking probability for systems with finite buffers can also be derived from this model, and the impact of buffer allocation on blocking probability is investigated. Given a fixed buffer budget, this analysis obtains an optimal placement of buffers among input and output ports to minimize the blocking probability. The model is also extended to cover a nonhomogeneous system, where traffic intensity at each input varies and destination distribution is not uniform. Using this model, the effect of traffic imbalance on the maximum switch throughput is studied. It is seen that input imbalance has a more adverse effect on throughput than output imbalance  相似文献   

9.
Current MSM switching fabric has poor performance under unbalanced traffic. This paper presents an alternative, novel Central-stage Buffered Three-stage Clos switching (CB-3Clos) fabric and proves that this fabric can emulate output queuing switch without any speedup. By analyzing the condition to satisfy the central-stage load-balance, this paper also proposes a Central-stage Load-balanced-based Distributed Scheduling algorithm (CLDS) for CB-3Clos. The results show that, compared with Concurrent Round-Robin based Dispatching (CRRD) algorithm based on MSM, CLDS algorithm has high throughput irrespective with the traffic model and better performance in mean packet delay.  相似文献   

10.
We consider a problem motivated by the desire to provide flexible, rate-based, quality of service guarantees for packets sent over input queued switches and switch networks. Our focus is solving a type of online traffic scheduling problem, whose input at each time step is a set of desired traffic rates through the switch network. These traffic rates in general cannot be exactly achieved since they assume arbitrarily small fractions of packets can be transmitted at each time step. The goal of the traffic scheduling problem is to closely approximate the given sequence of traffic rates by a sequence of transmissions in which only whole packets are sent. We prove worst-case bounds on the additional buffer use, which we call backlog, that results from using such an approximation. We first consider the NtimesN, input queued, crossbar switch. Our main result is an online packet-scheduling algorithm using no speedup that guarantees backlog at most (N+1)2 /4 packets at each input port and each output port. Upper bounds on worst-case backlog have been proved for the case of constant fluid schedules, such as the N2-2N+2 bound of Chang, Chen, and Huang (INFOCOM, 2000). Our main result for the crossbar switch is the first, to our knowledge, to bound backlog in terms of switch size N for arbitrary, time-varying fluid schedules, without using speedup. Our main result for Banyan networks is an exact characterization of the speedup required to maintain bounded backlog, in terms of polytopes derived from the network topology  相似文献   

11.
Multistage interconnection networks (MINs) have long been studied for use in switching networks. Since they have a unique path between source and destination and the intermediate nodes of the paths are shared, internal blocking can cause very poor throughput. This paper proposes a high throughput ATM switch consisting of an Omega network with a new form of input queues called bypass queues. We also improve the switch throughput by partitioning the Input buffers into disjoint buffer sets and multiplexing several sets of nonblocking cells within a time slot, assuming that the routing switch operates only a couple of times faster than the transmission rate. A neural network model is presented as a controller for cell scheduling and multiplexing in the switch. Our simulation results under uniform traffic show that the proposed approach achieves almost 100% of potential switch throughput  相似文献   

12.
The paper studies input-queued packet switches loaded with both unicast and multicast traffic. The packet switch architecture is assumed to comprise a switching fabric with multicast (and broadcast) capabilities, operating in a synchronous slotted fashion. Fixed-size data units, called cells, are transferred from each switch input to any set of outputs in one time slot, according to the decisions of the switch scheduler, that identifies at each time slot a set of nonconflicting cells, i.e., cells neither coming from the same input, nor directed to the same output. First, multicast traffic admissibility conditions are discussed, and a simple counterexample is presented, showing intrinsic performance losses of input-queued with respect to output-queued switch architectures. Second, the optimal scheduling discipline to transfer multicast packets from inputs to outputs is defined. This discipline is rather complex, requires a queuing architecture that probably is not implementable, and does not guarantee in-sequence delivery of data. However, from the definition of the optimal multicast scheduling discipline, the formal characterization of the sustainable multicast traffic region naturally follows. Then, several theorems showing intrinsic performance losses of input-queued with respect to output-queued switch architectures are proved. In particular, we prove that, when using per multicast flow FIFO queueing architectures, the internal speedup that guarantees 100% throughput under admissible traffic grows with the number of switch ports.  相似文献   

13.
On the speedup required for work-conserving crossbar switches   总被引:5,自引:0,他引:5  
This paper describes the architecture for a work-conserving server using a combined I/O-buffered crossbar switch. The switch employs a novel algorithm based on output occupancy, the lowest occupancy output first algorithm (LOOFA), and a speedup of only two. A work-conserving switch provides the same throughput performance as an output-buffered switch. The work-conserving property of the switch is independent of the switch size and input traffic pattern. We also present a suite of algorithms that can be used in combination with LOOFA. These algorithms determine the fairness and delay properties of the switch. We also describe a mechanism to provide delay bounds for real-time traffic using LOOFA. These delay bounds are achievable without requiring output-buffered switch emulation  相似文献   

14.
在路由器或交换机的交换结构中实现组播是提高组播应用速度的重要途径之一。传统的交叉开关结构(crossbar)组播调度方案有两种缺陷,一种是性能较低,另一种是实现的复杂度太高,无法满足高速交换的需要。该文提出了一个新的基于交叉开关的两级组播交换结构(TSMS),第1级是组播到单播的交换结构,第2级是联合输入和输出排队(CIOQ)交换,并为该结构设计了合适的最大扇出排队(FCN)优先-均匀分配中间缓存调度算法(LFCNF-UMBA)。理论分析和仿真实验都显示在该结构中,加速比低于22/(N+1)倍时吞吐率不可能实现100%;而采用LFCNF-UMBA调度算法,2倍加速比就可保证在任意允许(admissible)组播的吞吐率达到100%。  相似文献   

15.
LDPC编码调制系统中基于反馈LLR均值的迭代解调/译码算法   总被引:1,自引:0,他引:1  
该文针对LDPC码编码的BICM系统,提出一种对LDPC码译码器输出外附信息的计算方法进行改进的迭代解调/译码算法。与传统的解调/译码算法不同在于,该算法对每次BP迭代中译码器输出的各编码比特的外附LLR分别求均值后,再将其作为先验信息反馈给软解调器开始下次的迭代解调/译码。采用该方法可有效地减轻LDPC码在BP迭代过程中某些比特LLR值的振荡现象,从而使得传递给软解调器的外附信息更准确。仿真结果表明,和传统的两种迭代解调/译码算法相比,该算法能进一步提高LDPC编码BICM迭代系统的译码性能,而复杂度并无明显增加。  相似文献   

16.
The load balanced Birkhoff–von Neumann switch is an elegant VOQ architecture with two outstanding characteristics: (i) it has a computational cost of O(1) iterations and (ii) input controllers do not exchange information (as a result, it allows decoupled implementations with a low power density). The load balancing stage guarantees stability under a broad class of traffic patterns. It may alter packet sequence, but this can be solved with appropriate packet selection strategies. The average packet delay caused by previous maximal size matching algorithms, such as iSLIP, RDSRR, or PHM is noticeably lower than that of a Birkhoff–von Neumann switch, especially for low and medium loads. However, they need tightly coupled VOQ controllers, which implies higher power density. For example, this makes difficult to apply those algorithms to optical switching architectures. Moreover, they require O(log2 N) iterations to converge, and this computational cost may be unacceptable for the slot lengths in optical packet switches. In this paper, we propose a family of decoupled Parallel Hierarchical Matching (PHM) VOQ controllers (DPHM). They outperform the Birkhoff–von Neumann scheduler, which can be viewed as a member of the family (in fact, the simplest one). DPHM schedulers have a computational cost of O(1) iterations and, unlike last generation maximal size matching algorithms, they allow a low input controller interconnection complexity (low power density switch implementation). Copyright © 2006 John Wiley & Sons, Ltd.  相似文献   

17.
Saturn: a terabit packet switch using dual round robin   总被引:8,自引:0,他引:8  
Large input-output buffering with a moderate speedup has been widely considered as the most feasible solution for large-capacity switches. We propose a new terabit per second packet switch and call it the Saturn switch. It uses a simple dual round-robin arbitration scheme to schedule packets, and achieves high throughput and low statistical delay bound. It employs a bit-sliced crossbar fabric to switch packets at 10 Gb/s at inputs and outputs, and adopts a novel token-tunneling technique to arbitrate contending packets at high speed (e.g., within 10 ns), thus achieving a switch capacity of more than 1 Tb/s with existing electronic technology.  相似文献   

18.
A switching network that approaches a maximum throughput of 100% as buffering is increased is proposed. This self-routing switching network consists of simple 2×2 switching elements, distributors, and buffers located between stages and in the output ports. The proposed switching requires a speedup factor of two. The structure and the operation of the switching network are described, and its performance is analyzed. The switch has log2N stages that move packets in a store-and-forward fashion, incurring a latency of log2 N time periods. The performance analysis of the switch under uniform traffic pattern shows that the additional delay is small, and a maximum throughput of 100% is achieved as buffering is increased  相似文献   

19.
We have previously proposed an efficient switch architecture called multiple input/output-queued (MIOQ) switch and showed that the MIOQ switch can match the performance of an output-queued switch statistically. In this paper, we prove theoretically that the MIOQ switch can match the output queueing exactly , not statistically, with no speedup of any component. More specifically, we show that the MIOQ switch with two parallel switches (which we call a parallel MIOQ (PMIOQ) switch in this paper) can provide exact emulation of an output-queued switch with a broad class of service scheduling algorithms including FIFO, weighted fair queueing (WFQ) and strict priority queueing regardless of incoming traffic pattern and switch size. To do that, we first propose the stable strategic alliance (SSA) algorithm that can produce a stable many-to-many assignment, and prove its finite, stable and deterministic properties. Next, we apply the SSA algorithm to the scheduling of a PMIOQ switch with two parallel switches, and show that the stability condition of the SSA algorithm guarantees for the PMIOQ switch to emulate an output-queued switch exactly. To avoid possible conflicts in a parallel switch, each input-output pair matched by the SSA algorithm must be mapped to one of two crossbar switches. For this mapping, we also propose a simple algorithm that requires at most 2N steps for all matched input-output pairs. In addition, to relieve the implementation burden of N input buffers being accessed simultaneously, we propose a buffering scheme called redundant buffering which requires two memory devices instead of N physically-separate memories. In conclusion, we demonstrate that the MIOQ switch requires two crossbar switches in parallel and two physical memories at each input and output to emulate an output-queued switch with no speedup of any component.  相似文献   

20.
This paper presents and evaluates a quasi-optimal scheduling algorithm for input buffered cell-based switches, named reservation with preemption and acknowledgment (RPA). RPA is based on reservation rounds where the switch input ports indicate their most urgent data transfer needs, possibly overwriting less urgent requests by other input ports, and an acknowledgment round to allow input ports to determine what data they can actually transfer toward the desired switch output port. RPA must be executed during every cell time to determine which cells can be transferred during the following cell time. RPA is shown to be as simple as the simplest proposals of input queuing scheduling, efficient in the sense that no admissible traffic pattern was found under which RPA shows throughput limitations, and flexible, allowing the support of packet-mode operations and different traffic classes with either strict priority discipline or bandwidth guarantee requirements. The effectiveness of RPA is assessed with detailed simulations in uniform as well as unbalanced traffic conditions and its performance is compared with output queuing switches and the optimal maximum weighted matching (MWM) algorithm for input-buffered switches. A bound on the performance difference between the heuristic weight matching adopted in RPA and MWM is analytically computed  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号