Similar literature
Found 20 similar documents (search time: 15 ms)
1.
2.
A major issue in router design for the next-generation Internet is the fast IP address lookup mechanism. The existing scheme by Huang et al. (see Proc. IEEE INFOCOM'99, New York, NY, 1999) performs the IP address lookup in hardware: the forwarding table can be compressed to fit into a reasonably sized SRAM, and a lookup can be accomplished in three memory accesses. We claim that, with a little extra memory, the lookup time can be further reduced to two memory accesses.

3.
Using ternary content addressable memory (TCAM) for high-speed IP address lookup has been gaining popularity due to its deterministic high performance. However, restricted by the slow improvement of memory access speed, the route lookup engines for next-generation terabit routers must exploit parallelism among multiple TCAM chips. Traditional parallel methods always incur excessive redundancy and high power consumption. We propose in this paper an original TCAM-based IP lookup scheme that achieves both ultra-high lookup throughput and optimal utilization of the memory while being power-efficient. In our multi-chip scheme, we devise a load-balanced TCAM table construction algorithm together with an adaptive load balancing mechanism. Power consumption is kept under control by decreasing the number of TCAM entries triggered in each lookup operation. Using four 133 MHz TCAM chips and given 25% more TCAM entries than the original route table, the proposed scheme achieves a lookup throughput of up to 533 MPPS while remaining simple to implement in an ASIC.

4.
The translation lookaside buffer (TLB) is an essential component used to speed up virtual-to-physical address translation. Because it is looked up so frequently, however, the power consumption of the TLB is usually considerable. This paper presents an energy-efficient TLB design for embedded processors. In our design, we first propose a real-time filter scheme that supports block buffering and eliminates redundant TLB accesses without comparator delay. By modifying the address registers to be sensitive to content variation, the proposed real-time filter can identify a redundant TLB access as soon as the virtual address is generated. The second technique is a banking-like design, which aims to reduce the energy consumption per TLB access in case of a block buffer miss. To alleviate the performance penalty introduced by the conventional banking technique, we develop two adaptive variants of the banked TLB. Both variants achieve the same high energy efficiency as the banked TLB while maintaining the low miss ratio of the nonbanked TLB. The experimental results show that, by filtering out all the redundant TLB accesses and then minimizing the energy consumption per access, our design can effectively improve the energy-delay product of the TLB without any performance penalty, especially for a data TLB with poor locality.

5.
Huffman coding is a popular and important lossless compression scheme for various multimedia applications. This paper presents a low-latency parallel Huffman decoding technique with efficient memory usage for multimedia standards. First, a multi-layer prefix grouping technique is proposed for sub-group partitioning. It exploits the prefix characteristic of Huffman codewords to solve the problem of table size explosion. Second, a two-level table lookup approach is introduced, which promptly branches to the correct sub-group by a level-1 table lookup and decodes the symbols by a level-2 table lookup. Third, two optimization approaches are developed: one reduces the branch cycles, and the other runs the two-level table lookup and direct table lookup approaches in parallel to fully exploit VLIW parallel processing. An AAC Huffman decoding example is realized on the Parallel Architecture Core DSP (PAC DSP) processor. The simulation results show that the proposed method improves decoding cycles by about 89% and table size by about 33% compared with the linear search method.
Chun-Nan Liu
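The two-level table lookup described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's multi-layer prefix grouping: the toy code table `CODES`, the level-1 width `K`, and the helper names `build_tables` and `decode` are all hypothetical.

```python
# Toy prefix code (hypothetical; not an AAC codebook).
CODES = {"a": "0", "b": "10", "c": "110", "d": "1110", "e": "1111"}
K = 2  # number of bits indexed by the level-1 table (assumed)


def _fill(pad):
    """All bit strings of length `pad` (one empty string when pad == 0)."""
    return [format(i, "b").zfill(pad) if pad else "" for i in range(1 << pad)]


def build_tables(codes, k):
    max_len = max(len(c) for c in codes.values())
    rest = max_len - k  # bits indexed by each level-2 table
    level1 = {}
    for sym, code in codes.items():
        if len(code) <= k:
            # Short code: every k-bit index starting with it decodes directly.
            for suffix in _fill(k - len(code)):
                level1[code + suffix] = ("sym", sym, len(code))
        else:
            # Long code: the level-1 entry branches to a sub-group table.
            prefix, tail = code[:k], code[k:]
            entry = level1.setdefault(prefix, ("sub", {}, rest))
            for suffix in _fill(rest - len(tail)):
                entry[1][tail + suffix] = (sym, len(code))
    return level1, rest


def decode(bits, level1, rest, k):
    out, pos = [], 0
    padded = bits + "0" * (k + rest)  # pad so the last window is full
    while pos < len(bits):
        kind, payload, length = level1[padded[pos:pos + k]]
        if kind == "sub":
            # Level-2 lookup inside the selected sub-group.
            payload, length = payload[padded[pos + k:pos + k + rest]]
        out.append(payload)
        pos += length
    return "".join(out)
```

Each symbol costs at most two table reads, and the tables together hold far fewer entries than a single table indexed by the maximum code length would for a large codebook.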

6.
In OpenFlow networks, switches accept flow rules through standardized interfaces and perform flow-based packet processing. To facilitate the lookup of flow tables, TCAM has been widely used in OpenFlow switches. However, TCAM is expensive and consumes a large amount of power. A hybrid lookup scheme integrating a multiple-cell hash table with TCAM is proposed for flow table matching, reducing both the cost and the power consumption of the lookup structure without sacrificing lookup performance. Through theoretical analysis and extensive experiments, the optimal capacity configuration of the hash table and TCAM is obtained, minimizing the cost of flow table lookup. The experimental results also show that the proposed scheme saves over 90% of the cost and significantly reduces the power consumption of flow table matching compared with a pure TCAM scheme, while maintaining similar lookup performance.

7.
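Zhou Zhou, Fu Wenliang, Song Tian, Liu Qingyun. Acta Electronica Sinica (《电子学报》), 2015, 43(9): 1833-1840
URL lookup is an important component of many network systems, such as URL filtering systems and Web caches. With the rapid growth of the Internet, the main challenge for URL lookup is to achieve high-speed lookup over large-scale URL sets while keeping storage and power consumption low. This paper proposes CaBF, a URL lookup algorithm based on parallel Bloom filters. The algorithm is highly parallel, provides high-speed longest-prefix matching over large-scale URL sets, and adapts well to the varying number of URL components in the set. Theoretical analysis and experiments on real network data sets show that, compared with existing algorithms, it reduces the false-positive probability by up to an order of magnitude (or, at the same false-positive probability, reduces storage and hardware logic resource consumption). In addition, its architecture maps easily onto hardware devices such as FPGAs, providing more than 150 million URL lookups per second.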
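The idea of longest-prefix matching over URL components with one Bloom filter per component count can be sketched as follows. This is a rough illustration only, not the CaBF algorithm itself: the classes `BloomFilter` and `UrlPrefixMatcher`, the filter size, and the SHA-256-based hashing are assumptions made for the example, and the filters are probed sequentially here where hardware would probe them in parallel.

```python
import hashlib


class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k, self.bits = m_bits, k_hashes, 0

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def components(url):
    # "a.com/x/y" -> ["a.com", "x", "y"]
    return [c for c in url.split("/") if c]


class UrlPrefixMatcher:
    def __init__(self, max_components=8):
        self.max = max_components
        # filters[n] holds the prefixes that have exactly n components.
        self.filters = [BloomFilter() for _ in range(max_components + 1)]

    def add(self, url_prefix):
        parts = components(url_prefix)
        if len(parts) > self.max:
            raise ValueError("prefix has too many components")
        self.filters[len(parts)].add("/".join(parts))

    def longest_prefix(self, url):
        parts = components(url)[:self.max]
        # Probe from the longest candidate down and return the first hit.
        for n in range(len(parts), 0, -1):
            candidate = "/".join(parts[:n])
            if candidate in self.filters[n]:
                return candidate  # may be a false positive
        return None
```

A Bloom filter never misses a stored prefix, but it can report one that was never stored, so a real system would confirm a hit against an exact table or accept the false-positive rate.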

8.
One of the pertinent design issues for new-generation IP routers is the route-lookup mechanism. For each incoming IP packet, the router must perform a longest-prefix match during route lookup in order to determine the packet's next hop. This study presents a fast unicast route-lookup mechanism that needs only a tiny SRAM and can be implemented using a hardware pipeline. The forwarding table, based on the proposed scheme, is small enough to fit into a faster SRAM at low cost. For example, a large routing table with 40000 routing entries can be compacted into a forwarding table of 450-470 kbytes costing less than US$30. Most route lookups need only one memory access; no lookup needs more than three memory accesses. When implemented using a hardware pipeline, the proposed mechanism can achieve one routing lookup every memory access. With current 10-ns SRAMs, this mechanism furnishes approximately 100×10^6 routing lookups/s, which is much faster than any current commercially available routing-lookup scheme.
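The longest-prefix matching that this abstract refers to can be illustrated with a plain binary trie. This is a baseline sketch for IPv4 only, not the compressed SRAM forwarding table of the paper: it visits up to 32 nodes per lookup, whereas the paper's scheme needs at most three memory accesses. The names `TrieNode` and `RouteTable` are hypothetical.

```python
import ipaddress


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


class RouteTable:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        node = self.root
        # Walk one trie level per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            b = (bits >> (31 - i)) & 1
            if node.children[b] is None:
                node.children[b] = TrieNode()
            node = node.children[b]
        node.next_hop = next_hop

    def lookup(self, addr):
        bits = int(ipaddress.ip_address(addr))
        node, best = self.root, self.root.next_hop
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match so far
        return best
```

The throughput figure in the abstract follows from the access time: one lookup per 10 ns memory access is 1 / (10 × 10^-9 s) = 10^8 lookups per second.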

9.
Memory/speed tradeoffs are presented for the implementation of systematic linear block code decoders using lookup tables. Specifically, the authors show how to reduce the size of the lookup table that converts the syndrome into the error pattern. On the basis of such a single-error-correcting decoder, they propose a t-stage decoder that can correct up to t errors. It is shown that the proposed scheme actually constitutes a general method for compressing lookup tables.
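The basic syndrome-to-error-pattern lookup table that this abstract starts from can be sketched for a systematic (7,4) Hamming code. This shows only the uncompressed single-error-correcting baseline, not the authors' table-reduction method or their t-stage decoder; the parity matrix `P` and the function names are choices made for the example.

```python
# Parity part of a systematic (7,4) Hamming generator matrix G = [I | P].
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
K, R = 4, 3
N = K + R

# Parity-check matrix H = [P^T | I].
H = [[P[i][j] for i in range(K)] + [int(j == c) for c in range(R)]
     for j in range(R)]


def encode(data):
    parity = [sum(d * row[j] for d, row in zip(data, P)) % 2 for j in range(R)]
    return list(data) + parity


def syndrome(word):
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)


# Lookup table: syndrome -> error pattern (no error or one flipped bit).
TABLE = {syndrome([0] * N): [0] * N}
for pos in range(N):
    err = [0] * N
    err[pos] = 1
    TABLE[syndrome(err)] = err


def correct(received):
    err = TABLE[syndrome(received)]
    return [r ^ e for r, e in zip(received, err)]
```

Indexing by syndrome needs 2^3 = 8 entries here, compared with 2^7 = 128 entries for a table indexed directly by the received word.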

10.
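A parallel route lookup system based on non-overlapping prefix sets   Cited by: 1 (self-citations: 0, citations by others: 1)
Liang Zhiyong, Xu Ke, Wu Jianping, Chai Yunpeng. Acta Electronica Sinica (《电子学报》), 2004, 32(8): 1277-1281
A fast route lookup mechanism is the key to designing high-performance routers, and longest-prefix matching is the hard part of route lookup. This paper proposes a parallel route lookup system. It uses a routing table partitioning method that divides the prefixes in the routing table into several sets such that the prefixes within each set do not overlap. The longest-prefix match over the routing table is thereby converted into unique-match lookups within the individual sets. Based on this method, the paper also proposes a general parallel route lookup framework that suits most route lookup algorithms. The framework simplifies the design of lookup algorithms and increases their speed. Using binary search, the parallel lookup system achieves a lookup complexity of log2(2N/B), where N is the number of prefixes in the routing table and B is an integer greater than 4. The system also extends well to IPv6.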
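One way to picture the partition into non-overlapping prefix sets is sketched below. The partitioning rule used here (grouping prefixes by how many of their ancestors appear in the table) is an assumption chosen because it is simple and guarantees non-overlap; the abstract does not say that the paper partitions this way. Prefixes are bit strings, `W` is a toy address width, and all function names are hypothetical.

```python
import bisect

W = 8  # address width in bits (toy value)


def partition(prefixes):
    """Group bit-string prefixes so that no set holds two overlapping ones."""
    table = set(prefixes)
    sets = {}
    for p in table:
        # Depth = number of proper ancestors of p that are also in the table.
        depth = sum(1 for n in range(len(p)) if p[:n] in table)
        sets.setdefault(depth, []).append(p)
    return list(sets.values())


def build(prefix_set):
    """Turn a non-overlapping set into sorted, disjoint address ranges."""
    rows = sorted(
        (int(p.ljust(W, "0"), 2), int(p.ljust(W, "1"), 2), p)
        for p in prefix_set
    )
    return [r[0] for r in rows], rows


def match_in_set(index, addr):
    starts, rows = index
    i = bisect.bisect_right(starts, addr) - 1  # binary search on range start
    if i >= 0 and rows[i][1] >= addr:
        return rows[i][2]  # at most one prefix in a set can match
    return None


def lookup(indexes, addr):
    # The per-set searches are independent and could run in parallel.
    hits = [m for m in (match_in_set(ix, addr) for ix in indexes)
            if m is not None]
    return max(hits, key=len, default=None)
```

Because the ranges inside one set are disjoint, each per-set search is an exact binary search, and the longest of the per-set hits is the longest-prefix match.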

11.
Internet protocol (IP) address lookup is one of the major performance bottlenecks in high-end routers. This paper presents an architecture for an IP address lookup engine based on programmable finite-state machines (FSMs). The IP address lookup problem can be translated into the implementation of a large FSM. Our hardware engine is then used to implement this FSM using a structured approach, in which the large FSM is broken down into a set of smaller FSMs that are then mapped onto reconfigurable hardware blocks. The design of our hardware engine is based on a regular and well-structured architecture, which is easy to scale. Our simulation results demonstrate that the FSM-based architecture can easily scale to wire-speed performance at OC-192 rates. Unlike previous approaches, the performance of our architecture is not constrained by memory bandwidth and is, therefore, in principle scalable with very large scale integration technology.

12.
We suggest a new, simple forwarding technique to speed up IP destination address lookup. The technique is a natural extension of IP, requires 5 bits in the IP header (IPv4, 7 in IPv6), and performs IP lookup nearly as fast as IP/Tag switching but with a smaller memory requirement and a much simpler protocol. The basic idea is that each router adds a "clue" to each packet, telling its downstream router where it ended the IP lookup. Since the forwarding tables of neighboring routers are similar, the clue either directly determines the best prefix match for the downstream router, or provides the downstream router with a good point to start its IP lookup. The new scheme thus prevents repeated computations and distributes the lookup process across the routers along the packet path. Each router starts the lookup computation at the point where its upstream neighbor finished. Furthermore, the new scheme is easily assimilated into heterogeneous IP networks, does not require coordination among routers, and requires no setup time. Even a flow of one packet enjoys the benefits of the scheme without any additional overhead. The resulting lookup is about 10 times faster than current standard techniques. In a sense, this paper shows that the current routers employed in the Internet are clue-less; namely, it is possible to speed up the IP lookup by an order of magnitude without any major changes to the existing protocols.

13.
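An efficient arithmetic coding scheme suited to hardware implementation   Cited by: 3 (self-citations: 0, citations by others: 3)
This paper proposes implementing binary arithmetic coding by table lookup, which avoids multiplication and division and simplifies hardware design. The algorithm has a small probability approximation error, so the performance degradation is small.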

14.
The use of deep-submicrometer (DSM) technology increases the capacitive coupling between adjacent wires, leading to severe crosstalk noise, which causes power dissipation and may also lead to malfunction of a chip. In this paper, we present a technique that reduces crosstalk noise on instruction buses. Previous research focuses primarily on address buses, and little of it can be applied efficiently to instruction buses because of the complex transition behavior of instruction streams. Based on instruction sequence profiling, we exploit an architecture that encodes pairs of bus wires and permutes them in order to optimize power and noise. A close-to-optimal architecture configuration is obtained using a genetic algorithm. Unlike previous bus encoding approaches, crosstalk reduction can be balanced against delay and area overhead. Moreover, if delay (or area) is most critical, our architecture can be tailored to add nearly no overhead to the design. For our experiments, we used instruction bus traces obtained from 12 SPEC2000 benchmark programs. The results show that our approach can reduce crosstalk by up to 50.79% and power consumption by up to 55% on instruction buses.

15.
On fast address-lookup algorithms   Cited by: 17 (self-citations: 0, citations by others: 17)
The growth of the Internet and its acceptance has sparked keen interest in the research community with respect to many apparent scaling problems for a large infrastructure based on IP technology. A self-contained problem of considerable practical and theoretical interest is the longest-prefix lookup operation, perceived as one of the decisive bottlenecks. Several novel approaches have been proposed to speed up this operation that promise to scale forwarding technology into gigabit speeds. This paper surveys these new lookup algorithms and classifies them based on applied techniques, accompanied by a set of practical requirements that are critical to the design of high-speed routing devices. We also propose several new algorithms to provide lookup capability at gigabit speeds. In particular, we show the theoretical limitations of routing table size and show that one of our new algorithms is almost optimal, while requiring only a small number of memory accesses to perform each address lookup.

16.
A Low Complexity Decoding Algorithm for Extended Turbo Product Codes   Cited by: 1 (self-citations: 0, citations by others: 1)
In this letter, we propose a low-complexity algorithm for extended turbo product codes by considering both the encoding and decoding aspects. For the encoding part, a new encoding scheme is presented for which the operations of looking up and fetching error patterns are no longer necessary, and thus the lookup table can be omitted. For the decoder, a new algorithm is proposed to extract the extrinsic information and reduce the redundancy. This new algorithm can greatly reduce decoding complexity and enhance the performance of the decoder. Simulation results are presented to show the effectiveness of the proposed scheme.

17.
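Design and implementation of an FPGA-based DDS   Cited by: 1 (self-citations: 0, citations by others: 1)
Exploiting the short frequency-switching time and high resolution of DDS, this paper proposes a DDS system design based on an FPGA chip. Using Altera's Quartus II development software, the design implements the core of the DDS, namely the phase accumulator and the ROM lookup table, and produces a phase-continuous, frequency-variable signal. A microcontroller configures the FPGA's E2PROM to download the DDS hardware design, and timing simulation is then carried out for each module and for the whole system. Circuit design and module simulation verify the correctness of the design. Because the FPGA is programmable, the DDS functions can be modified and optimized very quickly.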
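The phase accumulator and ROM lookup table at the core of a DDS can be modeled in software as follows. This is a behavioral sketch only, not the authors' FPGA design: the bit widths `ACC_BITS`, `ADDR_BITS`, and `AMP_BITS` and the function names are assumed values chosen for the example.

```python
import math

ACC_BITS = 32   # phase accumulator width (assumed)
ADDR_BITS = 10  # ROM address width (assumed)
AMP_BITS = 8    # output sample width (assumed)

ROM_SIZE = 1 << ADDR_BITS
FULL_SCALE = (1 << AMP_BITS) - 1

# Sine ROM: one full period stored as unsigned samples.
ROM = [
    round((math.sin(2 * math.pi * i / ROM_SIZE) + 1) / 2 * FULL_SCALE)
    for i in range(ROM_SIZE)
]


def tuning_word(f_out, f_clk):
    """Frequency control word, from f_out = fcw * f_clk / 2**ACC_BITS."""
    return round(f_out * (1 << ACC_BITS) / f_clk)


def dds(fcw, n_samples, phase=0):
    """Generate n_samples by accumulating phase and reading the ROM."""
    out = []
    for _ in range(n_samples):
        # The top ADDR_BITS of the accumulator address the ROM.
        out.append(ROM[phase >> (ACC_BITS - ADDR_BITS)])
        phase = (phase + fcw) & ((1 << ACC_BITS) - 1)  # wrap like hardware
    return out
```

Changing `fcw` between calls while carrying the phase forward changes the output frequency without a phase jump, which is the phase-continuous behavior the abstract mentions.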

18.
High speed IP address lookup architecture using hashing   Cited by: 1 (self-citations: 0, citations by others: 1)
One of the most important design issues for IP routers responsible for datagram forwarding in computer networks is the route-lookup mechanism. In this letter, we explore a practical IP address lookup scheme that converts the longest-prefix matching problem into an exact matching problem. In the proposed architecture, the forwarding table is composed of multiple SRAMs, and each SRAM represents an address lookup table in a single prefix. Hashing functions are applied to each address lookup table in order to find matching entries in parallel, and the entry matching the longest prefix among them is selected. Simulation using data from the MAE-WEST router shows that a large routing table with 37000 entries is compacted to a forwarding table of 189 kbytes in the proposed scheme and achieves one route lookup every two memory accesses on average.
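Converting longest-prefix matching into exact matching can be sketched with one hash table per prefix length. This is a software illustration under assumptions, not the letter's SRAM architecture: Python dictionaries stand in for the hashed SRAM tables, the tables are probed sequentially rather than in parallel, only IPv4 is handled, and the class name `HashedLpm` is hypothetical.

```python
import ipaddress


class HashedLpm:
    def __init__(self):
        # prefix length -> {masked address: next hop}
        self.tables = {}

    def insert(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        table = self.tables.setdefault(net.prefixlen, {})
        table[int(net.network_address)] = next_hop

    def lookup(self, addr):
        a = int(ipaddress.ip_address(addr))
        hits = []
        for length, table in self.tables.items():
            # Exact match on the address masked to this prefix length.
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = table.get(a & mask)
            if hop is not None:
                hits.append((length, hop))
        # Among all exact matches, the longest prefix wins.
        return max(hits)[1] if hits else None
```

Each per-length probe is independent of the others, which is what allows a hardware design to issue them all at once and then select the longest hit.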

19.
This paper presents an architecture for a reconfigurable device that is specifically optimized for floating-point applications. Fine-grained units are used for implementing control logic and bit-oriented operations, while parameterized and reconfigurable word-based coarse-grained units incorporating word-oriented lookup tables and floating-point operations are used to implement datapaths. In order to facilitate comparison with existing FPGA devices, the virtual embedded block scheme is proposed to model embedded blocks using existing field-programmable gate array (FPGA) tools. This methodology involves adopting existing FPGA resources to model the size, position, and delay of the embedded elements. The standard design flow offered by FPGA and computer-aided design vendors is then applied, and static timing analysis can be used to estimate the performance of the FPGA with the embedded blocks. On selected floating-point benchmark circuits, our results indicate that the proposed architecture can achieve a fourfold improvement in speed and a 25-fold reduction in area compared with a traditional FPGA device.

20.
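Tian Yuan, Wang Meng, Miao Jianjun, Liu Wei. Electronics Quality (《电子质量》), 2012, (4): 43-44, 54
The on-board satellite router is a key node device for building the next-generation global information network, and route lookup is one of the key technologies affecting data forwarding performance. Considering the constraints that the space environment places on equipment reliability, weight, and power consumption, this paper analyzes and compares several route lookup algorithms and, following a hardware/software co-design approach, presents the design and analysis of a hardware route lookup scheme suited to on-board use.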
