1.

OpenFlow table lookup scheme integrating multiple-cell Hash table with TCAM

Chun-qiang LI Yong-qiang DONG Guo-xin WU 《通信学报》2016,37(10):128-140

In OpenFlow networks, switches accept flow rules through standardized interfaces and perform flow-based packet processing. To facilitate flow table lookup, TCAM has been widely used in OpenFlow switches. However, TCAM is expensive and consumes a large amount of power. A hybrid lookup scheme integrating a multiple-cell Hash table with TCAM was proposed for flow table matching, to reduce both the cost and the power consumption of the lookup structure without sacrificing lookup performance. Through theoretical analysis and extensive experiments, the optimal capacity configuration of the Hash table and TCAM was obtained, minimizing the cost of flow table lookup. The experimental results also show that, compared with the pure TCAM scheme, the proposed lookup scheme can save over 90% of the cost and significantly reduce the power consumption of flow table matching while keeping similar lookup performance.
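
The abstract does not spell out how the Hash table and the TCAM cooperate, so the following is only a minimal sketch of one plausible reading: exact-match flow keys go into a hash table whose buckets each hold several cells, and a key that lands in a full bucket overflows into a small TCAM. The class name `HybridFlowTable`, its parameters, and the use of CRC32 as the hash function are all assumptions made for illustration; the TCAM is modeled as a plain list that is scanned linearly, whereas real TCAM hardware compares all entries in parallel.

```python
import zlib
from typing import List, Optional, Tuple


class HybridFlowTable:
    """Hypothetical model: multi-cell hash table with a small TCAM for overflow."""

    def __init__(self, num_buckets: int, cells_per_bucket: int, tcam_capacity: int) -> None:
        # Each bucket can hold up to `cells_per_bucket` (key, action) pairs.
        self.buckets: List[List[Tuple[bytes, str]]] = [[] for _ in range(num_buckets)]
        self.cells_per_bucket = cells_per_bucket
        # Stand-in for the TCAM; a list scan replaces the parallel hardware search.
        self.tcam: List[Tuple[bytes, str]] = []
        self.tcam_capacity = tcam_capacity

    def _index(self, key: bytes) -> int:
        # CRC32 is an arbitrary choice of hash function for this sketch.
        return zlib.crc32(key) % len(self.buckets)

    def insert(self, key: bytes, action: str) -> str:
        """Store a flow entry and report which structure received it."""
        bucket = self.buckets[self._index(key)]
        if len(bucket) < self.cells_per_bucket:
            bucket.append((key, action))
            return "hash"
        if len(self.tcam) < self.tcam_capacity:
            self.tcam.append((key, action))
            return "tcam"
        raise OverflowError("both the hash bucket and the TCAM are full")

    def lookup(self, key: bytes) -> Optional[str]:
        """Check the hash bucket first, then fall back to the TCAM."""
        for stored_key, action in self.buckets[self._index(key)]:
            if stored_key == key:
                return action
        for stored_key, action in self.tcam:
            if stored_key == key:
                return action
        return None
```

In this model, adding cells per bucket lowers the overflow rate and therefore the TCAM capacity that is needed, which is the kind of capacity trade-off the abstract refers to. The sketch does not handle updates to an existing key or entry removal.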

2.

A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme

Arsovski I. Chandler T. Sheikholeslami A. 《Solid-State Circuits, IEEE Journal of》2003,38(1):155-158

A 256 × 144-bit TCAM is designed in 0.18-µm CMOS. The proposed TCAM cell uses 4T static storage for increased density. The proposed match-line (ML) sense scheme reduces power consumption by minimizing switching activity of search-lines and limiting voltage swing of MLs. The scheme achieves a match-time of 3 ns and operates at a minimum supply voltage of 1.2 V.

3.

Ternary CAM Power and Delay Model: Extensions and Uses

Agrawal B. Sherwood T. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(5):554-564

Applications in computer networks often require high-throughput access to large data structures for lookup and classification. While advanced algorithms exist to speed these search primitives on network processors and even custom application-specific integrated circuits (ASICs), achieving tight bounds on worst-case performance with standard memories often requires a very careful analysis of all possible access patterns. An alternative, and often simpler, solution is possible if a ternary CAM (TCAM) is used to perform a fully parallel search across the entire data set. Unfortunately, this parallelism means that large portions of the chip are switching during each cycle, causing large amounts of power to be consumed. While researchers at all levels of design (from algorithms to circuits) have begun to explore new ways of managing the power consumption, quantifying design alternatives is difficult due to a lack of available models. In this paper, we examine the structure of a modern TCAM and present a simple, yet accurate, power and delay model. We present techniques to estimate the dynamic power consumption and leakage power of a TCAM structure and validate the model using a combination of industrial TCAM datasheets and prior published works. Such a model is a critical first step in bridging the intellectual divide between circuit-level and algorithm-level optimizations. To demonstrate the utility of our model, we present an extensive analysis of the model by varying several architectural parameters and describe how our model can be easily extended to handle several circuit optimizations in the TCAM structure. In addition, we present a comparative study of SRAM and TCAM energy consumption to directly quantify the many design options, which will be very useful for network designers exploring various power management schemes.

4.

On a trie partitioning algorithm for power-efficient TCAMs

Haibin Lu 《International Journal of Communication Systems》2008,21(2):115-133

Internet routers conduct routing table (RT) lookup based on the destination IP address of the incoming packet to decide which output port to forward the packet to. Ternary content-addressable memories (TCAMs) use parallelism to achieve lookup in a single cycle. One of the major drawbacks of TCAM is its high power consumption. Trie-based architecture has been proposed to reduce TCAM power consumption. The idea is to use an index TCAM to select one of many data TCAM blocks for lookup. However, power reduction is limited by the size of the index TCAM, which is always enabled for search. In this paper we develop a simple but effective trie-partitioning algorithm to reduce the index TCAM size, which achieves better reduction in power consumption and at the same time guarantees full TCAM space utilization. We compared our algorithm (LogSplit) with PostOrderSplit (IEEE INFOCOM, 2003). For two real-world RTs (AADS and PAIX), the size of the index TCAM generated by LogSplit is 55–70% of that generated by PostOrderSplit; the largest power reduction factor of LogSplit is 41 for AADS and 68 for PAIX, while the largest power reduction factor of PostOrderSplit is 33 for AADS and 52 for PAIX. The improvement is even more significant in the worst case: the size of the index TCAM generated by LogSplit is 18–30% of that generated by PostOrderSplit for IPv4, and less than 1% of that generated by PostOrderSplit for IPv6; the largest power reduction factor of LogSplit is 173 for both IPv4 and IPv6, while the largest power reduction factor of PostOrderSplit is only 82 for IPv4 and 41 for IPv6. Copyright © 2007 John Wiley & Sons, Ltd.
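
The two-stage idea described above (an always-on index TCAM that selects a single data TCAM block) can be sketched as follows. This is not LogSplit or PostOrderSplit; the function names, the bit-string prefix representation, and the "entries activated" count used as a rough proxy for search power are all assumptions made for illustration. The sketch also assumes every prefix in a block extends the index prefix that selects it, so it does not deal with covering prefixes, which a real partitioning algorithm must handle.

```python
from typing import Dict, List, Optional, Tuple, TypeVar

V = TypeVar("V")


def longest_match(entries: List[Tuple[str, V]], addr: str) -> Optional[V]:
    """Return the value of the longest prefix in `entries` that matches `addr`."""
    best: Optional[Tuple[str, V]] = None
    for prefix, value in entries:
        if addr.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, value)
    return best[1] if best is not None else None


def two_stage_lookup(
    index: List[Tuple[str, int]],
    blocks: Dict[int, List[Tuple[str, str]]],
    addr: str,
) -> Tuple[Optional[str], int]:
    """Search the index, then only the selected block.

    Returns (next_hop, entries_activated). The index is searched on every
    lookup, which is why its size bounds the achievable saving.
    """
    activated = len(index)
    block_id = longest_match(index, addr)
    if block_id is None:
        return None, activated
    block = blocks[block_id]
    activated += len(block)
    return longest_match(block, addr), activated


# Hypothetical table: 4 index entries, each selecting a block of 4 prefixes.
index = [(format(i, "02b"), i) for i in range(4)]
blocks = {
    i: [(format(i, "02b") + format(j, "02b"), f"hop{i}{j}") for j in range(4)]
    for i in range(4)
}
hop, activated = two_stage_lookup(index, blocks, "01101111")
```

With this toy table a lookup touches 8 entries (4 index plus 4 in the selected block) instead of all 16 data entries, and shrinking the index lowers that count further.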

5.

Activity correlation-based clustering clock-gating technique for digital filters

Q. Tong 《International Journal of Electronics》2013,100(7):1095-1106

Clock gating is an effective way to reduce the dynamic power in digital sequential circuits. In this paper, a gate-level activity correlation-based clustering clock-gating (CCG) technique is proposed for digital filters. The CCG technique exploits the correlations between flip-flops and determines how to group the flip-flops for clock gating. An Activity Correlation Matrix (ACMtx) is introduced to describe the correlations between the flip-flops, and a greedy clustering algorithm is proposed to find an optimised clustering scheme. Experiments on ISCAS’89 benchmarks show that the proposed technique can reduce power consumption by 5.08% on average, on top of the existing technique. For circuits with large numbers of flip-flops, the proposed technique can save 15.84% more power on average.

6.

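A low-energy regular expression matching algorithm based on TCAM

Ding Linxuan, Huang Kun, Zhang Dafang 《通信学报》2014,35(8):20-168

A regular expression matching algorithm based on character indexing is proposed. It stores the alphabet and the states of a deterministic finite automaton (DFA) separately and builds a character index, which reduces the number of TCAM blocks activated during matching and significantly lowers TCAM energy consumption. Experimental results show that, compared with DFA, the character-indexed DFA (CIDFA) reduces energy consumption by 92.7% on average, reduces storage overhead by 32.0% on average, and improves throughput by 57.9% on average.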

7.

Efficient Multimatch Packet Classification for Network Security Applications

《Selected Areas in Communications, IEEE Journal on》2006,24(10):1805-1816

New network applications like intrusion detection systems and packet-level accounting require multimatch packet classification, where all matching filters need to be reported. Ternary content addressable memories (TCAMs) have been adopted to solve the multimatch classification problem due to their ability to perform fast parallel matching. However, TCAMs are expensive and consume large amounts of power. None of the previously published multimatch classification schemes is both memory and power efficient. In this paper, we develop a novel scheme that meets both requirements by using a new set splitting algorithm (SSA). The main idea behind SSA is that it splits filters into multiple groups and performs separate TCAM lookups into these groups. It guarantees the removal of at least half of the intersections when a filter set is split into two sets, thus resulting in low TCAM memory usage. SSA also accesses filters in the TCAM only once per packet, leading to low power consumption. We compare SSA with the two best-known schemes: multimatch using discriminators (MUD) (Lakshminarayanan and Rangarajan, 2005) and geometric intersection-based solutions (Yu and Katz, 2004). Simulation results based on the SNORT filter sets show that SSA uses approximately the same amount of TCAM memory as MUD, but yields a 75%–95% reduction in power consumption. Compared with geometric intersection-based solutions, SSA uses 90% less TCAM memory and power at the cost of one additional TCAM lookup per packet. We also show that SSA can be combined with SRAM/TCAM hybrid approaches to further reduce energy consumption.

8.

Partitioning and gating technique for low-power multiplication in video processing applications

Hau T. Ngo Vijayan K. Asari 《Microelectronics Journal》2009,40(11):1582-483

In this paper, we propose a partitioning and gating technique for the design of a high-performance and low-power multiplier for kernel-based operations such as 2D convolution in video processing applications. The proposed technique reduces dynamic power consumption by analyzing the bit patterns in the input data to reduce switching activities. Special values of the pixels in the video streams, such as zero, repeated values or repeated bit combinations, are detected, and data paths in the architecture design are disabled appropriately to eliminate unnecessary switching. Input pixels in the video stream are partitioned into halves to increase the possibility of detecting special values. It is observed that the proposed scheme helps to reduce dynamic power consumption in the 2D convolution operations by up to 33%.

9.

A TCAM-based distributed parallel IP lookup scheme and performance analysis

Kai Zheng Chengchen Hu Hongbin Lu Bin Liu 《Networking, IEEE/ACM Transactions on》2006,14(4):863-875

Using ternary content addressable memory (TCAM) for high-speed IP address lookup has been gaining popularity due to its deterministic high performance. However, restricted by the slow improvement of memory access speed, the route lookup engines for next-generation terabit routers demand exploiting parallelism among multiple TCAM chips. Traditional parallel methods always incur excessive redundancy and high power consumption. We propose in this paper an original TCAM-based IP lookup scheme that achieves both ultra-high lookup throughput and optimal utilization of the memory while being power-efficient. In our multi-chip scheme, we devise a load-balanced TCAM table construction algorithm together with an adaptive load balancing mechanism. The power efficiency is well controlled by decreasing the number of TCAM entries triggered in each lookup operation. Using four 133 MHz TCAM chips and given 25% more TCAM entries than the original route table, the proposed scheme achieves a lookup throughput of up to 533 MPPS while remaining simple for ASIC implementation.
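
The 533 MPPS figure is consistent with a simple upper bound, under two assumptions not stated in the abstract: each chip completes one lookup per clock cycle, and the nominal 133 MHz clock is about 133.3 MHz.

```latex
\[
T_{\max} = N_{\text{chips}} \times f_{\text{clk}}
\approx 4 \times 133.3\ \text{MHz}
\approx 533\ \text{MPPS}
\]
```

On this reading, "up to 533 MPPS" is the ideal case in which the load is perfectly balanced across the four chips.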

10.

Design and analysis of low-power cache using two-level filter scheme

Yen-Jen Chang Shanq-Jang Ruan Feipei Lai 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(4):568-580

Power consumption is an increasingly pressing problem in modern processor design. Since the on-chip caches usually consume a significant amount of power, they are one of the most attractive targets for power reduction. This paper presents a two-level filter scheme, which consists of the L1 and L2 filters, to reduce the power consumption of the on-chip cache. The main idea of the proposed scheme is motivated by the substantial unnecessary activities in conventional cache architecture. We use a single block buffer as the L1 filter to eliminate the unnecessary cache accesses. In the L2 filter, we then propose a new sentry-tag architecture to further filter out the unnecessary way activities in case of an L1 filter miss. We use SimpleScalar to simulate the SPEC2000 benchmarks and perform HSPICE simulations to evaluate the proposed architecture. Experimental results show that the two-level filter scheme can effectively reduce the cache power consumption by eliminating most unnecessary cache activities, while the compromise of system performance is negligible. Compared to a conventional instruction cache (32 kB, two-way) implemented with only the L1 filter, the use of a two-level filter can result in roughly 30% reduction in total cache power consumption. Similarly, compared to a conventional data cache (32 kB, four-way) implemented with only the L1 filter, the total cache power reduction is approximately 46%.

11.

A New Scan Partition Scheme for Low‐Power Embedded Systems

Hong‐Sik Kim Cheong‐Ghil Kim Sungho Kang 《ETRI Journal》2008,30(3):412-420

A new scan partition architecture to reduce both the average and peak power dissipation during scan testing is proposed for low‐power embedded systems. In scan‐based testing, due to the extremely high switching activity during the scan shift operation, the power consumption increases considerably. In addition, the reduced correlation between consecutive test patterns may increase the power consumed during the capture cycle. In the proposed architecture, only a subset of scan cells is loaded with test stimulus and captured with test responses by freezing the remaining scan cells according to the spectrum of unspecified bits in the test cubes. To optimize the proposed process, a novel graph‐based heuristic to partition the scan chain into several segments and a technique to increase the number of don't cares in the given test set have been developed. Experimental results on large ISCAS89 benchmark circuits show that the proposed technique, compared to the traditional full scan scheme, can reduce both the average switching activities and the average peak switching activities by 92.37% and 41.21%, respectively.

12.

A Low Power Pseudo-Random BIST Technique

Nadir Z. Basturkmen Sudhakar M. Reddy Irith Pomeranz 《Journal of Electronic Testing》2003,19(6):637-644

Peak power consumption during testing is an important concern. For scan designs, a high level of switching activity is created in the circuit during scan shifts, which increases power consumption considerably. In this paper we propose a pseudo-random BIST scheme for scan designs, which reduces the peak power consumption as well as the average power consumption as measured by the switching activity in the circuit. The method reduces the switching activity in the scan chains and the activity in the circuit under test by limiting the scan shifts to a portion of the scan chain structure using scan chain disable. Experimental results on various benchmark circuits demonstrate that the technique reduces the switching activity caused by scan shifts.

13.

An activity-driven encoding scheme for power optimization in microprogrammed control unit

Wang C.-Y. Roy K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(1):130-134

With the high demand for reliability and further integration, power consumption has become a critical concern in today's very large scale integration design. Among the different techniques to minimize power consumption and promote system reliability, reducing the switching activity of CMOS circuits is a promising area to be explored. In this paper, we present an encoding scheme to refine the control memory in a microprogrammed control unit, which can reduce switching activities within the control unit and on the path from the control unit to the data-processing unit. To achieve this, pseudo-Boolean programming techniques have been introduced to efficiently encode don't care bits in the control memory. Experiments have been conducted with a subset of the 8086 instruction set. Results show that a 4.8%-16.5% reduction of switching activities can be obtained from the proposed encoding scheme.

14.

Low-Power Ternary Content-Addressable Memory Design Using a Segmented Match Line

Baeg S. 《IEEE transactions on circuits and systems. I, Regular papers》2008,55(6):1485-1494

Power consumption in match lines is the most critical issue for low-power ternary content-addressable memory (TCAM) designs. In the proposed match-line architecture, the match line in each TCAM word is partitioned into four segments and is selectively pre-charged to reduce the match-line power consumption. The partially charged match lines are evaluated to determine the final comparison result by sharing the charges deposited in various parts of the partitioned segments. This arrangement reduces the match-line power consumption by reducing effective capacitor loading and voltage swing at match lines. The segmented architecture also enhances operational speed by evaluating multiple segments in parallel and by overlapping the pre-charging and evaluation stages. A 512 × 72 TCAM is designed using 0.18-µm CMOS technology. The extracted RC values are used to show the power reduction benefits. The sample design demonstrated that the match-line power consumption using a segmented match line was conservatively 44% of that produced by traditional parallel TCAM. The power savings by segmenting match lines can be up to 41% over a low-voltage swing technique due to the independent discharge capability in segmented match-line architecture.

15.

Variation aware intuitive clock gating to mitigate on-chip power supply noise

Alak Majumder Pritam Bhattacharjee 《International Journal of Electronics》2018,105(9):1487-1500

With the advent of semiconductor process technology, both the dynamic and static power consumption have become major concerns for circuit designers. Though clock gating (CG) is an established technique to minimise the dynamic power, it generally fails to cut down the static power dissipation. To address this, we present a new CG scheme incorporating a leakage control transistor, which simultaneously curbs the static and dynamic power and alleviates power supply noise (PSN) in silicon chips by controlling the current ramp (di/dt) and average current i(t), the main contributors to PSN. The proposed CG not only saves average, dynamic and static power by 84.34%, 90.33% and 66.73%, respectively, but also reduces PSN by 84.44% with respect to its non-gated counterpart when simulated using Cadence Virtuoso® for the 90 nm Generic Process Design Kit at a switching frequency of 5 GHz and a power supply voltage of 1.1 V.

16.

Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis

Banerjee N. Raychowdhury A. Roy K. Bhunia S. Mahmoodi H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(9):1034-1039

Power consumption in datapath modules due to redundant switching is an important design concern for high-performance applications. Operand isolation schemes that reduce this redundant switching incur considerable overhead in terms of delay, power, and area. This paper presents novel operand isolation techniques based on supply gating that reduce overheads associated with isolating circuitry. The proposed schemes also target leakage minimization and additional operand isolation at the internal logic of the datapath to further reduce power consumption. We integrate the proposed techniques and power/delay models to develop a synthesis flow for low-power datapath synthesis. Simulation results show that the proposed operand isolation techniques achieve at least 40% reduction in power consumption compared to the original circuit, with minimal area overhead (5%) and delay penalty (0.15%).

17.

A 0.7-fJ/bit/search 2.2-ns search time hybrid-type TCAM architecture

Sungdae Choi Sohn K. Hoi-Jun Yoo 《Solid-State Circuits, IEEE Journal of》2005,40(1):254-260

This paper presents a hybrid-type TCAM architecture which can utilize the benefits of both NOR and NAND-type TCAM cells: high speed and low power. A hidden bank selection scheme is proposed to activate a limited number of cells during the search operation while avoiding additional timing penalty. Match line repeaters and a sub-match line scheme are used for fast NAND search operation. A test chip with 144-kb TCAM capacity is implemented using a 0.1-µm 1.2-V CMOS process to verify the proposed schemes. It shows 2.2 ns of match evaluation time on a 144-bit data search with 0.7 fJ/bit/search energy efficiency.

18.

An Effective Power Reduction Methodology for Deterministic BIST Using Auxiliary LFSR

Myung-Hoon Yang Yongjoon Kim Sunghoon Chun Sungho Kang 《Journal of Electronic Testing》2008,24(6):591-595

Power consumption for test vectors is a major problem in SOC testing using BIST. A new low-power testing methodology to reduce the peak power and average power associated with scan-based designs in deterministic BIST is proposed. This new method utilizes an auxiliary LFSR to reduce the amount of switching activity in deterministic BIST. An excessive transition detector (ETD) monitors the number of transitions in the test pattern generated by the LFSR, and a low-transition pattern is generated for the excessive transition region using the auxiliary LFSR. Experimental results for the larger ISCAS 89 benchmarks show that reduced peak power and average power can indeed be achieved with little hardware overhead compared to previous schemes.

19.

Hybrid-Type CAM Design for Both Power and Performance Efficiency

Yen-Jen Chang Yuan-Hong Liao 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(8):965-974

Content-addressable memory (CAM) is a hardware table that can compare the search data with all the stored data in parallel. Because of this parallel comparison, in which a large number of transistors are active on each lookup, the power consumption of CAM is usually considerable. This paper presents a hybrid-type CAM design which aims to combine the performance advantage of the NOR-type CAM with the power efficiency of the NAND-type CAM. In our design, a CAM word is divided into two segments, and then all the CAM cells are decoupled from the match line. By minimizing both the match line capacitances and switching activities, our design can largely reduce the power consumption of CAM. The experimental results show that the hybrid-type CAM can reduce the search energy consumption by roughly 89% compared to the traditional NOR-type CAM. Because the hybrid-type CAM provides a fast pull-down path to speed up the lightweight match line discharge, the search performance of our design is even better than that of the traditional NOR-type CAM.

20.

Efficient IP lookup using hybrid trie-based partitioning of TCAM-based open flow switches

S. Veeramani Sk. Noor Mahammad 《Photonic Network Communications》2014,28(2):135-145

In an OpenFlow switch, IP forwarding can be done by comparing the destination IP address of the incoming packet with the IP prefixes stored in the forwarding table. Ternary content-addressable memory (TCAM) is one of the popular mechanisms to store and forward IP packets, where flow entries are organized in a sorted manner. Searching for a prefix value in TCAM uses the longest prefix match rather than the exact match technique. The major drawback of TCAM is its high power consumption (12–15 W per chip) due to the increase in lookup time. The objective of this paper was to reduce the search time of a key stored in the forwarding table. This paper also proposes an efficient way to represent data and to reduce the index TCAM size by using a y-fast trie-partitioning algorithm, which gives a search time complexity of $O(\log\log n)$.
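
The abstract does not describe the paper's algorithm, so the following is not its method. It is a minimal sketch of the step that gives x-fast tries (on which y-fast tries are built) their doubly logarithmic search: one hash set per prefix length, with a binary search over the lengths. For keys of width w bits this takes O(log w) hash probes, and w is itself the logarithm of the key universe size. The function names and the bit-string key representation are assumptions made for illustration, and the result is the length of the longest stored prefix shared with the query, not a routing-table longest prefix match.

```python
from typing import List, Set


def build_levels(keys: List[str], width: int) -> List[Set[str]]:
    """levels[l] holds every length-l prefix of every stored key."""
    levels: List[Set[str]] = [set() for _ in range(width + 1)]
    for key in keys:
        for length in range(width + 1):
            levels[length].add(key[:length])
    return levels


def longest_shared_prefix_len(levels: List[Set[str]], query: str) -> int:
    """Binary-search the prefix length; uses O(log width) set probes.

    Presence is monotone: if query[:m] is stored, every shorter prefix of
    the query is stored too, so binary search over lengths is valid.
    """
    lo, hi = 0, len(levels) - 1  # invariant: query[:lo] is present
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if query[:mid] in levels[mid]:
            lo = mid
        else:
            hi = mid - 1
    return lo


levels = build_levels(["0010", "0111", "1100"], width=4)
shared = longest_shared_prefix_len(levels, "0110")
```

Here the query `0110` shares the 3-bit prefix `011` with the stored key `0111`, found in three probes rather than by checking every length in turn.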